From: Dave Chinner <david@fromorbit.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Subject: [3.2-rc3] OOM killer doesn't kill the obvious memory hog
Date: Thu, 1 Dec 2011 20:36:44 +1100
Message-ID: <20111201093644.GW7046@dastard>
While testing a 17TB filesystem with xfstests on a VM with 4GB RAM, test
017 reliably triggers the OOM killer, which eventually panics the
machine after it has killed everything but the process consuming all
the memory. Below is a rough sketch of the setup, then the console
output I captured from the last kill where the panic occurs:
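The 17TB filesystem fits on a small VM easily enough; something like
this (a sketch only - sparse loopback files are an assumption, and the
image paths, loop devices and mount points are placeholders rather than
my exact config):

  # sparse 17TB backing files on the host, attached via loopback
  truncate -s 17T /images/test.img /images/scratch.img
  losetup /dev/loop0 /images/test.img
  losetup /dev/loop1 /images/scratch.img
  mkfs.xfs -f /dev/loop0

  # minimal xfstests config, then run the failing test
  mkdir -p /mnt/test /mnt/scratch
  export TEST_DEV=/dev/loop0 TEST_DIR=/mnt/test
  export SCRATCH_DEV=/dev/loop1 SCRATCH_MNT=/mnt/scratch
  sudo ./check 017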
[ 302.040482] Pid: 16666, comm: xfs_db Not tainted 3.2.0-rc3-dgc+ #105
[ 302.041959] Call Trace:
[ 302.042547] [<ffffffff810debfd>] ? cpuset_print_task_mems_allowed+0x9d/0xb0
[ 302.044380] [<ffffffff8111afae>] dump_header.isra.8+0x7e/0x1c0
[ 302.045770] [<ffffffff8111b22c>] ? oom_badness+0x13c/0x150
[ 302.047074] [<ffffffff8111bb23>] out_of_memory+0x513/0x550
[ 302.048524] [<ffffffff81120976>] __alloc_pages_nodemask+0x726/0x740
[ 302.049993] [<ffffffff81155183>] alloc_pages_current+0xa3/0x110
[ 302.051384] [<ffffffff8111814f>] __page_cache_alloc+0x8f/0xa0
[ 302.052960] [<ffffffff811185be>] ? find_get_page+0x1e/0x90
[ 302.054267] [<ffffffff8111a2dd>] filemap_fault+0x2bd/0x480
[ 302.055570] [<ffffffff8106ead8>] ? flush_tlb_page+0x48/0xb0
[ 302.056748] [<ffffffff81138a1f>] __do_fault+0x6f/0x4f0
[ 302.057616] [<ffffffff81139cfc>] ? do_wp_page+0x2ac/0x740
[ 302.058609] [<ffffffff8113b567>] handle_pte_fault+0xf7/0x8b0
[ 302.059557] [<ffffffff8107933a>] ? finish_task_switch+0x4a/0xf0
[ 302.060718] [<ffffffff8113c035>] handle_mm_fault+0x155/0x250
[ 302.061679] [<ffffffff81acc902>] do_page_fault+0x142/0x4f0
[ 302.062599] [<ffffffff8107958d>] ? set_next_entity+0xad/0xd0
[ 302.063548] [<ffffffff8103f6d2>] ? __switch_to+0x132/0x310
[ 302.064575] [<ffffffff8107933a>] ? finish_task_switch+0x4a/0xf0
[ 302.065586] [<ffffffff81acc405>] do_async_page_fault+0x35/0x80
[ 302.066570] [<ffffffff81ac97b5>] async_page_fault+0x25/0x30
[ 302.067509] Mem-Info:
[ 302.067992] Node 0 DMA per-cpu:
[ 302.068652] CPU 0: hi: 0, btch: 1 usd: 0
[ 302.069444] CPU 1: hi: 0, btch: 1 usd: 0
[ 302.070239] CPU 2: hi: 0, btch: 1 usd: 0
[ 302.071034] CPU 3: hi: 0, btch: 1 usd: 0
[ 302.071830] CPU 4: hi: 0, btch: 1 usd: 0
[ 302.072776] CPU 5: hi: 0, btch: 1 usd: 0
[ 302.073577] CPU 6: hi: 0, btch: 1 usd: 0
[ 302.074374] CPU 7: hi: 0, btch: 1 usd: 0
[ 302.075172] Node 0 DMA32 per-cpu:
[ 302.075745] CPU 0: hi: 186, btch: 31 usd: 0
[ 302.076712] CPU 1: hi: 186, btch: 31 usd: 0
[ 302.077517] CPU 2: hi: 186, btch: 31 usd: 0
[ 302.078313] CPU 3: hi: 186, btch: 31 usd: 1
[ 302.079104] CPU 4: hi: 186, btch: 31 usd: 0
[ 302.080274] CPU 5: hi: 186, btch: 31 usd: 0
[ 302.081482] CPU 6: hi: 186, btch: 31 usd: 0
[ 302.082689] CPU 7: hi: 186, btch: 31 usd: 36
[ 302.084210] Node 0 Normal per-cpu:
[ 302.085104] CPU 0: hi: 186, btch: 31 usd: 1
[ 302.086363] CPU 1: hi: 186, btch: 31 usd: 30
[ 302.087575] CPU 2: hi: 186, btch: 31 usd: 0
[ 302.089193] CPU 3: hi: 186, btch: 31 usd: 16
[ 302.090448] CPU 4: hi: 186, btch: 31 usd: 14
[ 302.091646] CPU 5: hi: 186, btch: 31 usd: 0
[ 302.092992] CPU 6: hi: 186, btch: 31 usd: 30
[ 302.093968] CPU 7: hi: 186, btch: 31 usd: 14
[ 302.094945] active_anon:789505 inactive_anon:197012 isolated_anon:0
[ 302.094946] active_file:11 inactive_file:18 isolated_file:0
[ 302.094947] unevictable:0 dirty:0 writeback:29 unstable:0
[ 302.094948] free:6465 slab_reclaimable:2020 slab_unreclaimable:3473
[ 302.094949] mapped:5 shmem:1 pagetables:2539 bounce:0
[ 302.101211] Node 0 DMA free:15888kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0s
[ 302.108917] lowmem_reserve[]: 0 3512 4017 4017
[ 302.109885] Node 0 DMA32 free:9020kB min:7076kB low:8844kB high:10612kB active_anon:2962672kB inactive_anon:592684kB active_file:44kB inactive_file:0kB unevictable:s
[ 302.117811] lowmem_reserve[]: 0 0 505 505
[ 302.118938] Node 0 Normal free:952kB min:1016kB low:1268kB high:1524kB active_anon:195348kB inactive_anon:195364kB active_file:0kB inactive_file:72kB unevictable:0ks
[ 302.126920] lowmem_reserve[]: 0 0 0 0
[ 302.127744] Node 0 DMA: 0*4kB 0*8kB 1*16kB 0*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15888kB
[ 302.130415] Node 0 DMA32: 68*4kB 48*8kB 35*16kB 16*32kB 9*64kB 3*128kB 2*256kB 2*512kB 1*1024kB 0*2048kB 1*4096kB = 9344kB
[ 302.133101] Node 0 Normal: 117*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 988kB
[ 302.135488] 185 total pagecache pages
[ 302.136455] 149 pages in swap cache
[ 302.137171] Swap cache stats: add 126014, delete 125865, find 94/133
[ 302.138523] Free swap = 0kB
[ 302.139114] Total swap = 497976kB
[ 302.149921] 1048560 pages RAM
[ 302.150591] 36075 pages reserved
[ 302.151254] 35 pages shared
[ 302.151830] 1004770 pages non-shared
[ 302.152922] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
[ 302.154450] [ 939] 0 939 5295 1 4 -17 -1000 udevd
[ 302.156160] [ 1002] 0 1002 5294 1 4 -17 -1000 udevd
[ 302.157673] [ 1003] 0 1003 5294 0 4 -17 -1000 udevd
[ 302.159200] [ 2399] 0 2399 1737 0 7 -17 -1000 dhclient
[ 302.161078] [ 2442] 0 2442 12405 0 4 -17 -1000 sshd
[ 302.162581] [ 2446] 0 2446 20357 1 0 -17 -1000 sshd
[ 302.164408] [ 2450] 1000 2450 20357 0 1 -17 -1000 sshd
[ 302.165901] [ 2455] 1000 2455 5592 0 7 -17 -1000 bash
[ 302.167401] [ 2516] 0 2516 20357 1 6 -17 -1000 sshd
[ 302.169199] [ 2520] 1000 2520 20357 0 4 -17 -1000 sshd
[ 302.170702] [ 2527] 1000 2527 5606 1 6 -17 -1000 bash
[ 302.172508] [ 5516] 0 5516 5089 0 2 -17 -1000 sudo
[ 302.174008] [ 5517] 0 5517 2862 1 0 -17 -1000 check
[ 302.175536] [16484] 0 16484 2457 7 0 -17 -1000 017
[ 302.177336] [16665] 0 16665 1036 0 2 -17 -1000 xfs_check
[ 302.179001] [16666] 0 16666 10031571 986414 6 -17 -1000 xfs_db
[ 302.180890] Kernel panic - not syncing: Out of memory and no killable processes...
[ 302.180892]
[ 302.182585] Pid: 16666, comm: xfs_db Not tainted 3.2.0-rc3-dgc+ #105
[ 302.183764] Call Trace:
[ 302.184528] [<ffffffff81abe166>] panic+0x91/0x19d
[ 302.185790] [<ffffffff8111bb38>] out_of_memory+0x528/0x550
[ 302.187244] [<ffffffff81120976>] __alloc_pages_nodemask+0x726/0x740
[ 302.188780] [<ffffffff81155183>] alloc_pages_current+0xa3/0x110
[ 302.189951] [<ffffffff8111814f>] __page_cache_alloc+0x8f/0xa0
[ 302.191039] [<ffffffff811185be>] ? find_get_page+0x1e/0x90
[ 302.192168] [<ffffffff8111a2dd>] filemap_fault+0x2bd/0x480
[ 302.193215] [<ffffffff8106ead8>] ? flush_tlb_page+0x48/0xb0
[ 302.194343] [<ffffffff81138a1f>] __do_fault+0x6f/0x4f0
[ 302.195312] [<ffffffff81139cfc>] ? do_wp_page+0x2ac/0x740
[ 302.196490] [<ffffffff8113b567>] handle_pte_fault+0xf7/0x8b0
[ 302.197554] [<ffffffff8107933a>] ? finish_task_switch+0x4a/0xf0
[ 302.198670] [<ffffffff8113c035>] handle_mm_fault+0x155/0x250
[ 302.199755] [<ffffffff81acc902>] do_page_fault+0x142/0x4f0
[ 302.200921] [<ffffffff8107958d>] ? set_next_entity+0xad/0xd0
[ 302.201987] [<ffffffff8103f6d2>] ? __switch_to+0x132/0x310
[ 302.203023] [<ffffffff8107933a>] ? finish_task_switch+0x4a/0xf0
[ 302.204321] [<ffffffff81acc405>] do_async_page_fault+0x35/0x80
[ 302.205417] [<ffffffff81ac97b5>] async_page_fault+0x25/0x30
It looks to me like the process causing the page fault and trying to
allocate more memory (xfs_db) is also the one consuming all the memory,
and by all metrics it is the obvious candidate to kill. So why does the
OOM killer kill everything else but the memory hog and then panic the
machine?
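To put numbers on "all the memory" and "all metrics": the task dump
above has xfs_db at an rss of 986414 pages (call it ~3.8GB of the 4GB
in the machine), and the inputs the OOM killer scores on are visible in
/proc while the test runs. A rough way to eyeball them (a sketch only -
these are the standard /proc files, nothing specific to this setup):

  # per-task RSS (kB), kernel-computed badness, adjustment and name
  for p in /proc/[0-9]*; do
      pid=$(basename "$p")
      rss=$(awk '/^VmRSS/ { print $2 }' "$p/status" 2>/dev/null)
      printf "%6s %10s %6s %6s %s\n" "$pid" "${rss:-0}" \
          "$(cat "$p/oom_score" 2>/dev/null)" \
          "$(cat "$p/oom_score_adj" 2>/dev/null)" \
          "$(cat "$p/comm" 2>/dev/null)"
  done | sort -k2 -rn | head

In that output xfs_db's RSS dwarfs every other task on the box.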
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com