From: John Weekes <lists.xen@nuclearfallout.net>
To: xen-devel@lists.xensource.com
Subject: OOM problems
Date: Fri, 12 Nov 2010 23:57:22 -0800
Message-ID: <4CDE44E2.2060807@nuclearfallout.net>

On machines running many HVM (stubdom-based) domains, I often see errors
like this:
[77176.524094] qemu-dm invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0
[77176.524102] Pid: 7478, comm: qemu-dm Not tainted 2.6.32.25-g80f7e08 #2
[77176.524109] Call Trace:
[77176.524123] [<ffffffff810897fd>] ? T.413+0xcd/0x290
[77176.524129] [<ffffffff81089ad3>] ? __out_of_memory+0x113/0x180
[77176.524133] [<ffffffff81089b9e>] ? out_of_memory+0x5e/0xc0
[77176.524140] [<ffffffff8108d1cb>] ? __alloc_pages_nodemask+0x69b/0x6b0
[77176.524144] [<ffffffff8108d1f2>] ? __get_free_pages+0x12/0x60
[77176.524152] [<ffffffff810c94e7>] ? __pollwait+0xb7/0x110
[77176.524161] [<ffffffff81262b93>] ? n_tty_poll+0x183/0x1d0
[77176.524165] [<ffffffff8125ea42>] ? tty_poll+0x92/0xa0
[77176.524169] [<ffffffff810c8a92>] ? do_select+0x362/0x670
[77176.524173] [<ffffffff810c9430>] ? __pollwait+0x0/0x110
[77176.524178] [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524183] [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524188] [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524193] [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524197] [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524202] [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524207] [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524212] [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524217] [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524222] [<ffffffff810c8fb5>] ? core_sys_select+0x215/0x350
[77176.524231] [<ffffffff810100af>] ? xen_restore_fl_direct_end+0x0/0x1
[77176.524236] [<ffffffff8100c48d>] ? xen_mc_flush+0x8d/0x1b0
[77176.524243] [<ffffffff81014ffb>] ? xen_hypervisor_callback+0x1b/0x20
[77176.524251] [<ffffffff814b0f5a>] ? error_exit+0x2a/0x60
[77176.524255] [<ffffffff8101485d>] ? retint_restore_args+0x5/0x6
[77176.524263] [<ffffffff8102fd3d>] ? pvclock_clocksource_read+0x4d/0xb0
[77176.524268] [<ffffffff8102fd3d>] ? pvclock_clocksource_read+0x4d/0xb0
[77176.524276] [<ffffffff810663d1>] ? ktime_get_ts+0x61/0xd0
[77176.524281] [<ffffffff810c9354>] ? sys_select+0x44/0x120
[77176.524286] [<ffffffff81013f02>] ? system_call_fastpath+0x16/0x1b
[77176.524290] Mem-Info:
[77176.524293] DMA per-cpu:
[77176.524296] CPU 0: hi: 0, btch: 1 usd: 0
[77176.524300] CPU 1: hi: 0, btch: 1 usd: 0
[77176.524303] CPU 2: hi: 0, btch: 1 usd: 0
[77176.524306] CPU 3: hi: 0, btch: 1 usd: 0
[77176.524310] CPU 4: hi: 0, btch: 1 usd: 0
[77176.524313] CPU 5: hi: 0, btch: 1 usd: 0
[77176.524316] CPU 6: hi: 0, btch: 1 usd: 0
[77176.524318] CPU 7: hi: 0, btch: 1 usd: 0
[77176.524322] CPU 8: hi: 0, btch: 1 usd: 0
[77176.524324] CPU 9: hi: 0, btch: 1 usd: 0
[77176.524327] CPU 10: hi: 0, btch: 1 usd: 0
[77176.524330] CPU 11: hi: 0, btch: 1 usd: 0
[77176.524333] CPU 12: hi: 0, btch: 1 usd: 0
[77176.524336] CPU 13: hi: 0, btch: 1 usd: 0
[77176.524339] CPU 14: hi: 0, btch: 1 usd: 0
[77176.524342] CPU 15: hi: 0, btch: 1 usd: 0
[77176.524345] CPU 16: hi: 0, btch: 1 usd: 0
[77176.524348] CPU 17: hi: 0, btch: 1 usd: 0
[77176.524351] CPU 18: hi: 0, btch: 1 usd: 0
[77176.524354] CPU 19: hi: 0, btch: 1 usd: 0
[77176.524358] CPU 20: hi: 0, btch: 1 usd: 0
[77176.524364] CPU 21: hi: 0, btch: 1 usd: 0
[77176.524367] CPU 22: hi: 0, btch: 1 usd: 0
[77176.524370] CPU 23: hi: 0, btch: 1 usd: 0
[77176.524372] DMA32 per-cpu:
[77176.524374] CPU 0: hi: 186, btch: 31 usd: 81
[77176.524377] CPU 1: hi: 186, btch: 31 usd: 66
[77176.524380] CPU 2: hi: 186, btch: 31 usd: 49
[77176.524385] CPU 3: hi: 186, btch: 31 usd: 67
[77176.524387] CPU 4: hi: 186, btch: 31 usd: 93
[77176.524390] CPU 5: hi: 186, btch: 31 usd: 73
[77176.524393] CPU 6: hi: 186, btch: 31 usd: 50
[77176.524396] CPU 7: hi: 186, btch: 31 usd: 79
[77176.524399] CPU 8: hi: 186, btch: 31 usd: 21
[77176.524402] CPU 9: hi: 186, btch: 31 usd: 38
[77176.524406] CPU 10: hi: 186, btch: 31 usd: 0
[77176.524409] CPU 11: hi: 186, btch: 31 usd: 75
[77176.524412] CPU 12: hi: 186, btch: 31 usd: 1
[77176.524414] CPU 13: hi: 186, btch: 31 usd: 4
[77176.524417] CPU 14: hi: 186, btch: 31 usd: 9
[77176.524420] CPU 15: hi: 186, btch: 31 usd: 0
[77176.524423] CPU 16: hi: 186, btch: 31 usd: 56
[77176.524426] CPU 17: hi: 186, btch: 31 usd: 35
[77176.524429] CPU 18: hi: 186, btch: 31 usd: 32
[77176.524432] CPU 19: hi: 186, btch: 31 usd: 39
[77176.524435] CPU 20: hi: 186, btch: 31 usd: 24
[77176.524438] CPU 21: hi: 186, btch: 31 usd: 0
[77176.524441] CPU 22: hi: 186, btch: 31 usd: 35
[77176.524444] CPU 23: hi: 186, btch: 31 usd: 51
[77176.524447] Normal per-cpu:
[77176.524449] CPU 0: hi: 186, btch: 31 usd: 29
[77176.524453] CPU 1: hi: 186, btch: 31 usd: 1
[77176.524456] CPU 2: hi: 186, btch: 31 usd: 30
[77176.524459] CPU 3: hi: 186, btch: 31 usd: 30
[77176.524463] CPU 4: hi: 186, btch: 31 usd: 30
[77176.524466] CPU 5: hi: 186, btch: 31 usd: 31
[77176.524469] CPU 6: hi: 186, btch: 31 usd: 0
[77176.524471] CPU 7: hi: 186, btch: 31 usd: 0
[77176.524474] CPU 8: hi: 186, btch: 31 usd: 30
[77176.524477] CPU 9: hi: 186, btch: 31 usd: 28
[77176.524480] CPU 10: hi: 186, btch: 31 usd: 0
[77176.524483] CPU 11: hi: 186, btch: 31 usd: 30
[77176.524486] CPU 12: hi: 186, btch: 31 usd: 0
[77176.524489] CPU 13: hi: 186, btch: 31 usd: 0
[77176.524492] CPU 14: hi: 186, btch: 31 usd: 0
[77176.524495] CPU 15: hi: 186, btch: 31 usd: 0
[77176.524498] CPU 16: hi: 186, btch: 31 usd: 0
[77176.524501] CPU 17: hi: 186, btch: 31 usd: 0
[77176.524504] CPU 18: hi: 186, btch: 31 usd: 0
[77176.524507] CPU 19: hi: 186, btch: 31 usd: 0
[77176.524510] CPU 20: hi: 186, btch: 31 usd: 0
[77176.524513] CPU 21: hi: 186, btch: 31 usd: 0
[77176.524516] CPU 22: hi: 186, btch: 31 usd: 0
[77176.524518] CPU 23: hi: 186, btch: 31 usd: 0
[77176.524524] active_anon:5675 inactive_anon:4676 isolated_anon:0
[77176.524526] active_file:146373 inactive_file:153543 isolated_file:480
[77176.524527] unevictable:0 dirty:167539 writeback:322 unstable:0
[77176.524528] free:5017 slab_reclaimable:15640 slab_unreclaimable:8972
[77176.524529] mapped:1114 shmem:7 pagetables:1908 bounce:0
[77176.524536] DMA free:9820kB min:32kB low:40kB high:48kB
active_anon:4kB inactive_anon:0kB active_file:616kB inactive_file:2212kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:12740kB
mlocked:0kB dirty:2292kB writeback:0kB mapped:0kB shmem:0kB
slab_reclaimable:72kB slab_unreclaimable:108kB kernel_stack:0kB
pagetables:12kB unstable:0kB bounce:0kB writeback_tmp:0kB
pages_scanned:3040 all_unreclaimable? no
[77176.524541] lowmem_reserve[]: 0 1428 2452 2452
[77176.524551] DMA32 free:7768kB min:3680kB low:4600kB high:5520kB
active_anon:22696kB inactive_anon:18704kB active_file:584580kB
inactive_file:608508kB unevictable:0kB isolated(anon):0kB
isolated(file):1920kB present:1462496kB mlocked:0kB dirty:664128kB
writeback:1276kB mapped:4456kB shmem:28kB slab_reclaimable:62076kB
slab_unreclaimable:32292kB kernel_stack:5120kB pagetables:7620kB
unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1971808
all_unreclaimable? yes
[77176.524556] lowmem_reserve[]: 0 0 1024 1024
[77176.524564] Normal free:2480kB min:2636kB low:3292kB high:3952kB
active_anon:0kB inactive_anon:0kB active_file:296kB inactive_file:3452kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1048700kB
mlocked:0kB dirty:3736kB writeback:12kB mapped:0kB shmem:0kB
slab_reclaimable:412kB slab_unreclaimable:3488kB kernel_stack:80kB
pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB
pages_scanned:8192 all_unreclaimable? yes
[77176.524569] lowmem_reserve[]: 0 0 0 0
[77176.524574] DMA: 4*4kB 25*8kB 11*16kB 7*32kB 8*64kB 8*128kB 8*256kB
3*512kB 0*1024kB 0*2048kB 1*4096kB = 9832kB
[77176.524587] DMA32: 742*4kB 118*8kB 3*16kB 3*32kB 2*64kB 0*128kB
0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 7768kB
[77176.524600] Normal: 1*4kB 1*8kB 2*16kB 13*32kB 14*64kB 2*128kB
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1612kB
[77176.524613] 302308 total pagecache pages
[77176.524615] 1619 pages in swap cache
[77176.524617] Swap cache stats: add 40686, delete 39067, find 24687/26036
[77176.524619] Free swap = 10141956kB
[77176.524621] Total swap = 10239992kB
[77176.577607] 793456 pages RAM
[77176.577611] 436254 pages reserved
[77176.577613] 308627 pages shared
[77176.577615] 49249 pages non-shared
[77176.577620] Out of memory: kill process 5755 (python2.6) score 110492
or a child
[77176.577623] Killed process 5757 (python2.6)
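A quick cross-check of the figures in the dump above, as a minimal Python sketch. The per-zone numbers are copied from the log; the 4 kB page size is an assumption (typical for x86-64), and the helper names are made up for illustration. It suggests that only the Normal zone is under its min watermark, and that the per-zone dirty totals add up to the global dirty:167539 page count, which is roughly 55% of the 302308 pagecache pages.

```python
# Figures copied from the Mem-Info dump above (all values in kB).
ZONES = {
    "DMA":    {"free": 9820, "min": 32,   "dirty": 2292,   "present": 12740},
    "DMA32":  {"free": 7768, "min": 3680, "dirty": 664128, "present": 1462496},
    "Normal": {"free": 2480, "min": 2636, "dirty": 3736,   "present": 1048700},
}
PAGE_KB = 4              # assumption: 4 kB pages
PAGECACHE_PAGES = 302308  # "total pagecache pages" line from the log


def below_min(zones):
    """Return the names of zones whose free memory is under the min watermark."""
    return [name for name, z in zones.items() if z["free"] < z["min"]]


def total_dirty_kb(zones):
    """Sum the per-zone dirty figures (kB)."""
    return sum(z["dirty"] for z in zones.values())


if __name__ == "__main__":
    dirty_pages = total_dirty_kb(ZONES) // PAGE_KB
    print("zones below min watermark:", below_min(ZONES))
    print("dirty pages:", dirty_pages)
    print("dirty share of pagecache: %.0f%%"
          % (100.0 * dirty_pages / PAGECACHE_PAGES))
```

This only restates what the log already says; it does not by itself explain why the dirty pages were not written back in time.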
Depending on what gets nuked by the OOM-killer, I am frequently left
with an unusable system that needs to be rebooted.
The machine always has plenty of memory available (1.5 GB devoted to
dom0, of which >1 GB is always just in "cached" state). For instance,
right now, on this same machine:
# free
total used free shared buffers cached
Mem: 1536512 1493112 43400 0 10284 1144904
-/+ buffers/cache: 337924 1198588
Swap: 10239992 74444 10165548
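For reference, the "-/+ buffers/cache" row can be reproduced from the Mem row. The sketch below is illustrative only (the function name is hypothetical) and uses the numbers from the free output above.

```python
def buffers_cache_row(used, free, buffers, cached):
    """Recompute the '-/+ buffers/cache' row of free(1) from the Mem row (kB)."""
    used_without_cache = used - buffers - cached  # memory held by processes/kernel
    free_with_cache = free + buffers + cached     # free plus buffers and pagecache
    return used_without_cache, free_with_cache


if __name__ == "__main__":
    # Values from the "Mem:" row above.
    print(buffers_cache_row(used=1493112, free=43400,
                            buffers=10284, cached=1144904))
```

With these inputs the result is (337924, 1198588), matching the second row. One caveat: the cached figure also counts dirty pagecache, which has to be written back before it can be reclaimed, so not all of it is necessarily available at once.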
I have seen this OOM problem on a wide range of Xen versions, stretching
as far back as I can remember, including the most recent 4.1-unstable
and 2.6.32 pvops kernel (from yesterday, tested in the hope that they
would fix this). I haven't found a way to reliably reproduce it yet,
but I suspect that the problem relates to reasonably heavy disk or
network activity -- during this most recent occurrence, one domain was
briefly downloading at ~200 Mbps.
Does anyone have any ideas about what this could be? Is RAM being
spontaneously filled because a buffer somewhere grows too quickly, or
something like that? What can I try here?
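One way to test the buffer-growth idea would be to sample /proc/meminfo in dom0 while a guest does heavy I/O, and watch whether Dirty and Writeback climb ahead of an OOM. A minimal Python 3 sketch follows; the helper names, the choice of fields, and the one-second interval are illustrative assumptions, not something taken from the setup described here.

```python
import time

# Fields of interest; all are standard /proc/meminfo entries.
FIELDS = ("MemFree", "Cached", "Dirty", "Writeback")


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: integer value} dict."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "name: value" shape
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def watch(samples=60, interval=1.0, path="/proc/meminfo"):
    """Print the selected fields once per interval, for a bounded number of samples."""
    for _ in range(samples):
        with open(path) as f:
            info = parse_meminfo(f.read())
        print(time.strftime("%H:%M:%S"),
              " ".join("%s=%skB" % (k, info.get(k, "?")) for k in FIELDS))
        time.sleep(interval)
```

Calling watch() in dom0 during a large guest download or disk write would show whether Dirty grows towards the size of the page cache shortly before the OOM killer fires.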
-John