From: John Weekes <lists.xen@nuclearfallout.net>
To: xen-devel@lists.xensource.com
Subject: OOM problems
Date: Fri, 12 Nov 2010 23:57:22 -0800
Message-ID: <4CDE44E2.2060807@nuclearfallout.net>

On machines running many HVM (stubdom-based) domains, I often see errors 
like this:

[77176.524094] qemu-dm invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0
[77176.524102] Pid: 7478, comm: qemu-dm Not tainted 2.6.32.25-g80f7e08 #2
[77176.524109] Call Trace:
[77176.524123]  [<ffffffff810897fd>] ? T.413+0xcd/0x290
[77176.524129]  [<ffffffff81089ad3>] ? __out_of_memory+0x113/0x180
[77176.524133]  [<ffffffff81089b9e>] ? out_of_memory+0x5e/0xc0
[77176.524140]  [<ffffffff8108d1cb>] ? __alloc_pages_nodemask+0x69b/0x6b0
[77176.524144]  [<ffffffff8108d1f2>] ? __get_free_pages+0x12/0x60
[77176.524152]  [<ffffffff810c94e7>] ? __pollwait+0xb7/0x110
[77176.524161]  [<ffffffff81262b93>] ? n_tty_poll+0x183/0x1d0
[77176.524165]  [<ffffffff8125ea42>] ? tty_poll+0x92/0xa0
[77176.524169]  [<ffffffff810c8a92>] ? do_select+0x362/0x670
[77176.524173]  [<ffffffff810c9430>] ? __pollwait+0x0/0x110
[77176.524178]  [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524183]  [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524188]  [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524193]  [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524197]  [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524202]  [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524207]  [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524212]  [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524217]  [<ffffffff810c9540>] ? pollwake+0x0/0x60
[77176.524222]  [<ffffffff810c8fb5>] ? core_sys_select+0x215/0x350
[77176.524231]  [<ffffffff810100af>] ? xen_restore_fl_direct_end+0x0/0x1
[77176.524236]  [<ffffffff8100c48d>] ? xen_mc_flush+0x8d/0x1b0
[77176.524243]  [<ffffffff81014ffb>] ? xen_hypervisor_callback+0x1b/0x20
[77176.524251]  [<ffffffff814b0f5a>] ? error_exit+0x2a/0x60
[77176.524255]  [<ffffffff8101485d>] ? retint_restore_args+0x5/0x6
[77176.524263]  [<ffffffff8102fd3d>] ? pvclock_clocksource_read+0x4d/0xb0
[77176.524268]  [<ffffffff8102fd3d>] ? pvclock_clocksource_read+0x4d/0xb0
[77176.524276]  [<ffffffff810663d1>] ? ktime_get_ts+0x61/0xd0
[77176.524281]  [<ffffffff810c9354>] ? sys_select+0x44/0x120
[77176.524286]  [<ffffffff81013f02>] ? system_call_fastpath+0x16/0x1b
[77176.524290] Mem-Info:
[77176.524293] DMA per-cpu:
[77176.524296] CPU    0: hi:    0, btch:   1 usd:   0
[77176.524300] CPU    1: hi:    0, btch:   1 usd:   0
[77176.524303] CPU    2: hi:    0, btch:   1 usd:   0
[77176.524306] CPU    3: hi:    0, btch:   1 usd:   0
[77176.524310] CPU    4: hi:    0, btch:   1 usd:   0
[77176.524313] CPU    5: hi:    0, btch:   1 usd:   0
[77176.524316] CPU    6: hi:    0, btch:   1 usd:   0
[77176.524318] CPU    7: hi:    0, btch:   1 usd:   0
[77176.524322] CPU    8: hi:    0, btch:   1 usd:   0
[77176.524324] CPU    9: hi:    0, btch:   1 usd:   0
[77176.524327] CPU   10: hi:    0, btch:   1 usd:   0
[77176.524330] CPU   11: hi:    0, btch:   1 usd:   0
[77176.524333] CPU   12: hi:    0, btch:   1 usd:   0
[77176.524336] CPU   13: hi:    0, btch:   1 usd:   0
[77176.524339] CPU   14: hi:    0, btch:   1 usd:   0
[77176.524342] CPU   15: hi:    0, btch:   1 usd:   0
[77176.524345] CPU   16: hi:    0, btch:   1 usd:   0
[77176.524348] CPU   17: hi:    0, btch:   1 usd:   0
[77176.524351] CPU   18: hi:    0, btch:   1 usd:   0
[77176.524354] CPU   19: hi:    0, btch:   1 usd:   0
[77176.524358] CPU   20: hi:    0, btch:   1 usd:   0
[77176.524364] CPU   21: hi:    0, btch:   1 usd:   0
[77176.524367] CPU   22: hi:    0, btch:   1 usd:   0
[77176.524370] CPU   23: hi:    0, btch:   1 usd:   0
[77176.524372] DMA32 per-cpu:
[77176.524374] CPU    0: hi:  186, btch:  31 usd:  81
[77176.524377] CPU    1: hi:  186, btch:  31 usd:  66
[77176.524380] CPU    2: hi:  186, btch:  31 usd:  49
[77176.524385] CPU    3: hi:  186, btch:  31 usd:  67
[77176.524387] CPU    4: hi:  186, btch:  31 usd:  93
[77176.524390] CPU    5: hi:  186, btch:  31 usd:  73
[77176.524393] CPU    6: hi:  186, btch:  31 usd:  50
[77176.524396] CPU    7: hi:  186, btch:  31 usd:  79
[77176.524399] CPU    8: hi:  186, btch:  31 usd:  21
[77176.524402] CPU    9: hi:  186, btch:  31 usd:  38
[77176.524406] CPU   10: hi:  186, btch:  31 usd:   0
[77176.524409] CPU   11: hi:  186, btch:  31 usd:  75
[77176.524412] CPU   12: hi:  186, btch:  31 usd:   1
[77176.524414] CPU   13: hi:  186, btch:  31 usd:   4
[77176.524417] CPU   14: hi:  186, btch:  31 usd:   9
[77176.524420] CPU   15: hi:  186, btch:  31 usd:   0
[77176.524423] CPU   16: hi:  186, btch:  31 usd:  56
[77176.524426] CPU   17: hi:  186, btch:  31 usd:  35
[77176.524429] CPU   18: hi:  186, btch:  31 usd:  32
[77176.524432] CPU   19: hi:  186, btch:  31 usd:  39
[77176.524435] CPU   20: hi:  186, btch:  31 usd:  24
[77176.524438] CPU   21: hi:  186, btch:  31 usd:   0
[77176.524441] CPU   22: hi:  186, btch:  31 usd:  35
[77176.524444] CPU   23: hi:  186, btch:  31 usd:  51
[77176.524447] Normal per-cpu:
[77176.524449] CPU    0: hi:  186, btch:  31 usd:  29
[77176.524453] CPU    1: hi:  186, btch:  31 usd:   1
[77176.524456] CPU    2: hi:  186, btch:  31 usd:  30
[77176.524459] CPU    3: hi:  186, btch:  31 usd:  30
[77176.524463] CPU    4: hi:  186, btch:  31 usd:  30
[77176.524466] CPU    5: hi:  186, btch:  31 usd:  31
[77176.524469] CPU    6: hi:  186, btch:  31 usd:   0
[77176.524471] CPU    7: hi:  186, btch:  31 usd:   0
[77176.524474] CPU    8: hi:  186, btch:  31 usd:  30
[77176.524477] CPU    9: hi:  186, btch:  31 usd:  28
[77176.524480] CPU   10: hi:  186, btch:  31 usd:   0
[77176.524483] CPU   11: hi:  186, btch:  31 usd:  30
[77176.524486] CPU   12: hi:  186, btch:  31 usd:   0
[77176.524489] CPU   13: hi:  186, btch:  31 usd:   0
[77176.524492] CPU   14: hi:  186, btch:  31 usd:   0
[77176.524495] CPU   15: hi:  186, btch:  31 usd:   0
[77176.524498] CPU   16: hi:  186, btch:  31 usd:   0
[77176.524501] CPU   17: hi:  186, btch:  31 usd:   0
[77176.524504] CPU   18: hi:  186, btch:  31 usd:   0
[77176.524507] CPU   19: hi:  186, btch:  31 usd:   0
[77176.524510] CPU   20: hi:  186, btch:  31 usd:   0
[77176.524513] CPU   21: hi:  186, btch:  31 usd:   0
[77176.524516] CPU   22: hi:  186, btch:  31 usd:   0
[77176.524518] CPU   23: hi:  186, btch:  31 usd:   0
[77176.524524] active_anon:5675 inactive_anon:4676 isolated_anon:0
[77176.524526]  active_file:146373 inactive_file:153543 isolated_file:480
[77176.524527]  unevictable:0 dirty:167539 writeback:322 unstable:0
[77176.524528]  free:5017 slab_reclaimable:15640 slab_unreclaimable:8972
[77176.524529]  mapped:1114 shmem:7 pagetables:1908 bounce:0
[77176.524536] DMA free:9820kB min:32kB low:40kB high:48kB 
active_anon:4kB inactive_anon:0kB active_file:616kB inactive_file:2212kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:12740kB 
mlocked:0kB dirty:2292kB writeback:0kB mapped:0kB shmem:0kB 
slab_reclaimable:72kB slab_unreclaimable:108kB kernel_stack:0kB 
pagetables:12kB unstable:0kB bounce:0kB writeback_tmp:0kB 
pages_scanned:3040 all_unreclaimable? no
[77176.524541] lowmem_reserve[]: 0 1428 2452 2452
[77176.524551] DMA32 free:7768kB min:3680kB low:4600kB high:5520kB 
active_anon:22696kB inactive_anon:18704kB active_file:584580kB 
inactive_file:608508kB unevictable:0kB isolated(anon):0kB 
isolated(file):1920kB present:1462496kB mlocked:0kB dirty:664128kB 
writeback:1276kB mapped:4456kB shmem:28kB slab_reclaimable:62076kB 
slab_unreclaimable:32292kB kernel_stack:5120kB pagetables:7620kB 
unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1971808 
all_unreclaimable? yes
[77176.524556] lowmem_reserve[]: 0 0 1024 1024
[77176.524564] Normal free:2480kB min:2636kB low:3292kB high:3952kB 
active_anon:0kB inactive_anon:0kB active_file:296kB inactive_file:3452kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1048700kB 
mlocked:0kB dirty:3736kB writeback:12kB mapped:0kB shmem:0kB 
slab_reclaimable:412kB slab_unreclaimable:3488kB kernel_stack:80kB 
pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB 
pages_scanned:8192 all_unreclaimable? yes
[77176.524569] lowmem_reserve[]: 0 0 0 0
[77176.524574] DMA: 4*4kB 25*8kB 11*16kB 7*32kB 8*64kB 8*128kB 8*256kB 
3*512kB 0*1024kB 0*2048kB 1*4096kB = 9832kB
[77176.524587] DMA32: 742*4kB 118*8kB 3*16kB 3*32kB 2*64kB 0*128kB 
0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 7768kB
[77176.524600] Normal: 1*4kB 1*8kB 2*16kB 13*32kB 14*64kB 2*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1612kB
[77176.524613] 302308 total pagecache pages
[77176.524615] 1619 pages in swap cache
[77176.524617] Swap cache stats: add 40686, delete 39067, find 24687/26036
[77176.524619] Free swap  = 10141956kB
[77176.524621] Total swap = 10239992kB
[77176.577607] 793456 pages RAM
[77176.577611] 436254 pages reserved
[77176.577613] 308627 pages shared
[77176.577615] 49249 pages non-shared
[77176.577620] Out of memory: kill process 5755 (python2.6) score 110492 
or a child
[77176.577623] Killed process 5757 (python2.6)
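To read the per-zone numbers in a report like the one above, it can help to recompute them. The Python sketch below is illustrative only; the function names are hypothetical and not part of any kernel tool. It sums a buddy-allocator free-list line such as the `DMA32:` one, flags a zone whose `free` value is under its `min` watermark, and converts the page counts in the summary block to kB. It assumes 4 kB pages, as the smallest block size in the free lists suggests. The watermark test is a simplification: the kernel's own `zone_watermark_ok()` also accounts for `lowmem_reserve[]` and the allocation order, and this sketch ignores both.

```python
import re

# One "count*sizekB" term of a buddy free-list line, e.g. "742*4kB".
TERM = re.compile(r"(\d+)\*(\d+)kB")
# One "key:NNNkB" field of a per-zone report line, e.g. "min:2636kB".
FIELD = re.compile(r"(\w+):(\d+)kB")

PAGE_KB = 4  # assumption: 4 kB pages (x86-64)


def buddy_total_kb(line: str) -> int:
    """Sum the free blocks in a buddy free-list line.

    Only the terms before '=' are summed, so the result can be
    compared with the total the kernel printed after '='.
    """
    terms = line.split("=", 1)[0]
    return sum(int(count) * int(size) for count, size in TERM.findall(terms))


def zone_fields(report: str) -> dict:
    """Extract the kB-valued fields (free, min, low, high, ...) of one zone."""
    return {key: int(value) for key, value in FIELD.findall(report)}


def below_min(report: str) -> bool:
    """Simplified watermark test: is the zone's free memory under 'min'?

    Ignores lowmem_reserve[] and the allocation order, both of which
    the kernel's real check takes into account.
    """
    fields = zone_fields(report)
    return fields["free"] < fields["min"]


def pages_to_kb(pages: int) -> int:
    """Convert a page count from the summary block to kB."""
    return pages * PAGE_KB


if __name__ == "__main__":
    dma32 = ("DMA32: 742*4kB 118*8kB 3*16kB 3*32kB 2*64kB 0*128kB "
             "0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 7768kB")
    print(buddy_total_kb(dma32))  # 7768, same as the printed total

    normal = "Normal free:2480kB min:2636kB low:3292kB high:3952kB"
    print(below_min(normal))      # True: 2480kB < 2636kB

    print(pages_to_kb(167539))    # summary "dirty:" pages expressed in kB
```

The summary's `dirty:167539` pages come to 670156 kB, which matches the sum of the per-zone `dirty:` fields (2292 + 664128 + 3736 kB). This is a useful consistency check when reading such reports.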

Depending on what gets nuked by the OOM-killer, I am frequently left 
with an unusable system that needs to be rebooted.

The machine always has plenty of memory available (1.5 GB devoted to 
dom0, of which >1 GB is always just in "cached" state). For instance, 
right now, on this same machine:

# free
              total       used       free     shared    buffers     cached
Mem:       1536512    1493112      43400          0      10284    1144904
-/+ buffers/cache:     337924    1198588
Swap:     10239992      74444   10165548
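The `-/+ buffers/cache` row of `free` is derived from the `Mem:` row: buffers and cache are subtracted from "used" and added to "free". The minimal Python sketch below reproduces that arithmetic for the figures above (all in kB). The helper name is hypothetical. Note that this row counts all buffers and cache as available, including dirty page-cache pages that must be written back before they can be reclaimed.

```python
def buffers_cache_row(used: int, free: int, buffers: int, cached: int) -> tuple:
    """Return (used, free) for free(1)'s '-/+ buffers/cache' row, in kB."""
    reclaimable = buffers + cached  # treated as available by this row
    return used - reclaimable, free + reclaimable


used_kb, free_kb = buffers_cache_row(1493112, 43400, 10284, 1144904)
print(used_kb, free_kb)  # 337924 1198588, matching the output above
```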

I have seen this OOM problem on a wide range of Xen versions, stretching 
as far back as I can remember, including the most recent 4.1-unstable 
hypervisor and 2.6.32 pvops kernel (both from yesterday, tested in the 
hope that they would fix this). I haven't found a way to reproduce it 
reliably yet, but I suspect that the problem relates to reasonably heavy 
disk or network activity: during this latest occurrence, I see that a 
domain was briefly doing ~200 Mbps of downloads.

Does anyone have any ideas what this could be? Is RAM being filled 
spontaneously because a buffer somewhere grows too quickly, or 
something like that? What can I try here?

-John


Thread overview: 23+ messages
2010-11-13  7:57 John Weekes [this message]
2010-11-13  8:14 ` OOM problems Ian Pratt
2010-11-13  8:27   ` John Weekes
2010-11-13  9:13     ` Ian Pratt
2010-11-13  9:43       ` John Weekes
2010-11-13 10:19       ` John Weekes
2010-11-14  9:53         ` Daniel Stodden
2010-11-15  8:55       ` Jan Beulich
2010-11-15  9:40         ` Daniel Stodden
2010-11-15  9:57           ` Jan Beulich
2010-11-15 17:59           ` John Weekes
2010-11-16 19:54             ` John Weekes
2010-11-17 20:10               ` Ian Pratt
2010-11-17 22:02                 ` John Weekes
2010-11-18  0:56                   ` Ian Pratt
2010-11-18  1:23                   ` Daniel Stodden
2010-11-18  3:29                     ` John Weekes
2010-11-18  4:08                       ` Daniel Stodden
2010-11-18  7:15                         ` John Weekes
2010-11-18 10:41                           ` Daniel Stodden
2010-11-19  7:27                             ` John Weekes
2010-11-15 14:17         ` Stefano Stabellini
2010-11-13 18:15 ` George Shuklin
