From: James Dingwall <james.dingwall@zynstra.com>
To: Bob Liu <bob.liu@oracle.com>
Cc: xen-devel@lists.xen.org
Subject: Re: Kernel 3.11 / 3.12 OOM killer and Xen ballooning
Date: Thu, 26 Dec 2013 08:42:23 +0000
Message-ID: <52BBEBEF.8040509@zynstra.com>
In-Reply-To: <52B3B6D7.50606@oracle.com>

Bob Liu wrote:
> On 12/20/2013 03:08 AM, James Dingwall wrote:
>> Bob Liu wrote:
>>> On 12/12/2013 12:30 AM, James Dingwall wrote:
>>>> Bob Liu wrote:
>>>>> On 12/10/2013 11:27 PM, Konrad Rzeszutek Wilk wrote:
>>>>>> On Tue, Dec 10, 2013 at 02:52:40PM +0000, James Dingwall wrote:
>>>>>>> Konrad Rzeszutek Wilk wrote:
>>>>>>>> On Mon, Dec 09, 2013 at 05:50:29PM +0000, James Dingwall wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Since 3.11 I have noticed that the OOM killer quite frequently
>>>>>>>>> triggers in my Xen guest domains which use ballooning to
>>>>>>>>> increase/decrease their memory allocation according to their
>>>>>>>>> requirements. One example domain has a maximum memory setting of
>>>>>>>>> ~1.5Gb but usually idles at ~300Mb; it is also configured with
>>>>>>>>> 2Gb of swap, which is almost 100% free.
>>>>>>>>>
>>>>>>>>> # free
>>>>>>>>>              total       used       free     shared    buffers     cached
>>>>>>>>> Mem:        272080     248108      23972          0       1448      63064
>>>>>>>>> -/+ buffers/cache:     183596      88484
>>>>>>>>> Swap:      2097148          8    2097140
>>>>>>>>>
>>>>>>>>> There is plenty of available free memory in the hypervisor to
>>>>>>>>> balloon to the maximum size:
>>>>>>>>> # xl info | grep free_mem
>>>>>>>>> free_memory : 14923
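
(For reference, the guest-side view of the balloon can be cross-checked as
well. A rough sketch, assuming the usual xen_memory sysfs layout exposed by
the Xen balloon driver:)

  # inside the guest: current allocation vs. balloon target, in kB
  cat /sys/devices/system/xen_memory/xen_memory0/info/current_kb
  cat /sys/devices/system/xen_memory/xen_memory0/target_kb
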
>>>>>>>>>
>>>>>>>>> An example trace (they are always the same) from the OOM killer in
>>>>>>>>> 3.12 is added below. So far I have not been able to reproduce this
>>>>>>>>> at will, so it is difficult to start bisecting to see whether a
>>>>>>>>> particular change introduced it. However, the behaviour does seem
>>>>>>>>> wrong because a) ballooning could give the guest more memory, and
>>>>>>>>> b) there is plenty of swap available which could be used as a
>>>>>>>>> fallback.
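
Regarding point a), a quick manual check that the balloon really can grow
would be something like the following (a sketch; the domain name "guest" and
the explicit size are illustrative):

  # dom0: current allocation (about 300 MiB when idle), then a manual bump
  xl list guest
  xl mem-set guest 1536m   # push the target towards the ~1.5Gb maximum by hand

If that succeeds, the headroom is clearly there, and the question becomes why
selfballooning did not use it before the OOM killer fired.
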
>>>>>> Keep in mind that with tmem, swap is not really swap any more. Heh,
>>>>>> that sounds odd - but basically pages that are destined for swap end
>>>>>> up going into the tmem code, which pipes them up to the hypervisor.
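
The hypervisor-side half of that should be visible from dom0 as tmem pool
usage growing while the in-guest swap device stays quiet. A sketch, assuming
this Xen build provides the xl tmem subcommands:

  # dom0: per-domain tmem pool usage and freeable tmem memory
  xl tmem-list -l
  xl tmem-freeable
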
>>>>>>
>>>>>>>>> If other information would help or there are more tests that I
>>>>>>>>> could run, then please let me know.
>>>>>>>> I presume you have enabled 'tmem' both in the hypervisor and in
>>>>>>>> the guest, right?
>>>>>>> Yes, domU and dom0 both have the tmem module loaded, and
>>>>>>> "tmem tmem_dedup=on tmem_compress=on" is given on the Xen command line.
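
(For completeness, on a GRUB-based dom0 that usually corresponds to something
like the lines below; the GRUB_CMDLINE_XEN_DEFAULT variable name is an
assumption about the boot setup:)

  # /etc/default/grub on dom0 (illustrative)
  GRUB_CMDLINE_XEN_DEFAULT="tmem tmem_dedup=on tmem_compress=on"

  # in dom0 and domU, confirm the tmem frontend is loaded
  lsmod | grep tmem
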
>>>>>> Excellent. The odd thing is that your swap is not used that much, but
>>>>>> it should be (as that is part of what the self-balloon is supposed to
>>>>>> do).
>>>>>>
>>>>>> Bob, you had a patch for the logic of how self-balloon is supposed to
>>>>>> account for the slab - would this be relevant to this problem?
>>>>>>
>>>>> Perhaps - I have attached the patch.
>>>>> James, could you please apply it and try your application again? You
>>>>> will have to rebuild the guest kernel.
>>>>> Oh, and also take a look at whether frontswap is in use; you can check
>>>>> it by watching "cat /sys/kernel/debug/frontswap/*".
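
(A convenient way to keep an eye on those counters during a run, assuming
debugfs is mounted in the usual place, is something like:)

  # print every frontswap counter with its name, refreshed every 30 seconds
  watch -n 30 'grep . /sys/kernel/debug/frontswap/* /dev/null'
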
>>>> I have tested this patch with a workload where I had previously seen
>>>> failures, and so far so good. I'll try to keep a guest with it stressed
>>>> to see whether I get any problems. I don't know if it is expected, but I
>>> By the way, aside from the longer kswapd times, does this patch work
>>> well during your stress testing?
>>>
>>> Have you seen the OOM killer triggered quite frequently again (with
>>> selfshrink=true)?
>>>
>>> Thanks,
>>> -Bob
>> It was looking good until today (selfshrink=true). The trace below was
>> captured during a compile of subversion; it looks like the memory has
>> ballooned to almost the maximum permissible, but even under pressure the
>> swap disk has hardly come into use.
>>
> So without selfshrink the swap disk does get used a lot?
>
> If that's the case, I'm afraid the frontswap-selfshrink in
> xen-selfballoon is doing something incorrect.
>
> Could you please try this patch, which makes the frontswap-selfshrink
> slower and adds a printk for debugging?
> Please keep selfshrink=true in your test, but you can run with or
> without my previous patch.
> Thanks a lot!
>
The OOM trace below was triggered during a compile of gcc. I have the
full dmesg from boot, which shows all the printks; please let me know if
you would like to see it.
James
[504372.929678] frontswap selfshrink 5424 pages
[504403.018185] frontswap selfshrink 5152 pages
[504433.124844] frontswap selfshrink 4894 pages
[504468.335358] frontswap selfshrink 12791 pages
[504498.536467] frontswap selfshrink 12152 pages
[504533.813484] frontswap selfshrink 19751 pages
[504589.067299] frontswap selfshrink 19043 pages
[504638.441894] cc1plus invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
[504638.441902] CPU: 1 PID: 21506 Comm: cc1plus Tainted: G W 3.12.5 #88
[504638.441905] ffff88001ca406f8 ffff880002c0fa58 ffffffff8148f200 ffff88001f90e8e8
[504638.441909] ffff88001ca401c0 ffff880002c0faf8 ffffffff8148ccf7 ffff880002c0faa8
[504638.441912] ffffffff810f8d97 ffff880002c0fa88 ffffffff81006dc8 ffff880002c0fa98
[504638.441917] Call Trace:
[504638.441928] [<ffffffff8148f200>] dump_stack+0x46/0x58
[504638.441932] [<ffffffff8148ccf7>] dump_header.isra.9+0x6d/0x1cc
[504638.441938] [<ffffffff810f8d97>] ? super_cache_count+0xa8/0xb8
[504638.441943] [<ffffffff81006dc8>] ? xen_clocksource_read+0x20/0x22
[504638.441946] [<ffffffff81006ea9>] ? xen_clocksource_get_cycles+0x9/0xb
[504638.441951] [<ffffffff81494abe>] ? _raw_spin_unlock_irqrestore+0x47/0x62
[504638.441957] [<ffffffff81296b27>] ? ___ratelimit+0xcb/0xe8
[504638.441962] [<ffffffff810b2bbf>] oom_kill_process+0x70/0x2fd
[504638.441966] [<ffffffff810bca0e>] ? zone_reclaimable+0x11/0x1e
[504638.441970] [<ffffffff81048779>] ? has_ns_capability_noaudit+0x12/0x19
[504638.441973] [<ffffffff81048792>] ? has_capability_noaudit+0x12/0x14
[504638.441976] [<ffffffff810b32de>] out_of_memory+0x31b/0x34e
[504638.441981] [<ffffffff810b7438>] __alloc_pages_nodemask+0x65b/0x792
[504638.441985] [<ffffffff810e3da3>] alloc_pages_vma+0xd0/0x10c
[504638.441988] [<ffffffff81003f69>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
[504638.441993] [<ffffffff810cf7cd>] handle_mm_fault+0x6d4/0xd54
[504638.441996] [<ffffffff81006dc8>] ? xen_clocksource_read+0x20/0x22
[504638.441999] [<ffffffff810115d2>] ? sched_clock+0x9/0xd
[504638.442005] [<ffffffff8106772f>] ? sched_clock_local+0x12/0x75
[504638.442008] [<ffffffff8106823b>] ? arch_vtime_task_switch+0x81/0x86
[504638.442013] [<ffffffff81037f40>] __do_page_fault+0x3d8/0x437
[504638.442016] [<ffffffff81062f1e>] ? finish_task_switch+0xe8/0x144
[504638.442018] [<ffffffff810115d2>] ? sched_clock+0x9/0xd
[504638.442021] [<ffffffff8106772f>] ? sched_clock_local+0x12/0x75
[504638.442025] [<ffffffff810a45cc>] ? __acct_update_integrals+0xb4/0xbf
[504638.442028] [<ffffffff810a493f>] ? acct_account_cputime+0x17/0x19
[504638.442030] [<ffffffff81067c28>] ? account_user_time+0x67/0x92
[504638.442033] [<ffffffff8106811b>] ? vtime_account_user+0x4d/0x52
[504638.442036] [<ffffffff81037fd8>] do_page_fault+0x1a/0x5a
[504638.442041] [<ffffffff810a065f>] ? rcu_user_enter+0xe/0x10
[504638.442044] [<ffffffff81495158>] page_fault+0x28/0x30
[504638.442046] Mem-Info:
[504638.442048] Node 0 DMA per-cpu:
[504638.442050] CPU 0: hi: 0, btch: 1 usd: 0
[504638.442052] CPU 1: hi: 0, btch: 1 usd: 0
[504638.442053] Node 0 DMA32 per-cpu:
[504638.442055] CPU 0: hi: 186, btch: 31 usd: 26
[504638.442057] CPU 1: hi: 186, btch: 31 usd: 72
[504638.442058] Node 0 Normal per-cpu:
[504638.442060] CPU 0: hi: 0, btch: 1 usd: 0
[504638.442061] CPU 1: hi: 0, btch: 1 usd: 0
[504638.442067] active_anon:103684 inactive_anon:103733 isolated_anon:0
active_file:10897 inactive_file:15059 isolated_file:0
unevictable:0 dirty:1 writeback:0 unstable:0
free:1164 slab_reclaimable:2356 slab_unreclaimable:3421
mapped:4413 shmem:200 pagetables:2699 bounce:0
free_cma:0 totalram:249264 balloontarget:315406
[504638.442069] Node 0 DMA free:1964kB min:88kB low:108kB high:132kB
active_anon:4664kB inactive_anon:4736kB active_file:628kB
inactive_file:1420kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:15996kB managed:15084kB mlocked:0kB dirty:0kB
writeback:0kB mapped:228kB shmem:0kB slab_reclaimable:184kB
slab_unreclaimable:324kB kernel_stack:48kB pagetables:256kB unstable:0kB
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:21824
all_unreclaimable? yes
[504638.442078] lowmem_reserve[]: 0 469 469 469
[504638.442081] Node 0 DMA32 free:2692kB min:2728kB low:3408kB
high:4092kB active_anon:175172kB inactive_anon:175184kB
active_file:21244kB inactive_file:35340kB unevictable:0kB
isolated(anon):0kB isolated(file):0kB present:507904kB managed:458288kB
mlocked:0kB dirty:0kB writeback:0kB mapped:7764kB shmem:676kB
slab_reclaimable:6628kB slab_unreclaimable:11496kB kernel_stack:1720kB
pagetables:8444kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB
pages_scanned:613279 all_unreclaimable? yes
[504638.442088] lowmem_reserve[]: 0 0 0 0
[504638.442091] Node 0 Normal free:0kB min:0kB low:0kB high:0kB
active_anon:234900kB inactive_anon:235012kB active_file:21716kB
inactive_file:23476kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:524288kB managed:523684kB mlocked:0kB
dirty:4kB writeback:0kB mapped:9660kB shmem:124kB
slab_reclaimable:2612kB slab_unreclaimable:1864kB kernel_stack:136kB
pagetables:2096kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB
pages_scanned:773613 all_unreclaimable? yes
[504638.442098] lowmem_reserve[]: 0 0 0 0
[504638.442101] Node 0 DMA: 1*4kB (R) 3*8kB (R) 1*16kB (R) 0*32kB 0*64kB 1*128kB (R) 1*256kB (R) 1*512kB (R) 1*1024kB (R) 0*2048kB 0*4096kB = 1964kB
[504638.442114] Node 0 DMA32: 673*4kB (UE) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2692kB
[504638.442123] Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[504638.442131] 22294 total pagecache pages
[504638.442133] 11197 pages in swap cache
[504638.442135] Swap cache stats: add 3449125, delete 3437928, find 590699/956067
[504638.442136] Free swap = 1868108kB
[504638.442137] Total swap = 2097148kB
[504638.452335] 262143 pages RAM
[504638.452336] 6697 pages reserved
[504638.452337] 558286 pages shared
[504638.452338] 239987 pages non-shared
[504638.452340] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
<snip process list>
[504638.452596] Out of memory: Kill process 21506 (cc1plus) score 123 or sacrifice child
[504638.452598] Killed process 21506 (cc1plus) total-vm:543168kB, anon-rss:350300kB, file-rss:9520kB
[504659.367289] frontswap selfshrink 18428 pages
[504689.415694] frontswap selfshrink 479 pages
[504719.462401] frontswap selfshrink 456 pages
[504749.506876] frontswap selfshrink 434 pages
[504779.558204] frontswap selfshrink 406 pages
[504809.604425] frontswap selfshrink 386 pages
[504839.654849] frontswap selfshrink 367 pages
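
For the next run, a periodic snapshot along these lines might make it easier
to correlate the selfshrink messages with the balloon target and free memory
(a rough sketch; same sysfs/debugfs paths as mentioned earlier in the thread):

  # guest: log timestamp, balloon target, MemFree and frontswap counters
  while sleep 30; do
      date +%s
      cat /sys/devices/system/xen_memory/xen_memory0/target_kb
      grep MemFree /proc/meminfo
      grep . /sys/kernel/debug/frontswap/* /dev/null
  done >> /var/log/selfballoon-trace.log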