All of lore.kernel.org
 help / color / mirror / Atom feed
From: James Dingwall <james.dingwall@zynstra.com>
To: Bob Liu <bob.liu@oracle.com>
Cc: xen-devel@lists.xen.org
Subject: Re: Kernel 3.11 / 3.12 OOM killer and Xen ballooning
Date: Tue, 7 Jan 2014 09:21:04 +0000	[thread overview]
Message-ID: <52CBC700.1060602@zynstra.com> (raw)
In-Reply-To: <52C50661.7060900@oracle.com>

Bob Liu wrote:
> On 12/26/2013 04:42 PM, James Dingwall wrote:
>> Bob Liu wrote:
>>> On 12/20/2013 03:08 AM, James Dingwall wrote:
>>>> Bob Liu wrote:
>>>>> On 12/12/2013 12:30 AM, James Dingwall wrote:
>>>>>> Bob Liu wrote:
>>>>>>> On 12/10/2013 11:27 PM, Konrad Rzeszutek Wilk wrote:
>>>>>>>> On Tue, Dec 10, 2013 at 02:52:40PM +0000, James Dingwall wrote:
>>>>>>>>> Konrad Rzeszutek Wilk wrote:
>>>>>>>>>> On Mon, Dec 09, 2013 at 05:50:29PM +0000, James Dingwall wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Since 3.11 I have noticed that the OOM killer quite frequently
>>>>>>>>>>> triggers in my Xen guest domains which use ballooning to
>>>>>>>>>>> increase/decrease their memory allocation according to their
>>>>>>>>>>> requirements.  One example domain I have has a maximum memory
>>>>>>>>>>> setting of ~1.5Gb but it usually idles at ~300Mb, it is also
>>>>>>>>>>> configured with 2Gb swap which is almost 100% free.
>>>>>>>>>>>
>>>>>>>>>>> # free
>>>>>>>>>>>                  total       used       free     shared    buffers
>>>>>>>>>>> cached
>>>>>>>>>>> Mem:        272080     248108      23972          0 1448
>>>>>>>>>>> 63064
>>>>>>>>>>> -/+ buffers/cache:     183596      88484
>>>>>>>>>>> Swap:      2097148          8    2097140
>>>>>>>>>>>
>>>>>>>>>>> There is plenty of available free memory in the hypervisor to
>>>>>>>>>>> balloon to the maximum size:
>>>>>>>>>>> # xl info | grep free_mem
>>>>>>>>>>> free_memory            : 14923
>>>>>>>>>>>
>>>>>>>>>>> An example trace (they are always the same) from the oom
>>>>>>>>>>> killer in
>>>>>>>>>>> 3.12 is added below.  So far I have not been able to reproduce
>>>>>>>>>>> this
>>>>>>>>>>> at will so it is difficult to start bisecting it to see if a
>>>>>>>>>>> particular change introduced this.  However it does seem that the
>>>>>>>>>>> behaviour is wrong because a) ballooning could give the guest
>>>>>>>>>>> more
>>>>>>>>>>> memory, b) there is lots of swap available which could be used
>>>>>>>>>>> as a
>>>>>>>>>>> fallback.
>>>>>>>> Keep in mind that swap with tmem is actually no more swap. Heh, that
>>>>>>>> sounds odd -but basically pages that are destined for swap end up
>>>>>>>> going in the tmem code which pipes them up to the hypervisor.
>>>>>>>>
>>>>>>>>>>> If other information could help or there are more tests that I
>>>>>>>>>>> could
>>>>>>>>>>> run then please let me know.
>>>>>>>>>> I presume you have enabled 'tmem' both in the hypervisor and in
>>>>>>>>>> the guest right?
>>>>>>>>> Yes, domU and dom0 both have the tmem module loaded and  tmem
>>>>>>>>> tmem_dedup=on tmem_compress=on is given on the xen command line.
>>>>>>>> Excellent. The odd thing is that your swap is not used that much,
>>>>>>>> but
>>>>>>>> it should be (as that is part of what the self-balloon is suppose to
>>>>>>>> do).
>>>>>>>>
>>>>>>>> Bob, you had a patch for the logic of how self-balloon is suppose
>>>>>>>> to account for the slab - would this be relevant to this problem?
>>>>>>>>
>>>>>>> Perhaps, I have attached the patch.
>>>>>>> James, could you please apply it and try your application again? You
>>>>>>> have to rebuild the guest kernel.
>>>>>>> Oh, and also take a look at whether frontswap is in use, you can
>>>>>>> check
>>>>>>> it by watching "cat /sys/kernel/debug/frontswap/*".
>>>>>> I have tested this patch with a workload where I have previously seen
>>>>>> failures and so far so good.  I'll try to keep a guest with it
>>>>>> stressed
>>>>>> to see if I do get any problems.  I don't know if it is expected but I
>>>>> By the way, besides longer time of kswapd, is this patch work well
>>>>> during your stress testing?
>>>>>
>>>>> Have you seen the OOM killer triggered quite frequently again?(with
>>>>> selfshrink=true)
>>>>>
>>>>> Thanks,
>>>>> -Bob
>>>> It was looking good until today (selfshrink=true).  The trace below is
>>>> during a compile of subversion, it looks like the memory has ballooned
>>>> to almost the maximum permissible but even under pressure the swap disk
>>>> has hardly come in to use.
>>>>
>>> So if without selfshrink the swap disk can be used a lot?
>>>
>>> If that's the case, I'm afraid the frontswap-selfshrink in
>>> xen-selfballoon did something incorrect.
>>>
>>> Could you please try this patch which make the frontswap-selfshrink
>>> slower and add a printk for debug.
>>> Please still keep selfshrink=true in your test but can with or without
>>> my previous patch.
>>> Thanks a lot!
>>>
>> The oom trace below was triggered during a compile of gcc.  I have the
>> full dmesg from boot which shows all the printks, please let me know if
>> you would like to see that.
>>
> Sorry for the later response.
> Could you confirm that this problem doesn't exist if loading tmem with
> selfshrinking=0 during compile gcc? It seems that you are compiling
> difference packages during your testing.
> This will help to figure out whether selfshrinking is the root cause.
Got an oom with selfshrinking=0, again during a gcc compile. 
Unfortunately I don't have a single test case which demonstrates the 
problem but as I mentioned before it will generally show up under 
compiles of large packages such as glibc, kdelibs, gcc etc.

I don't know if this is a separate or related issue but over the 
holidays I also had a problem with six of the guests on my system where 
kswapd was running at 100% and had clocked up >9000 minutes of cpu time 
even though there was otherwise no load on them.  Of the guests I 
restarted yesterday in this state two have already got in to the same 
state again, they are running a kernel with the first patch that you sent.

/sys/module/tmem/parameters/cleancache Y
/sys/module/tmem/parameters/frontswap Y
/sys/module/tmem/parameters/selfballooning Y
/sys/module/tmem/parameters/selfshrinking N

James

[ 8212.940520] cc1plus invoked oom-killer: gfp_mask=0x200da, order=0, 
oom_score_adj=0
[ 8212.940529] CPU: 1 PID: 23678 Comm: cc1plus Tainted: G W    3.12.5 #88
[ 8212.940532]  ffff88001e38cdf8 ffff88000094f968 ffffffff8148f200 
ffff88001f90e8e8
[ 8212.940536]  ffff88001e38c8c0 ffff88000094fa08 ffffffff8148ccf7 
ffff88000094f9b8
[ 8212.940538]  ffffffff810f8d97 ffff88000094f998 ffffffff81006dc8 
ffff88000094f9a8
[ 8212.940542] Call Trace:
[ 8212.940554]  [<ffffffff8148f200>] dump_stack+0x46/0x58
[ 8212.940558]  [<ffffffff8148ccf7>] dump_header.isra.9+0x6d/0x1cc
[ 8212.940564]  [<ffffffff810f8d97>] ? super_cache_count+0xa8/0xb8
[ 8212.940569]  [<ffffffff81006dc8>] ? xen_clocksource_read+0x20/0x22
[ 8212.940573]  [<ffffffff81006ea9>] ? xen_clocksource_get_cycles+0x9/0xb
[ 8212.940578]  [<ffffffff81494abe>] ? _raw_spin_unlock_irqrestore+0x47/0x62
[ 8212.940583]  [<ffffffff81296b27>] ? ___ratelimit+0xcb/0xe8
[ 8212.940588]  [<ffffffff810b2bbf>] oom_kill_process+0x70/0x2fd
[ 8212.940592]  [<ffffffff810bca0e>] ? zone_reclaimable+0x11/0x1e
[ 8212.940597]  [<ffffffff81048779>] ? has_ns_capability_noaudit+0x12/0x19
[ 8212.940600]  [<ffffffff81048792>] ? has_capability_noaudit+0x12/0x14
[ 8212.940603]  [<ffffffff810b32de>] out_of_memory+0x31b/0x34e
[ 8212.940608]  [<ffffffff810b7438>] __alloc_pages_nodemask+0x65b/0x792
[ 8212.940612]  [<ffffffff810e3da3>] alloc_pages_vma+0xd0/0x10c
[ 8212.940617]  [<ffffffff810dd5a4>] read_swap_cache_async+0x70/0x120
[ 8212.940620]  [<ffffffff810dd6e4>] swapin_readahead+0x90/0xd4
[ 8212.940623]  [<ffffffff81005b35>] ? pte_mfn_to_pfn+0x59/0xcb
[ 8212.940627]  [<ffffffff810cf99d>] handle_mm_fault+0x8a4/0xd54
[ 8212.940630]  [<ffffffff81006dc8>] ? xen_clocksource_read+0x20/0x22
[ 8212.940634]  [<ffffffff810115d2>] ? sched_clock+0x9/0xd
[ 8212.940638]  [<ffffffff8106772f>] ? sched_clock_local+0x12/0x75
[ 8212.940641]  [<ffffffff8106823b>] ? arch_vtime_task_switch+0x81/0x86
[ 8212.940646]  [<ffffffff81037f40>] __do_page_fault+0x3d8/0x437
[ 8212.940649]  [<ffffffff81006dc8>] ? xen_clocksource_read+0x20/0x22
[ 8212.940652]  [<ffffffff810115d2>] ? sched_clock+0x9/0xd
[ 8212.940654]  [<ffffffff8106772f>] ? sched_clock_local+0x12/0x75
[ 8212.940658]  [<ffffffff810a45cc>] ? __acct_update_integrals+0xb4/0xbf
[ 8212.940661]  [<ffffffff810a493f>] ? acct_account_cputime+0x17/0x19
[ 8212.940663]  [<ffffffff81067c28>] ? account_user_time+0x67/0x92
[ 8212.940666]  [<ffffffff8106811b>] ? vtime_account_user+0x4d/0x52
[ 8212.940669]  [<ffffffff81037fd8>] do_page_fault+0x1a/0x5a
[ 8212.940674]  [<ffffffff810a065f>] ? rcu_user_enter+0xe/0x10
[ 8212.940677]  [<ffffffff81495158>] page_fault+0x28/0x30
[ 8212.940679] Mem-Info:
[ 8212.940681] Node 0 DMA per-cpu:
[ 8212.940684] CPU    0: hi:    0, btch:   1 usd:   0
[ 8212.940685] CPU    1: hi:    0, btch:   1 usd:   0
[ 8212.940686] Node 0 DMA32 per-cpu:
[ 8212.940688] CPU    0: hi:  186, btch:  31 usd: 116
[ 8212.940690] CPU    1: hi:  186, btch:  31 usd: 124
[ 8212.940691] Node 0 Normal per-cpu:
[ 8212.940693] CPU    0: hi:    0, btch:   1 usd:   0
[ 8212.940694] CPU    1: hi:    0, btch:   1 usd:   0
[ 8212.940700] active_anon:105765 inactive_anon:105882 isolated_anon:0
  active_file:8412 inactive_file:8612 isolated_file:0
  unevictable:0 dirty:0 writeback:0 unstable:0
  free:1143 slab_reclaimable:3575 slab_unreclaimable:3464
  mapped:3792 shmem:6 pagetables:2534 bounce:0
  free_cma:0 totalram:246132 balloontarget:306242
[ 8212.940702] Node 0 DMA free:1964kB min:88kB low:108kB high:132kB 
active_anon:5092kB inactive_anon:5328kB active_file:416kB 
inactive_file:608kB unevictable:0kB isolated(anon):0kB 
isolated(file):0kB present:15996kB managed:15392kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:320kB shmem:0kB slab_reclaimable:252kB 
slab_unreclaimable:492kB kernel_stack:120kB pagetables:252kB 
unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
pages_scanned:26951 all_unreclaimable? yes
[ 8212.940711] lowmem_reserve[]: 0 469 469 469
[ 8212.940715] Node 0 DMA32 free:2608kB min:2728kB low:3408kB 
high:4092kB active_anon:181456kB inactive_anon:181528kB 
active_file:22296kB inactive_file:22644kB unevictable:0kB 
isolated(anon):0kB isolated(file):0kB present:507904kB managed:466364kB 
mlocked:0kB dirty:0kB writeback:0kB mapped:8628kB shmem:20kB 
slab_reclaimable:10756kB slab_unreclaimable:12548kB kernel_stack:1688kB 
pagetables:8876kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
pages_scanned:612393 all_unreclaimable? yes
[ 8212.940722] lowmem_reserve[]: 0 0 0 0
[ 8212.940725] Node 0 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:236512kB inactive_anon:236672kB active_file:10936kB 
inactive_file:11196kB unevictable:0kB isolated(anon):0kB 
isolated(file):0kB present:524288kB managed:502772kB mlocked:0kB 
dirty:0kB writeback:0kB mapped:6220kB shmem:4kB slab_reclaimable:3292kB 
slab_unreclaimable:816kB kernel_stack:64kB pagetables:1008kB 
unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB 
pages_scanned:745963 all_unreclaimable? yes
[ 8212.940732] lowmem_reserve[]: 0 0 0 0
[ 8212.940735] Node 0 DMA: 1*4kB (R) 0*8kB 4*16kB (R) 1*32kB (R) 1*64kB 
(R) 2*128kB (R) 0*256kB 1*512kB (R) 1*1024kB (R) 0*2048kB 0*4096kB = 1956kB
[ 8212.940747] Node 0 DMA32: 652*4kB (U) 0*8kB 0*16kB 0*32kB 0*64kB 
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2608kB
[ 8212.940756] Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[ 8212.940765] 16847 total pagecache pages
[ 8212.940766] 8381 pages in swap cache
[ 8212.940768] Swap cache stats: add 741397, delete 733016, find 
250268/342284
[ 8212.940769] Free swap  = 1925576kB
[ 8212.940770] Total swap = 2097148kB
[ 8212.951044] 262143 pages RAM
[ 8212.951046] 11939 pages reserved
[ 8212.951047] 540820 pages shared
[ 8212.951048] 240248 pages non-shared
[ 8212.951050] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents 
oom_score_adj name
<snip process list>
[ 8212.951310] Out of memory: Kill process 23721 (cc1plus) score 119 or 
sacrifice child
[ 8212.951313] Killed process 23721 (cc1plus) total-vm:530268kB, 
anon-rss:350980kB, file-rss:9408kB
[54810.683658] kjournald starting.  Commit interval 5 seconds
[54810.684381] EXT3-fs (xvda1): using internal journal
[54810.684402] EXT3-fs (xvda1): mounted filesystem with writeback data mode

  reply	other threads:[~2014-01-07  9:21 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-09 17:50 Kernel 3.11 / 3.12 OOM killer and Xen ballooning James Dingwall
2013-12-09 21:48 ` Konrad Rzeszutek Wilk
2013-12-10 14:52   ` James Dingwall
2013-12-10 15:27     ` Konrad Rzeszutek Wilk
2013-12-11  7:22       ` Bob Liu
2013-12-11  9:25         ` James Dingwall
2013-12-11  9:54           ` Bob Liu
2013-12-11 10:16             ` James Dingwall
2013-12-11 16:30         ` James Dingwall
2013-12-12  1:03           ` Bob Liu
2013-12-13 16:59             ` James Dingwall
2013-12-17  6:11               ` Bob Liu
2013-12-18 12:04           ` Bob Liu
2013-12-19 19:08             ` James Dingwall
2013-12-20  3:17               ` Bob Liu
2013-12-20 12:22                 ` James Dingwall
2013-12-26  8:42                 ` James Dingwall
2014-01-02  6:25                   ` Bob Liu
2014-01-07  9:21                     ` James Dingwall [this message]
2014-01-09 10:48                       ` Bob Liu
2014-01-09 10:54                         ` James Dingwall
2014-01-09 11:04                         ` James Dingwall
2014-01-15  8:49                         ` James Dingwall
2014-01-15 14:41                           ` Bob Liu
2014-01-15 16:35                             ` James Dingwall
2014-01-16  1:22                               ` Bob Liu
2014-01-16 10:52                                 ` James Dingwall
2014-01-28 17:15                                 ` James Dingwall
2014-01-29 14:35                                   ` Bob Liu
2014-01-29 14:45                                     ` James Dingwall
2014-01-31 16:56                                       ` Konrad Rzeszutek Wilk
2014-02-03  9:49                                         ` Daniel Kiper
2014-02-03 10:30                                           ` Konrad Rzeszutek Wilk
2014-02-03 11:20                                           ` James Dingwall
2014-02-03 14:00                                             ` Daniel Kiper
2013-12-10  8:16 ` Jan Beulich
2013-12-10 14:01   ` James Dingwall
2013-12-10 14:25     ` Jan Beulich
2013-12-10 14:52       ` James Dingwall
2013-12-10 14:59         ` Jan Beulich
2013-12-10 15:16           ` James Dingwall
  -- strict thread matches above, loose matches on Subject: below --
2013-11-21 11:28 James Dingwall

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52CBC700.1060602@zynstra.com \
    --to=james.dingwall@zynstra.com \
    --cc=bob.liu@oracle.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.