From mboxrd@z Thu Jan 1 00:00:00 1970
From: James Dingwall
Subject: Re: Kernel 3.11 / 3.12 OOM killer and Xen ballooning
Date: Thu, 26 Dec 2013 08:42:23 +0000
Message-ID: <52BBEBEF.8040509@zynstra.com>
References: <52A602E5.3080300@zynstra.com>
 <20131209214816.GA3000@phenom.dumpdata.com>
 <52A72AB8.9060707@zynstra.com>
 <20131210152746.GF3184@phenom.dumpdata.com>
 <52A812B0.6060607@oracle.com>
 <52A89334.3090007@zynstra.com>
 <52B18F44.2030500@oracle.com>
 <52B3443F.5060704@zynstra.com>
 <52B3B6D7.50606@oracle.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <52B3B6D7.50606@oracle.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Bob Liu
Cc: xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

Bob Liu wrote:
> On 12/20/2013 03:08 AM, James Dingwall wrote:
>> Bob Liu wrote:
>>> On 12/12/2013 12:30 AM, James Dingwall wrote:
>>>> Bob Liu wrote:
>>>>> On 12/10/2013 11:27 PM, Konrad Rzeszutek Wilk wrote:
>>>>>> On Tue, Dec 10, 2013 at 02:52:40PM +0000, James Dingwall wrote:
>>>>>>> Konrad Rzeszutek Wilk wrote:
>>>>>>>> On Mon, Dec 09, 2013 at 05:50:29PM +0000, James Dingwall wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Since 3.11 I have noticed that the OOM killer quite frequently
>>>>>>>>> triggers in my Xen guest domains which use ballooning to
>>>>>>>>> increase/decrease their memory allocation according to their
>>>>>>>>> requirements. One example domain I have has a maximum memory
>>>>>>>>> setting of ~1.5Gb but it usually idles at ~300Mb; it is also
>>>>>>>>> configured with 2Gb swap which is almost 100% free.
>>>>>>>>>
>>>>>>>>> # free
>>>>>>>>>              total       used       free     shared    buffers     cached
>>>>>>>>> Mem:        272080     248108      23972          0       1448      63064
>>>>>>>>> -/+ buffers/cache:     183596      88484
>>>>>>>>> Swap:      2097148          8    2097140
>>>>>>>>>
>>>>>>>>> There is plenty of available free memory in the hypervisor to
>>>>>>>>> balloon to the maximum size:
>>>>>>>>> # xl info | grep free_mem
>>>>>>>>> free_memory          : 14923
>>>>>>>>>
>>>>>>>>> An example trace (they are always the same) from the oom killer in
>>>>>>>>> 3.12 is added below. So far I have not been able to reproduce this
>>>>>>>>> at will so it is difficult to start bisecting it to see if a
>>>>>>>>> particular change introduced this. However it does seem that the
>>>>>>>>> behaviour is wrong because a) ballooning could give the guest more
>>>>>>>>> memory, and b) there is lots of swap available which could be used
>>>>>>>>> as a fallback.
>>>>>> Keep in mind that swap with tmem is actually no more swap. Heh, that
>>>>>> sounds odd - but basically pages that are destined for swap end up
>>>>>> going in the tmem code which pipes them up to the hypervisor.
>>>>>>
>>>>>>>>> If other information could help or there are more tests that I
>>>>>>>>> could run then please let me know.
>>>>>>>> I presume you have enabled 'tmem' both in the hypervisor and in
>>>>>>>> the guest, right?
>>>>>>> Yes, domU and dom0 both have the tmem module loaded and tmem
>>>>>>> tmem_dedup=on tmem_compress=on is given on the xen command line.
>>>>>> Excellent. The odd thing is that your swap is not used that much, but
>>>>>> it should be (as that is part of what the self-balloon is supposed to
>>>>>> do).
>>>>>>
>>>>>> Bob, you had a patch for the logic of how self-balloon is supposed
>>>>>> to account for the slab - would this be relevant to this problem?
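
[Side note: the self-balloon behaviour discussed above is implemented by
the guest's xen-selfballoon driver, which on kernels of this vintage
exposes its state and tunables through sysfs. The path and attribute
names below are the driver's usual ones, but they are an assumption
about this particular build and may differ between kernel versions:

    # inspect the selfballoon / frontswap-selfshrink settings in the guest
    grep . /sys/devices/system/xen_memory/xen_memory0/selfballoon/*
]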
>>>>> Perhaps, I have attached the patch.
>>>>> James, could you please apply it and try your application again? You
>>>>> have to rebuild the guest kernel.
>>>>> Oh, and also take a look at whether frontswap is in use, you can check
>>>>> it by watching "cat /sys/kernel/debug/frontswap/*".
>>>> I have tested this patch with a workload where I have previously seen
>>>> failures and so far so good. I'll try to keep a guest with it stressed
>>>> to see if I do get any problems. I don't know if it is expected but I
>>> By the way, besides the longer kswapd time, does this patch work well
>>> during your stress testing?
>>>
>>> Have you seen the OOM killer triggered quite frequently again? (with
>>> selfshrink=true)
>>>
>>> Thanks,
>>> -Bob
>> It was looking good until today (selfshrink=true). The trace below is
>> from a compile of subversion; it looks like the memory has ballooned
>> to almost the maximum permissible but even under pressure the swap disk
>> has hardly come into use.
>>
> So if without selfshrink the swap disk can be used a lot?
>
> If that's the case, I'm afraid the frontswap-selfshrink in
> xen-selfballoon did something incorrect.
>
> Could you please try this patch, which makes the frontswap-selfshrink
> slower and adds a printk for debugging.
> Please still keep selfshrink=true in your test, but it can be with or
> without my previous patch.
> Thanks a lot!
>

The OOM trace below was triggered during a compile of gcc. I have the
full dmesg from boot which shows all the printks; please let me know if
you would like to see that.

James

[504372.929678] frontswap selfshrink 5424 pages
[504403.018185] frontswap selfshrink 5152 pages
[504433.124844] frontswap selfshrink 4894 pages
[504468.335358] frontswap selfshrink 12791 pages
[504498.536467] frontswap selfshrink 12152 pages
[504533.813484] frontswap selfshrink 19751 pages
[504589.067299] frontswap selfshrink 19043 pages
[504638.441894] cc1plus invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
[504638.441902] CPU: 1 PID: 21506 Comm: cc1plus Tainted: G W 3.12.5 #88
[504638.441905] ffff88001ca406f8 ffff880002c0fa58 ffffffff8148f200 ffff88001f90e8e8
[504638.441909] ffff88001ca401c0 ffff880002c0faf8 ffffffff8148ccf7 ffff880002c0faa8
[504638.441912] ffffffff810f8d97 ffff880002c0fa88 ffffffff81006dc8 ffff880002c0fa98
[504638.441917] Call Trace:
[504638.441928] [] dump_stack+0x46/0x58
[504638.441932] [] dump_header.isra.9+0x6d/0x1cc
[504638.441938] [] ? super_cache_count+0xa8/0xb8
[504638.441943] [] ? xen_clocksource_read+0x20/0x22
[504638.441946] [] ? xen_clocksource_get_cycles+0x9/0xb
[504638.441951] [] ? _raw_spin_unlock_irqrestore+0x47/0x62
[504638.441957] [] ? ___ratelimit+0xcb/0xe8
[504638.441962] [] oom_kill_process+0x70/0x2fd
[504638.441966] [] ? zone_reclaimable+0x11/0x1e
[504638.441970] [] ? has_ns_capability_noaudit+0x12/0x19
[504638.441973] [] ? has_capability_noaudit+0x12/0x14
[504638.441976] [] out_of_memory+0x31b/0x34e
[504638.441981] [] __alloc_pages_nodemask+0x65b/0x792
[504638.441985] [] alloc_pages_vma+0xd0/0x10c
[504638.441988] [] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
[504638.441993] [] handle_mm_fault+0x6d4/0xd54
[504638.441996] [] ? xen_clocksource_read+0x20/0x22
[504638.441999] [] ? sched_clock+0x9/0xd
[504638.442005] [] ? sched_clock_local+0x12/0x75
[504638.442008] [] ? arch_vtime_task_switch+0x81/0x86
[504638.442013] [] __do_page_fault+0x3d8/0x437
[504638.442016] [] ? finish_task_switch+0xe8/0x144
[504638.442018] [] ? sched_clock+0x9/0xd
[504638.442021] [] ? sched_clock_local+0x12/0x75
[504638.442025] [] ? __acct_update_integrals+0xb4/0xbf
[504638.442028] [] ? acct_account_cputime+0x17/0x19
[504638.442030] [] ? account_user_time+0x67/0x92
[504638.442033] [] ? vtime_account_user+0x4d/0x52
[504638.442036] [] do_page_fault+0x1a/0x5a
[504638.442041] [] ? rcu_user_enter+0xe/0x10
[504638.442044] [] page_fault+0x28/0x30
[504638.442046] Mem-Info:
[504638.442048] Node 0 DMA per-cpu:
[504638.442050] CPU 0: hi: 0, btch: 1 usd: 0
[504638.442052] CPU 1: hi: 0, btch: 1 usd: 0
[504638.442053] Node 0 DMA32 per-cpu:
[504638.442055] CPU 0: hi: 186, btch: 31 usd: 26
[504638.442057] CPU 1: hi: 186, btch: 31 usd: 72
[504638.442058] Node 0 Normal per-cpu:
[504638.442060] CPU 0: hi: 0, btch: 1 usd: 0
[504638.442061] CPU 1: hi: 0, btch: 1 usd: 0
[504638.442067] active_anon:103684 inactive_anon:103733 isolated_anon:0 active_file:10897 inactive_file:15059 isolated_file:0 unevictable:0 dirty:1 writeback:0 unstable:0 free:1164 slab_reclaimable:2356 slab_unreclaimable:3421 mapped:4413 shmem:200 pagetables:2699 bounce:0 free_cma:0 totalram:249264 balloontarget:315406
[504638.442069] Node 0 DMA free:1964kB min:88kB low:108kB high:132kB active_anon:4664kB inactive_anon:4736kB active_file:628kB inactive_file:1420kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15996kB managed:15084kB mlocked:0kB dirty:0kB writeback:0kB mapped:228kB shmem:0kB slab_reclaimable:184kB slab_unreclaimable:324kB kernel_stack:48kB pagetables:256kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:21824 all_unreclaimable? yes
[504638.442078] lowmem_reserve[]: 0 469 469 469
[504638.442081] Node 0 DMA32 free:2692kB min:2728kB low:3408kB high:4092kB active_anon:175172kB inactive_anon:175184kB active_file:21244kB inactive_file:35340kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:507904kB managed:458288kB mlocked:0kB dirty:0kB writeback:0kB mapped:7764kB shmem:676kB slab_reclaimable:6628kB slab_unreclaimable:11496kB kernel_stack:1720kB pagetables:8444kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:613279 all_unreclaimable? yes
[504638.442088] lowmem_reserve[]: 0 0 0 0
[504638.442091] Node 0 Normal free:0kB min:0kB low:0kB high:0kB active_anon:234900kB inactive_anon:235012kB active_file:21716kB inactive_file:23476kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:524288kB managed:523684kB mlocked:0kB dirty:4kB writeback:0kB mapped:9660kB shmem:124kB slab_reclaimable:2612kB slab_unreclaimable:1864kB kernel_stack:136kB pagetables:2096kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:773613 all_unreclaimable? yes
[504638.442098] lowmem_reserve[]: 0 0 0 0
[504638.442101] Node 0 DMA: 1*4kB (R) 3*8kB (R) 1*16kB (R) 0*32kB 0*64kB 1*128kB (R) 1*256kB (R) 1*512kB (R) 1*1024kB (R) 0*2048kB 0*4096kB = 1964kB
[504638.442114] Node 0 DMA32: 673*4kB (UE) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2692kB
[504638.442123] Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[504638.442131] 22294 total pagecache pages
[504638.442133] 11197 pages in swap cache
[504638.442135] Swap cache stats: add 3449125, delete 3437928, find 590699/956067
[504638.442136] Free swap  = 1868108kB
[504638.442137] Total swap = 2097148kB
[504638.452335] 262143 pages RAM
[504638.452336] 6697 pages reserved
[504638.452337] 558286 pages shared
[504638.452338] 239987 pages non-shared
[504638.452340] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[504638.452596] Out of memory: Kill process 21506 (cc1plus) score 123 or sacrifice child
[504638.452598] Killed process 21506 (cc1plus) total-vm:543168kB, anon-rss:350300kB, file-rss:9520kB
[504659.367289] frontswap selfshrink 18428 pages
[504689.415694] frontswap selfshrink 479 pages
[504719.462401] frontswap selfshrink 456 pages
[504749.506876] frontswap selfshrink 434 pages
[504779.558204] frontswap selfshrink 406 pages
[504809.604425] frontswap selfshrink 386 pages
[504839.654849] frontswap selfshrink 367 pages
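
[For reference, the frontswap activity behind the "frontswap selfshrink"
messages above can be watched from inside the guest via the counters the
frontswap core exports in debugfs. The counter names (loads, succ_stores,
failed_stores, invalidates) are the usual ones but are an assumption
about this particular kernel build, and debugfs must be mounted:

    # sample the frontswap counters alongside the guest's swap usage
    watch -n 5 'grep . /sys/kernel/debug/frontswap/* ; grep -i swap /proc/meminfo'
]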