From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jiri Slaby
Date: Thu, 11 Oct 2012 10:52:28 +0200
Subject: kswapd0: wxcessive CPU usage
Message-ID: <507688CC.9000104@suse.cz>
To: linux-mm@kvack.org, LKML , Jiri Slaby

Hi,

with 3.6.0-next-20121008, kswapd0 is spinning my CPU at 100% for 1
minute or so. If I try to suspend to RAM, this trace appears:
kswapd0 R running task 0 577 2 0x00000000
0000000000000000 00000000000000c0 cccccccccccccccd ffff8801c4146800
ffff8801c4b15c88 ffffffff8116ee05 0000000000003e32 ffff8801c3a79000
ffff8801c4b15ca8 ffffffff8116fdf8 ffff8801c480f398 ffff8801c3a79000
Call Trace:
[] ? put_super+0x25/0x40
[] ? grab_super_passive+0x24/0xa0
[] ? prune_super+0x149/0x1b0
[] ? shrink_slab+0xa1/0x2d0
[] ? kswapd+0x66d/0xb60
[] ? try_to_free_pages+0x180/0x180
[] ? kthread+0xc0/0xd0
[] ? kthread_create_on_node+0x130/0x130
[] ? ret_from_fork+0x7c/0x90
[] ? kthread_create_on_node+0x130/0x130

# cat /proc/vmstat
nr_free_pages 239962
nr_inactive_anon 89825
nr_active_anon 711136
nr_inactive_file 60386
nr_active_file 46668
nr_unevictable 0
nr_mlock 0
nr_anon_pages 500678
nr_mapped 41319
nr_file_pages 319317
nr_dirty 45
nr_writeback 0
nr_slab_reclaimable 21909
nr_slab_unreclaimable 21598
nr_page_table_pages 12131
nr_kernel_stack 491
nr_unstable 0
nr_bounce 0
nr_vmscan_write 1674280
nr_vmscan_immediate_reclaim 301662
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 212263
nr_dirtied 10620227
nr_written 9260939
nr_anon_transparent_hugepages 172
nr_free_cma 0
nr_dirty_threshold 31459
nr_dirty_background_threshold 15729
pgpgin 31311778
pgpgout 38987552
pswpin 0
pswpout 0
pgalloc_dma 0
pgalloc_dma32 245169455
pgalloc_normal 279685864
pgalloc_movable 0
pgfree 537318727
pgactivate 13126755
pgdeactivate 2482953
pgfault 645947575
pgmajfault 193427
pgrefill_dma 0
pgrefill_dma32 1124272
pgrefill_normal 1998033
pgrefill_movable 0
pgsteal_kswapd_dma 0
pgsteal_kswapd_dma32 2531015
pgsteal_kswapd_normal 3403006
pgsteal_kswapd_movable 0
pgsteal_direct_dma 0
pgsteal_direct_dma32 362488
pgsteal_direct_normal 1134511
pgsteal_direct_movable 0
pgscan_kswapd_dma 0
pgscan_kswapd_dma32 2693620
pgscan_kswapd_normal 5836491
pgscan_kswapd_movable 0
pgscan_direct_dma 0
pgscan_direct_dma32 368374
pgscan_direct_normal 1658486
pgscan_direct_movable 0
pgscan_direct_throttle 0
pginodesteal 258410
slabs_scanned 86459392
kswapd_inodesteal 3907549
kswapd_low_wmark_hit_quickly 15408
kswapd_high_wmark_hit_quickly 23113
kswapd_skip_congestion_wait 10
pageoutrun 2165627235
allocstall 11256
pgrotated 219624
compact_blocks_moved 4862077
compact_pages_moved 1970005
compact_pagemigrate_failed 1726156
compact_stall 21275
compact_fail 6589
compact_success 14686
htlb_buddy_alloc_success 0
htlb_buddy_alloc_fail 0
unevictable_pgs_culled 2799
unevictable_pgs_scanned 0
unevictable_pgs_rescued 22563
unevictable_pgs_mlocked 22563
unevictable_pgs_munlocked 22563
unevictable_pgs_cleared 0
unevictable_pgs_stranded 0
thp_fault_alloc 18725
thp_fault_fallback 64868
thp_collapse_alloc 9216
thp_collapse_alloc_failed 2031
thp_split 2146

Any ideas what it could be?

-- 
js
suse labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Valdis.Kletnieks@vt.edu
Date: Thu, 11 Oct 2012 09:44:40 -0400
Subject: Re: kswapd0: wxcessive CPU usage
Message-ID: <106695.1349963080@turing-police.cc.vt.edu>
To: Jiri Slaby
Cc: linux-mm@kvack.org, LKML , Jiri Slaby

On Thu, 11 Oct 2012 10:52:28 +0200, Jiri Slaby said:
> Hi,
>
> with 3.6.0-next-20121008, kswapd0 is spinning my CPU at 100% for 1
> minute or so.
> [] ? put_super+0x25/0x40
> [] ? grab_super_passive+0x24/0xa0
> [] ? prune_super+0x149/0x1b0
> [] ? shrink_slab+0xa1/0x2d0
> [] ? kswapd+0x66d/0xb60
> [] ? try_to_free_pages+0x180/0x180
> [] ? kthread+0xc0/0xd0
> [] ? kthread_create_on_node+0x130/0x130
> [] ? ret_from_fork+0x7c/0x90
> [] ? kthread_create_on_node+0x130/0x130

I don't know what it is, I haven't finished bisecting it - but I can confirm
that I started seeing the same problem 2 or 3 weeks ago.

Note that said call trace does *NOT* require a suspend - I don't do suspend
on my laptop and I'm seeing kswapd burn CPU with similar traces.

# cat /proc/31/stack
[] grab_super_passive+0x44/0x76
[] prune_super+0x3a/0x13c
[] shrink_slab+0x95/0x301
[] kswapd+0x5c8/0x902
[] kthread+0x9d/0xa5
[] ret_from_fork+0x7c/0x90
[] 0xffffffffffffffff
# cat /proc/31/stack
[] put_super+0x29/0x2d
[] drop_super+0x1b/0x20
[] prune_super+0x12a/0x13c
[] shrink_slab+0x95/0x301
[] kswapd+0x5c8/0x902
[] kthread+0x9d/0xa5
[] ret_from_fork+0x7c/0x90
[] 0xffffffffffffffff

So at least we know we're not hallucinating. :)
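Valdis' two reads of /proc/31/stack, like the sysrq-T dumps Andrew asks for later in the thread, are a simple form of sampling: capture the stack several times and see which frames keep recurring. A minimal C sketch of the counting step follows, assuming the samples have already been captured as text (reading /proc/<pid>/stack usually requires root). sample_has_frame() and count_samples_with() are hypothetical helpers, not part of any kernel or procps interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if one captured stack sample (text in the
 * /proc/<pid>/stack format, one "[<addr>] symbol+off/len" frame per line)
 * contains a frame for exactly `sym`, otherwise 0.
 */
static int sample_has_frame(const char *sample, const char *sym)
{
    size_t len = strlen(sym);
    const char *p = sample;

    assert(len > 0);
    while ((p = strstr(p, sym)) != NULL) {
        /* The symbol must start a token and be followed by "+off", so that
         * "shrink_slab" does not match inside "do_shrink_slab". */
        int at_token_start = (p == sample || p[-1] == ' ' ||
                              p[-1] == '\n' || p[-1] == ']');

        if (at_token_start && p[len] == '+')
            return 1;
        p += len;
    }
    return 0;
}

/* Count how many of the n samples contain a frame for sym. */
static size_t count_samples_with(const char *const *samples, size_t n,
                                 const char *sym)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++)
        hits += (size_t)sample_has_frame(samples[i], sym);
    return hits;
}
```

Reading kswapd's stack once a second for a minute and finding shrink_slab in nearly every sample would point at the same loop as the traces above; a frame that shows up only once or twice is more likely noise.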

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jiri Slaby
Date: Thu, 11 Oct 2012 17:34:24 +0200
Subject: Re: kswapd0: wxcessive CPU usage
Message-ID: <5076E700.2030909@suse.cz>
To: Valdis.Kletnieks@vt.edu
Cc: linux-mm@kvack.org, LKML , Jiri Slaby

On 10/11/2012 03:44 PM, Valdis.Kletnieks@vt.edu wrote:
> So at least we know we're not hallucinating. :)

Just a thought? Do you have raid?

-- 
js
suse labs

From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: kswapd0: wxcessive CPU usage
From: Valdis.Kletnieks@vt.edu
Date: Thu, 11 Oct 2012 13:56:51 -0400
Message-ID: <118079.1349978211@turing-police.cc.vt.edu>
To: Jiri Slaby
Cc: linux-mm@kvack.org, LKML , Jiri Slaby

On Thu, 11 Oct 2012 17:34:24 +0200, Jiri Slaby said:
> On 10/11/2012 03:44 PM, Valdis.Kletnieks@vt.edu wrote:
> > So at least we know we're not hallucinating. :)
>
> Just a thought? Do you have raid?

Nope, just a 160G laptop spinning hard drive. Filesystems are ext4
on LVM on a cryptoLUKS partition on /dev/sda2.

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jiri Slaby
Date: Thu, 11 Oct 2012 19:59:33 +0200
Subject: Re: kswapd0: wxcessive CPU usage
Message-ID: <50770905.5070904@suse.cz>
To: Valdis.Kletnieks@vt.edu
Cc: Jiri Slaby , linux-mm@kvack.org, LKML

On 10/11/2012 07:56 PM, Valdis.Kletnieks@vt.edu wrote:
> On Thu, 11 Oct 2012 17:34:24 +0200, Jiri Slaby said:
>> On 10/11/2012 03:44 PM, Valdis.Kletnieks@vt.edu wrote:
>>> So at least we know we're not hallucinating. :)
>>
>> Just a thought? Do you have raid?
>
> Nope, just a 160G laptop spinning hard drive. Filesystems are ext4
> on LVM on a cryptoLUKS partition on /dev/sda2.

Ok, it's maybe compaction. Do you have CONFIG_COMPACTION=y?

-- 
js
suse labs

From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: kswapd0: wxcessive CPU usage
From: Valdis.Kletnieks@vt.edu
Date: Thu, 11 Oct 2012 14:19:30 -0400
Message-ID: <119175.1349979570@turing-police.cc.vt.edu>
To: Jiri Slaby
Cc: Jiri Slaby , linux-mm@kvack.org, LKML

On Thu, 11 Oct 2012 19:59:33 +0200, Jiri Slaby said:
> On 10/11/2012 07:56 PM, Valdis.Kletnieks@vt.edu wrote:
> > On Thu, 11 Oct 2012 17:34:24 +0200, Jiri Slaby said:
> >> On 10/11/2012 03:44 PM, Valdis.Kletnieks@vt.edu wrote:
> >>> So at least we know we're not hallucinating. :)
> >>
> >> Just a thought? Do you have raid?
> >
> > Nope, just a 160G laptop spinning hard drive. Filesystems are ext4
> > on LVM on a cryptoLUKS partition on /dev/sda2.
>
> Ok, it's maybe compaction. Do you have CONFIG_COMPACTION=y?

# zgrep COMPAC /proc/config.gz
CONFIG_COMPACTION=y

Hope that tells you something useful.

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jiri Slaby
Date: Fri, 12 Oct 2012 00:08:13 +0200
Subject: Re: kswapd0: excessive CPU usage
Message-ID: <5077434D.7080008@suse.cz>
To: Valdis.Kletnieks@vt.edu
Cc: Jiri Slaby , linux-mm@kvack.org, LKML

On 10/11/2012 08:19 PM, Valdis.Kletnieks@vt.edu wrote:
> # zgrep COMPAC /proc/config.gz
> CONFIG_COMPACTION=y
>
> Hope that tells you something useful.

It just supports another theory of mine. This seems to fix it for me:

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1830,8 +1830,8 @@ static inline bool should_continue_reclaim(struct lruvec *lruvec,
 	 */
 	pages_for_compaction = (2UL << sc->order);
 
-	pages_for_compaction = scale_for_compaction(pages_for_compaction,
-						    lruvec, sc);
+/*	pages_for_compaction = scale_for_compaction(pages_for_compaction,
+						    lruvec, sc);*/
 	inactive_lru_pages = get_lru_size(lruvec, LRU_INACTIVE_FILE);
 	if (nr_swap_pages > 0)
 		inactive_lru_pages += get_lru_size(lruvec, LRU_INACTIVE_ANON);

And for you?

(It's an effective revert of "mm: vmscan: scale number of pages
reclaimed by reclaim/compaction based on failures".)

regards,
-- 
js
suse labs
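The hunk above drops the only place where the reclaim target is scaled. The arithmetic can be modelled in userspace as below. This is a sketch, assuming the scaling rule shown in Mel Gorman's patch later in the thread (shift the target left by compact_defer_shift when compact_order_failed <= sc->order); struct zone_model and both function names are hypothetical stand-ins, not the kernel's types.

```c
#include <assert.h>

/* Hypothetical stand-in for the two struct zone fields the scaling reads. */
struct zone_model {
    int compact_order_failed;         /* lowest order at which compaction recently failed */
    unsigned int compact_defer_shift; /* grows with consecutive compaction failures */
};

/* Unscaled target used by should_continue_reclaim(): 2UL << order pages. */
static unsigned long base_pages_for_compaction(int order)
{
    assert(order >= 0 && order < 32);
    return 2UL << order;
}

/*
 * Model of scale_for_compaction(): if compaction has failed at this order
 * (or a lower one), multiply the target by 2^compact_defer_shift.
 */
static unsigned long scaled_pages_for_compaction(const struct zone_model *z,
                                                 int order)
{
    unsigned long pages = base_pages_for_compaction(order);

    if (z->compact_order_failed <= order)
        pages <<= z->compact_defer_shift;
    return pages;
}
```

For a THP allocation (order 9 with 4 KiB pages on x86_64) the unscaled target is 2 << 9 = 1024 pages, i.e. 4 MiB. Assuming the deferral shift has reached 6 (the cap, COMPACT_MAX_DEFER_SHIFT, in kernels of that era), the scaled target is 65536 pages, i.e. 256 MiB, which raises the bar should_continue_reclaim() uses before it stops reclaiming. With the call commented out, the target stays at 1024 pages.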

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Morton
Date: Thu, 11 Oct 2012 15:14:13 -0700
Subject: Re: kswapd0: wxcessive CPU usage
Message-Id: <20121011151413.3ab58542.akpm@linux-foundation.org>
To: Jiri Slaby
Cc: linux-mm@kvack.org, LKML , Jiri Slaby

On Thu, 11 Oct 2012 10:52:28 +0200 Jiri Slaby wrote:

> with 3.6.0-next-20121008, kswapd0 is spinning my CPU at 100% for 1
> minute or so. If I try to suspend to RAM, this trace appears:
> kswapd0 R running task 0 577 2 0x00000000
> 0000000000000000 00000000000000c0 cccccccccccccccd ffff8801c4146800
> ffff8801c4b15c88 ffffffff8116ee05 0000000000003e32 ffff8801c3a79000
> ffff8801c4b15ca8 ffffffff8116fdf8 ffff8801c480f398 ffff8801c3a79000
> Call Trace:
> [] ? put_super+0x25/0x40
> [] ? grab_super_passive+0x24/0xa0
> [] ? prune_super+0x149/0x1b0
> [] ? shrink_slab+0xa1/0x2d0
> [] ? kswapd+0x66d/0xb60
> [] ? try_to_free_pages+0x180/0x180
> [] ? kthread+0xc0/0xd0
> [] ? kthread_create_on_node+0x130/0x130
> [] ? ret_from_fork+0x7c/0x90
> [] ? kthread_create_on_node+0x130/0x130

Could you please do a sysrq-T a few times while it's spinning, to
confirm that this trace is consistently the culprit?

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jiri Slaby
Date: Fri, 12 Oct 2012 00:26:01 +0200
Subject: Re: kswapd0: wxcessive CPU usage
Message-ID: <50774779.8000005@suse.cz>
To: Andrew Morton
Cc: Jiri Slaby , linux-mm@kvack.org, LKML

On 10/12/2012 12:14 AM, Andrew Morton wrote:
> Could you please do a sysrq-T a few times while it's spinning, to
> confirm that this trace is consistently the culprit?

For me, yes: shrink_slab is in most of the traces.

-- 
js
suse labs

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jiri Slaby
Date: Fri, 12 Oct 2012 14:37:58 +0200
Subject: Re: kswapd0: excessive CPU usage
Message-ID: <50780F26.7070007@suse.cz>
To: Valdis.Kletnieks@vt.edu
Cc: Jiri Slaby , linux-mm@kvack.org, LKML , Mel Gorman , Andrew Morton

On 10/12/2012 12:08 AM, Jiri Slaby wrote:
> (It's an effective revert of "mm: vmscan: scale number of pages
> reclaimed by reclaim/compaction based on failures".)

Given kswapd had hours of runtime in ps/top output yesterday in the
morning and after the revert it's now 2 minutes in sum for the last 24h,
I would say, it's gone.

Mel, you wrote me it's unlikely the patch, but not impossible in the
end. Can you take a look, please? If you need some trace-cmd output or
anything, just let us know.

This is x86_64, 6G of RAM, no swap. FWIW EXT4, SLUB, COMPACTION all
enabled/used.

thanks,
-- 
js
suse labs
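Jiri's before/after comparison relies on the CPU time that ps/top show for kswapd. The same number can be read from /proc/<pid>/stat, whose fields 14 and 15 (utime and stime) count user and system time in clock ticks. A minimal C sketch, assuming the field layout documented in proc(5); stat_cpu_ticks() and read_proc_stat() are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return utime + stime (fields 14 and 15 of /proc/<pid>/stat, in clock
 * ticks) parsed from one line of that file, or -1 if the line is malformed.
 * Field 2 (comm) is parenthesised and may contain spaces or ')', so parsing
 * starts after the last ')'.
 */
static long long stat_cpu_ticks(const char *line)
{
    const char *p = strrchr(line, ')');
    unsigned long long utime, stime;

    if (p == NULL)
        return -1;
    /* Skip state, ppid, pgrp, session, tty_nr, tpgid (fields 3-8) and
     * flags, minflt, cminflt, majflt, cmajflt (fields 9-13). */
    if (sscanf(p + 1,
               " %*c %*lld %*lld %*lld %*lld %*lld"
               " %*llu %*llu %*llu %*llu %*llu %llu %llu",
               &utime, &stime) != 2)
        return -1;
    return (long long)(utime + stime);
}

/* Read the single line of /proc/<pid>/stat into buf; 0 on success. */
static int read_proc_stat(int pid, char *buf, size_t len)
{
    char path[64];
    FILE *f;
    int ok;

    assert(buf != NULL && len > 0);
    snprintf(path, sizeof(path), "/proc/%d/stat", pid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    ok = fgets(buf, (int)len, f) != NULL;
    fclose(f);
    return ok ? 0 : -1;
}
```

Dividing the tick count by sysconf(_SC_CLK_TCK) gives seconds, so reading the value for kswapd0's PID a day apart would reproduce the "hours versus 2 minutes" comparison without eyeballing top.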

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mel Gorman
Date: Fri, 12 Oct 2012 14:57:26 +0100
Subject: Re: kswapd0: excessive CPU usage
Message-ID: <20121012135726.GY29125@suse.de>
To: Jiri Slaby
Cc: Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton

On Fri, Oct 12, 2012 at 02:37:58PM +0200, Jiri Slaby wrote:
> On 10/12/2012 12:08 AM, Jiri Slaby wrote:
> > (It's an effective revert of "mm: vmscan: scale number of pages
> > reclaimed by reclaim/compaction based on failures".)
>
> Given kswapd had hours of runtime in ps/top output yesterday in the
> morning and after the revert it's now 2 minutes in sum for the last 24h,
> I would say, it's gone.
>
> Mel, you wrote me it's unlikely the patch, but not impossible in the
> end. Can you take a look, please? If you need some trace-cmd output or
> anything, just let us know.
>
> This is x86_64, 6G of RAM, no swap. FWIW EXT4, SLUB, COMPACTION all
> enabled/used.
>

Can you monitor the behaviour of this patch please? Please keep a
particular eye on kswapd activity and the amount of free memory. If free
memory is spiking it might indicate that kswapd is still too aggressive
with the loss of the __GFP_NO_KSWAPD flag. One way to tell is to record
/proc/vmstat over time and see what the pgsteal_* figures look like.
If they are climbing aggressively during what should be normal usage
then it might show that kswapd is still too aggressive when asked to
reclaim for THP.

Thanks very much.

---8<---
mm: vmscan: scale number of pages reclaimed by reclaim/compaction only in direct reclaim

Jiri Slaby reported the following:

  (It's an effective revert of "mm: vmscan: scale number of pages
  reclaimed by reclaim/compaction based on failures".)
  Given kswapd had hours of runtime in ps/top output yesterday in the
  morning and after the revert it's now 2 minutes in sum for the last 24h,
  I would say, it's gone.

The intention of the patch in question was to compensate for the loss of
lumpy reclaim. Part of the reason lumpy reclaim worked is because it
aggressively reclaimed pages and this patch was meant to be a
sane compromise.

When compaction fails, it gets deferred and both compaction and
reclaim/compaction are deferred to avoid excessive reclaim. However, since
commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up each time
and continues reclaiming which was not taken into account when the patch
was developed.

As it is not taking deferred compaction into account in this path it scans
aggressively before falling out and making the compaction_deferred check in
compaction_ready. This patch avoids kswapd scaling pages for reclaim and
leaves the aggressive reclaim to the process attempting the THP
allocation.

Signed-off-by: Mel Gorman
---
 mm/vmscan.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2624edc..2b7edfa 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1763,14 +1763,20 @@ static bool in_reclaim_compaction(struct scan_control *sc)
 #ifdef CONFIG_COMPACTION
 /*
  * If compaction is deferred for sc->order then scale the number of pages
- * reclaimed based on the number of consecutive allocation failures
+ * reclaimed based on the number of consecutive allocation failures. This
+ * scaling only happens for direct reclaim as it is about to attempt
+ * compaction. If compaction fails, future allocations will be deferred
+ * and reclaim avoided. On the other hand, kswapd does not take compaction
+ * deferral into account so if it scaled, it could scan excessively even
+ * though allocations are temporarily not being attempted.
  */
 static unsigned long scale_for_compaction(unsigned long pages_for_compaction,
 			struct lruvec *lruvec, struct scan_control *sc)
 {
 	struct zone *zone = lruvec_zone(lruvec);
 
-	if (zone->compact_order_failed <= sc->order)
+	if (zone->compact_order_failed <= sc->order &&
+	    !current_is_kswapd())
 		pages_for_compaction <<= zone->compact_defer_shift;
 	return pages_for_compaction;
 }

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jiri Slaby
Date: Mon, 15 Oct 2012 11:54:13 +0200
Subject: Re: kswapd0: excessive CPU usage
Message-ID: <507BDD45.1070705@suse.cz>
To: Mel Gorman
Cc: Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton

On 10/12/2012 03:57 PM, Mel Gorman wrote:
> mm: vmscan: scale number of pages reclaimed by reclaim/compaction only in direct reclaim
>
> Jiri Slaby reported the following:
>
>   (It's an effective revert of "mm: vmscan: scale number of pages
>   reclaimed by reclaim/compaction based on failures".)
>   Given kswapd had hours of runtime in ps/top output yesterday in the
>   morning and after the revert it's now 2 minutes in sum for the last 24h,
>   I would say, it's gone.
>
> The intention of the patch in question was to compensate for the loss of
> lumpy reclaim. Part of the reason lumpy reclaim worked is because it
> aggressively reclaimed pages and this patch was meant to be a
> sane compromise.
>
> When compaction fails, it gets deferred and both compaction and
> reclaim/compaction are deferred to avoid excessive reclaim. However, since
> commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up each time
> and continues reclaiming which was not taken into account when the patch
> was developed.
>
> As it is not taking deferred compaction into account in this path it scans
> aggressively before falling out and making the compaction_deferred check in
> compaction_ready. This patch avoids kswapd scaling pages for reclaim and
> leaves the aggressive reclaim to the process attempting the THP
> allocation.
>
> Signed-off-by: Mel Gorman
> ---
>  mm/vmscan.c |   10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 2624edc..2b7edfa 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1763,14 +1763,20 @@ static bool in_reclaim_compaction(struct scan_control *sc)
>  #ifdef CONFIG_COMPACTION
>  /*
>   * If compaction is deferred for sc->order then scale the number of pages
> - * reclaimed based on the number of consecutive allocation failures
> + * reclaimed based on the number of consecutive allocation failures. This
> + * scaling only happens for direct reclaim as it is about to attempt
> + * compaction. If compaction fails, future allocations will be deferred
> + * and reclaim avoided. On the other hand, kswapd does not take compaction
> + * deferral into account so if it scaled, it could scan excessively even
> + * though allocations are temporarily not being attempted.
>   */
>  static unsigned long scale_for_compaction(unsigned long pages_for_compaction,
>  			struct lruvec *lruvec, struct scan_control *sc)
>  {
>  	struct zone *zone = lruvec_zone(lruvec);
>  
> -	if (zone->compact_order_failed <= sc->order)
> +	if (zone->compact_order_failed <= sc->order &&
> +	    !current_is_kswapd())
>  		pages_for_compaction <<= zone->compact_defer_shift;
>  	return pages_for_compaction;
>  }

Yes, applying this instead of the revert fixes the issue as well.

thanks,
-- 
js
suse labs

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mel Gorman
Date: Mon, 15 Oct 2012 12:09:37 +0100
Subject: Re: kswapd0: excessive CPU usage
Message-ID: <20121015110937.GE29125@suse.de>
To: Jiri Slaby
Cc: Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton

On Mon, Oct 15, 2012 at 11:54:13AM +0200, Jiri Slaby wrote:
> On 10/12/2012 03:57 PM, Mel Gorman wrote:
> > mm: vmscan: scale number of pages reclaimed by reclaim/compaction only in direct reclaim
> >
> > Jiri Slaby reported the following:
> >
> >   (It's an effective revert of "mm: vmscan: scale number of pages
> >   reclaimed by reclaim/compaction based on failures".)
> >   Given kswapd had hours of runtime in ps/top output yesterday in the
> >   morning and after the revert it's now 2 minutes in sum for the last 24h,
> >   I would say, it's gone.
> >
> > The intention of the patch in question was to compensate for the loss of
> > lumpy reclaim. Part of the reason lumpy reclaim worked is because it
> > aggressively reclaimed pages and this patch was meant to be a
> > sane compromise.
> >
> > When compaction fails, it gets deferred and both compaction and
> > reclaim/compaction are deferred to avoid excessive reclaim. However, since
> > commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up each time
> > and continues reclaiming which was not taken into account when the patch
> > was developed.
> >
> > As it is not taking deferred compaction into account in this path it scans
> > aggressively before falling out and making the compaction_deferred check in
> > compaction_ready. This patch avoids kswapd scaling pages for reclaim and
> > leaves the aggressive reclaim to the process attempting the THP
> > allocation.
> >
> > Signed-off-by: Mel Gorman
> > ---
> >  mm/vmscan.c |   10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 2624edc..2b7edfa 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1763,14 +1763,20 @@ static bool in_reclaim_compaction(struct scan_control *sc)
> >  #ifdef CONFIG_COMPACTION
> >  /*
> >   * If compaction is deferred for sc->order then scale the number of pages
> > - * reclaimed based on the number of consecutive allocation failures
> > + * reclaimed based on the number of consecutive allocation failures. This
> > + * scaling only happens for direct reclaim as it is about to attempt
> > + * compaction. If compaction fails, future allocations will be deferred
> > + * and reclaim avoided. On the other hand, kswapd does not take compaction
> > + * deferral into account so if it scaled, it could scan excessively even
> > + * though allocations are temporarily not being attempted.
> >   */
> >  static unsigned long scale_for_compaction(unsigned long pages_for_compaction,
> >  			struct lruvec *lruvec, struct scan_control *sc)
> >  {
> >  	struct zone *zone = lruvec_zone(lruvec);
> >  
> > -	if (zone->compact_order_failed <= sc->order)
> > +	if (zone->compact_order_failed <= sc->order &&
> > +	    !current_is_kswapd())
> >  		pages_for_compaction <<= zone->compact_defer_shift;
> >  	return pages_for_compaction;
> >  }
>
> Yes, applying this instead of the revert fixes the issue as well.
>

Thanks Jiri.

-- 
Mel Gorman
SUSE Labs
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx136.postini.com [74.125.245.136]) by kanga.kvack.org (Postfix) with SMTP id A73F66B006C for ; Mon, 29 Oct 2012 06:52:07 -0400 (EDT) Message-ID: <508E5FD3.1060105@leemhuis.info> Date: Mon, 29 Oct 2012 11:52:03 +0100 From: Thorsten Leemhuis MIME-Version: 1.0 Subject: Re: kswapd0: excessive CPU usage References: <507688CC.9000104@suse.cz> <106695.1349963080@turing-police.cc.vt.edu> <5076E700.2030909@suse.cz> <118079.1349978211@turing-police.cc.vt.edu> <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> In-Reply-To: <20121015110937.GE29125@suse.de> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Mel Gorman Cc: Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton Hi! On 15.10.2012 13:09, Mel Gorman wrote: > On Mon, Oct 15, 2012 at 11:54:13AM +0200, Jiri Slaby wrote: >> On 10/12/2012 03:57 PM, Mel Gorman wrote: >>> mm: vmscan: scale number of pages reclaimed by reclaim/compaction only in direct reclaim >>> Jiri Slaby reported the following: > [...] >>> diff --git a/mm/vmscan.c b/mm/vmscan.c >>> index 2624edc..2b7edfa 100644 >>> --- a/mm/vmscan.c >>> +++ b/mm/vmscan.c >>> @@ -1763,14 +1763,20 @@ static bool in_reclaim_compaction(struct scan_control *sc) >>> #ifdef CONFIG_COMPACTION >>> /* >>> * If compaction is deferred for sc->order then scale the number of pages >>> - * reclaimed based on the number of consecutive allocation failures >>> + * reclaimed based on the number of consecutive allocation failures. This >>> + * scaling only happens for direct reclaim as it is about to attempt >>> + * compaction. 
If compaction fails, future allocations will be deferred >>> + * and reclaim avoided. On the other hand, kswapd does not take compaction >>> + * deferral into account so if it scaled, it could scan excessively even >>> + * though allocations are temporarily not being attempted. >>> */ >>> static unsigned long scale_for_compaction(unsigned long pages_for_compaction, >>> struct lruvec *lruvec, struct scan_control *sc) >>> { >>> struct zone *zone = lruvec_zone(lruvec); >>> >>> - if (zone->compact_order_failed <= sc->order) >>> + if (zone->compact_order_failed <= sc->order && >>> + !current_is_kswapd()) >>> pages_for_compaction <<= zone->compact_defer_shift; >>> return pages_for_compaction; >>> } >> Yes, applying this instead of the revert fixes the issue as well. Just wondering, is there a reason why this patch wasn't applied to mainline? Did it simply fall through the cracks? Or am I missing something? I'm asking because I think I still see the issue on 3.7-rc2-git-checkout-from-friday. It seems Fedora rawhide users are hitting it, too: https://bugzilla.redhat.com/show_bug.cgi?id=866988 Or are we seeing something different which just looks similar? I can test the patch if it needs further testing, but from the discussion I got the impression that everything is clear and the patch is ready for merging. CU knurd -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx108.postini.com [74.125.245.108]) by kanga.kvack.org (Postfix) with SMTP id 712D18D0003 for ; Tue, 30 Oct 2012 15:18:49 -0400 (EDT) Date: Tue, 30 Oct 2012 19:18:43 +0000 From: Mel Gorman Subject: Re: kswapd0: excessive CPU usage Message-ID: <20121030191843.GH3888@suse.de> References: <5076E700.2030909@suse.cz> <118079.1349978211@turing-police.cc.vt.edu> <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <508E5FD3.1060105@leemhuis.info> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <508E5FD3.1060105@leemhuis.info> Sender: owner-linux-mm@kvack.org List-ID: To: Thorsten Leemhuis Cc: Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton On Mon, Oct 29, 2012 at 11:52:03AM +0100, Thorsten Leemhuis wrote: > Hi! > > On 15.10.2012 13:09, Mel Gorman wrote: > >On Mon, Oct 15, 2012 at 11:54:13AM +0200, Jiri Slaby wrote: > >>On 10/12/2012 03:57 PM, Mel Gorman wrote: > >>>mm: vmscan: scale number of pages reclaimed by reclaim/compaction only in direct reclaim > >>>Jiri Slaby reported the following: > > [...] > >>>diff --git a/mm/vmscan.c b/mm/vmscan.c > >>>index 2624edc..2b7edfa 100644 > >>>--- a/mm/vmscan.c > >>>+++ b/mm/vmscan.c > >>>@@ -1763,14 +1763,20 @@ static bool in_reclaim_compaction(struct scan_control *sc) > >>> #ifdef CONFIG_COMPACTION > >>> /* > >>> * If compaction is deferred for sc->order then scale the number of pages > >>>- * reclaimed based on the number of consecutive allocation failures > >>>+ * reclaimed based on the number of consecutive allocation failures. This > >>>+ * scaling only happens for direct reclaim as it is about to attempt > >>>+ * compaction. 
If compaction fails, future allocations will be deferred > >>>+ * and reclaim avoided. On the other hand, kswapd does not take compaction > >>>+ * deferral into account so if it scaled, it could scan excessively even > >>>+ * though allocations are temporarily not being attempted. > >>> */ > >>> static unsigned long scale_for_compaction(unsigned long pages_for_compaction, > >>> struct lruvec *lruvec, struct scan_control *sc) > >>> { > >>> struct zone *zone = lruvec_zone(lruvec); > >>> > >>>- if (zone->compact_order_failed <= sc->order) > >>>+ if (zone->compact_order_failed <= sc->order && > >>>+ !current_is_kswapd()) > >>> pages_for_compaction <<= zone->compact_defer_shift; > >>> return pages_for_compaction; > >>> } > >>Yes, applying this instead of the revert fixes the issue as well. > > Just wondering, is there a reason why this patch wasn't applied to > mainline? Did it simply fall through the cracks? Or am I missing > something? > It's because a problem was reported related to the patch (off-list, whoops). I'm waiting to hear if a second patch fixes the problem or not. > I'm asking because I think I stil see the issue on > 3.7-rc2-git-checkout-from-friday. Seems Fedora rawhide users are > hitting it, too: > https://bugzilla.redhat.com/show_bug.cgi?id=866988 > I like the steps to reproduce. Is step 3 profit? > Or are we seeing something different which just looks similar? I can > test the patch if it needs further testing, but from the discussion > I got the impression that everything is clear and the patch ready > for merging. It could be the same issue. Can you test with the "mm: vmscan: scale number of pages reclaimed by reclaim/compaction only in direct reclaim" patch and the following on top please? Thanks. ---8<--- mm: page_alloc: Do not wake kswapd if the request is for THP but deferred Since commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd gets woken for every THP request in the slow path. 
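The wake-up rule that this patch adds to __alloc_pages_slowpath() can be modelled in isolation. What follows is only a user-space sketch under stated assumptions, not kernel code:

- The MODEL_GFP_* values are hypothetical stand-ins for __GFP_MOVABLE and __GFP_REPEAT.
- pageblock_order is assumed to be 9, as on x86-64 with 4 KiB pages.
- compaction_deferred() is reduced to a boolean argument.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for __GFP_MOVABLE and __GFP_REPEAT; the real
 * kernel bit values differ, only their roles matter in this model. */
#define MODEL_GFP_MOVABLE 0x1u
#define MODEL_GFP_REPEAT  0x2u

/* Assumption: pageblock_order is 9 (2 MiB blocks built from 4 KiB pages). */
#define MODEL_PAGEBLOCK_ORDER 9u

/* Models is_thp_alloc(): a pageblock-sized movable request that did not
 * ask for __GFP_REPEAT is treated as a likely THP allocation. */
static bool model_is_thp_alloc(unsigned int gfp_mask, unsigned int order)
{
	return order == MODEL_PAGEBLOCK_ORDER &&
	       (gfp_mask & (MODEL_GFP_MOVABLE | MODEL_GFP_REPEAT)) ==
			MODEL_GFP_MOVABLE;
}

/* Models the new guard around wake_all_kswapd(): kswapd is woken unless
 * the request looks like THP AND compaction is deferred for the zone. */
static bool model_should_wake_kswapd(unsigned int gfp_mask, unsigned int order,
				     bool compaction_deferred)
{
	return !(model_is_thp_alloc(gfp_mask, order) && compaction_deferred);
}
```

In this model, only the combination of a THP-like request and deferred compaction suppresses the wake-up. A movable order-9 request with __GFP_REPEAT set, or a request of any other order, still wakes kswapd even while compaction is deferred.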
If compaction has been deferred, the waker will not compact or enter direct reclaim on its own behalf, but kswapd is still woken to reclaim free pages that no one may consume. If compaction was deferred because pages and slab were not reclaimable, then kswapd is just consuming cycles for no gain. This patch avoids waking kswapd if compaction has been deferred. It'll still wake when compaction is running to reduce the latency of THP allocations. Signed-off-by: Mel Gorman --- mm/page_alloc.c | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index bb90971..e72674c 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2378,6 +2378,15 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask) return !!(gfp_to_alloc_flags(gfp_mask) & ALLOC_NO_WATERMARKS); } +/* Returns true if the allocation is likely for THP */ +static bool is_thp_alloc(gfp_t gfp_mask, unsigned int order) +{ + if (order == pageblock_order && + (gfp_mask & (__GFP_MOVABLE|__GFP_REPEAT)) == __GFP_MOVABLE) + return true; + return false; +} + static inline struct page * __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist, enum zone_type high_zoneidx, @@ -2416,7 +2425,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, goto nopage; restart: - wake_all_kswapd(order, zonelist, high_zoneidx, + /* + * kswapd is woken except when this is a THP request and compaction + * is deferred. If we are backing off reclaim/compaction then kswapd + * should not be awake aggressively reclaiming with no consumers of + * the freed pages + */ + if (!(is_thp_alloc(gfp_mask, order) && + compaction_deferred(preferred_zone, order))) + wake_all_kswapd(order, zonelist, high_zoneidx, zone_idx(preferred_zone)); /* @@ -2494,7 +2511,7 @@ rebalance: * system then fail the allocation instead of entering direct reclaim.
*/ if ((deferred_compaction || contended_compaction) && - (gfp_mask & (__GFP_MOVABLE|__GFP_REPEAT)) == __GFP_MOVABLE) + is_thp_alloc(gfp_mask, order)) goto nopage; /* Try direct reclaim and then allocating */ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx114.postini.com [74.125.245.114]) by kanga.kvack.org (Postfix) with SMTP id 3A74A6B0044 for ; Wed, 31 Oct 2012 07:25:17 -0400 (EDT) Message-ID: <50910A99.5050707@leemhuis.info> Date: Wed, 31 Oct 2012 12:25:13 +0100 From: Thorsten Leemhuis MIME-Version: 1.0 Subject: Re: kswapd0: excessive CPU usage References: <5076E700.2030909@suse.cz> <118079.1349978211@turing-police.cc.vt.edu> <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <508E5FD3.1060105@leemhuis.info> <20121030191843.GH3888@suse.de> In-Reply-To: <20121030191843.GH3888@suse.de> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Mel Gorman Cc: Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton On 30.10.2012 20:18, Mel Gorman wrote: > On Mon, Oct 29, 2012 at 11:52:03AM +0100, Thorsten Leemhuis wrote: >> On 15.10.2012 13:09, Mel Gorman wrote: >>> On Mon, Oct 15, 2012 at 11:54:13AM +0200, Jiri Slaby wrote: >>>> On 10/12/2012 03:57 PM, Mel Gorman wrote: >>>>> mm: vmscan: scale number of pages reclaimed by reclaim/compaction only in direct reclaim >>>>> Jiri Slaby reported the following: > [...] >>>> Yes, applying this instead of the revert fixes the issue as well. >> Just wondering, is there a reason why this patch wasn't applied to >> mainline? 
Did it simply fall through the cracks? Or am I missing >> something? > It's because a problem was reported related to the patch (off-list, > whoops). I'm waiting to hear if a second patch fixes the problem or not. Anything in particular I should look out for while testing? >> I'm asking because I think I still see the issue on >> 3.7-rc2-git-checkout-from-friday. It seems Fedora rawhide users are >> hitting it, too: >> https://bugzilla.redhat.com/show_bug.cgi?id=866988 > I like the steps to reproduce. One of those cases where the bugzilla bug template was not very helpful or where it was not used as intended (you decide) :-) > Is step 3 profit? Yes, but psst, don't tell anyone; step 4 (world domination! for real!) is also hidden to keep that part of the big plan a secret for now ;-) >> Or are we seeing something different which just looks similar? I can >> test the patch if it needs further testing, but from the discussion >> I got the impression that everything is clear and the patch is ready >> for merging. > It could be the same issue. Can you test with the "mm: vmscan: scale > number of pages reclaimed by reclaim/compaction only in direct reclaim" > patch and the following on top please? Built a vanilla mainline kernel with those two patches and installed it on the machine where I was seeing problems with high kswapd0 load on 3.7-rc3. Ran it for an hour yesterday and a few hours today; it seems the patches fix the issue for me, as kswapd behaves: $ LC_ALL=C ps -aux | grep 'kswapd' root 62 0.0 0.0 0 0 ? S Oct30 0:05 [kswapd0] So everything is looking fine again so far thx to the two patches -- hopefully it stays that way even after hitting "send" in my mailer in a few seconds. CU knurd -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx139.postini.com [74.125.245.139]) by kanga.kvack.org (Postfix) with SMTP id 952876B0062 for ; Wed, 31 Oct 2012 11:04:46 -0400 (EDT) Date: Wed, 31 Oct 2012 15:04:38 +0000 From: Mel Gorman Subject: Re: kswapd0: excessive CPU usage Message-ID: <20121031150438.GK3888@suse.de> References: <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <508E5FD3.1060105@leemhuis.info> <20121030191843.GH3888@suse.de> <50910A99.5050707@leemhuis.info> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <50910A99.5050707@leemhuis.info> Sender: owner-linux-mm@kvack.org List-ID: To: Thorsten Leemhuis Cc: Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton On Wed, Oct 31, 2012 at 12:25:13PM +0100, Thorsten Leemhuis wrote: > On 30.10.2012 20:18, Mel Gorman wrote: > >On Mon, Oct 29, 2012 at 11:52:03AM +0100, Thorsten Leemhuis wrote: > >>On 15.10.2012 13:09, Mel Gorman wrote: > >>>On Mon, Oct 15, 2012 at 11:54:13AM +0200, Jiri Slaby wrote: > >>>>On 10/12/2012 03:57 PM, Mel Gorman wrote: > >>>>>mm: vmscan: scale number of pages reclaimed by reclaim/compaction only in direct reclaim > >>>>>Jiri Slaby reported the following: > >[...] > >>>>Yes, applying this instead of the revert fixes the issue as well. > >>Just wondering, is there a reason why this patch wasn't applied to > >>mainline? Did it simply fall through the cracks? Or am I missing > >>something? > >It's because a problem was reported related to the patch (off-list, > >whoops). I'm waiting to hear if a second patch fixes the problem or not. > > Anything in particular I should look out for while testing? 
> Excessive reclaim, high CPU usage by kswapd, processes getting stuck in isolate_migratepages or isolate_freepages. > >>I'm asking because I think I still see the issue on > >>3.7-rc2-git-checkout-from-friday. It seems Fedora rawhide users are > >>hitting it, too: > >>https://bugzilla.redhat.com/show_bug.cgi?id=866988 > >I like the steps to reproduce. > > One of those cases where the bugzilla bug template was not very > helpful or where it was not used as intended (you decide) :-) > It wins at entertainment value if nothing else :) > >Is step 3 profit? > > Yes, but psst, don't tell anyone; step 4 (world domination! for > real!) is also hidden to keep that part of the big plan a secret for > now ;-) > No doubt it's the default private comment #1 ! > >>Or are we seeing something different which just looks similar? I can > >>test the patch if it needs further testing, but from the discussion > >>I got the impression that everything is clear and the patch is ready > >>for merging. > >It could be the same issue. Can you test with the "mm: vmscan: scale > >number of pages reclaimed by reclaim/compaction only in direct reclaim" > >patch and the following on top please? > > Built a vanilla mainline kernel with those two patches and installed > it on the machine where I was seeing problems with high kswapd0 load on > 3.7-rc3. Ran it for an hour yesterday and a few hours today; it seems the > patches fix the issue for me, as kswapd behaves: > > $ LC_ALL=C ps -aux | grep 'kswapd' > root 62 0.0 0.0 0 0 ? S Oct30 0:05 [kswapd0] > > So everything is looking fine again so far thx to the two patches > -- hopefully it stays that way even after hitting "send" in my > mailer in a few seconds. > Ok, great. Keep an eye on it please. If Jiri Slaby reports similar success then I'll collapse the two patches together and resend to Andrew. Thanks. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org.
For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx162.postini.com [74.125.245.162]) by kanga.kvack.org (Postfix) with SMTP id E9F886B004D for ; Fri, 2 Nov 2012 06:44:09 -0400 (EDT) Message-ID: <5093A3F4.8090108@redhat.com> Date: Fri, 02 Nov 2012 11:44:04 +0100 From: Zdenek Kabelac MIME-Version: 1.0 Subject: Re: kswapd0: excessive CPU usage References: <507688CC.9000104@suse.cz> <106695.1349963080@turing-police.cc.vt.edu> <5076E700.2030909@suse.cz> <118079.1349978211@turing-police.cc.vt.edu> <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> In-Reply-To: <20121015110937.GE29125@suse.de> Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Mel Gorman Cc: Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton On 15.10.2012 13:09, Mel Gorman wrote: > On Mon, Oct 15, 2012 at 11:54:13AM +0200, Jiri Slaby wrote: >> On 10/12/2012 03:57 PM, Mel Gorman wrote: >>> mm: vmscan: scale number of pages reclaimed by reclaim/compaction only in direct reclaim >>> >>> Jiri Slaby reported the following: >>> >>> (It's an effective revert of "mm: vmscan: scale number of pages >>> reclaimed by reclaim/compaction based on failures".) >>> Given kswapd had hours of runtime in ps/top output yesterday in the >>> morning and after the revert it's now 2 minutes in sum for the last 24h, >>> I would say, it's gone. >>> >>> The intention of the patch in question was to compensate for the loss of >>> lumpy reclaim. Part of the reason lumpy reclaim worked is because it >>> aggressively reclaimed pages and this patch was meant to be a >>> sane compromise.
>>> >>> When compaction fails, it gets deferred and both compaction and >>> reclaim/compaction is deferred avoid excessive reclaim. However, since >>> commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up each time >>> and continues reclaiming which was not taken into account when the patch >>> was developed. >>> >>> As it is not taking deferred compaction into account in this path it scans >>> aggressively before falling out and making the compaction_deferred check in >>> compaction_ready. This patch avoids kswapd scaling pages for reclaim and >>> leaves the aggressive reclaim to the process attempting the THP >>> allocation. >>> >>> Signed-off-by: Mel Gorman >>> --- >>> mm/vmscan.c | 10 ++++++++-- >>> 1 file changed, 8 insertions(+), 2 deletions(-) >>> >>> diff --git a/mm/vmscan.c b/mm/vmscan.c >>> index 2624edc..2b7edfa 100644 >>> --- a/mm/vmscan.c >>> +++ b/mm/vmscan.c >>> @@ -1763,14 +1763,20 @@ static bool in_reclaim_compaction(struct scan_control *sc) >>> #ifdef CONFIG_COMPACTION >>> /* >>> * If compaction is deferred for sc->order then scale the number of pages >>> - * reclaimed based on the number of consecutive allocation failures >>> + * reclaimed based on the number of consecutive allocation failures. This >>> + * scaling only happens for direct reclaim as it is about to attempt >>> + * compaction. If compaction fails, future allocations will be deferred >>> + * and reclaim avoided. On the other hand, kswapd does not take compaction >>> + * deferral into account so if it scaled, it could scan excessively even >>> + * though allocations are temporarily not being attempted. 
>>> */ >>> static unsigned long scale_for_compaction(unsigned long pages_for_compaction, >>> struct lruvec *lruvec, struct scan_control *sc) >>> { >>> struct zone *zone = lruvec_zone(lruvec); >>> >>> - if (zone->compact_order_failed <= sc->order) >>> + if (zone->compact_order_failed <= sc->order && >>> + !current_is_kswapd()) >>> pages_for_compaction <<= zone->compact_defer_shift; >>> return pages_for_compaction; >>> } >> >> Yes, applying this instead of the revert fixes the issue as well. >> > I've applied this patch on 3.7.0-rc3 kernel - and I still see excessive CPU usage - mainly after suspend/resume Here is just simple kswapd backtrace from running kernel: kswapd0 R running task 0 30 2 0x00000000 ffff8801331ddae8 0000000000000082 ffff880135b8a340 0000000000000008 ffff880135b8a340 ffff8801331ddfd8 ffff8801331ddfd8 ffff8801331ddfd8 ffff880071db8000 ffff880135b8a340 0000000000000286 ffff8801331dc000 Call Trace: [] preempt_schedule+0x42/0x60 [] _raw_spin_unlock+0x55/0x60 [] put_super+0x31/0x40 [] drop_super+0x22/0x30 [] prune_super+0x149/0x1b0 [] shrink_slab+0xba/0x510 [] ? mem_cgroup_iter+0x17a/0x2e0 [] ? mem_cgroup_iter+0xca/0x2e0 [] balance_pgdat+0x629/0x7f0 [] kswapd+0x174/0x620 [] ? __init_waitqueue_head+0x60/0x60 [] ? balance_pgdat+0x7f0/0x7f0 [] kthread+0xdb/0xe0 [] ? kthread_create_on_node+0x140/0x140 [] ret_from_fork+0x7c/0xb0 [] ? kthread_create_on_node+0x140/0x140 Zdenek -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx170.postini.com [74.125.245.170]) by kanga.kvack.org (Postfix) with SMTP id E76D56B004D for ; Fri, 2 Nov 2012 06:53:41 -0400 (EDT) Received: by mail-ee0-f41.google.com with SMTP id c4so2220229eek.14 for ; Fri, 02 Nov 2012 03:53:40 -0700 (PDT) Message-ID: <5093A631.5020209@suse.cz> Date: Fri, 02 Nov 2012 11:53:37 +0100 From: Jiri Slaby MIME-Version: 1.0 Subject: Re: kswapd0: excessive CPU usage References: <507688CC.9000104@suse.cz> <106695.1349963080@turing-police.cc.vt.edu> <5076E700.2030909@suse.cz> <118079.1349978211@turing-police.cc.vt.edu> <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> In-Reply-To: <5093A3F4.8090108@redhat.com> Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Zdenek Kabelac Cc: Mel Gorman , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton On 11/02/2012 11:44 AM, Zdenek Kabelac wrote: >>> Yes, applying this instead of the revert fixes the issue as well. > > I've applied this patch on 3.7.0-rc3 kernel - and I still see excessive > CPU usage - mainly after suspend/resume > > Here is just simple kswapd backtrace from running kernel: Yup, this is what we were seeing with the former patch only too. Try to apply the other one too: https://patchwork.kernel.org/patch/1673231/ For me I would say, it is fixed by the two patches now. I won't be able to report later, since I'm leaving to a conference tomorrow. > kswapd0 R running task 0 30 2 0x00000000 ... > [] shrink_slab+0xba/0x510 thanks, -- js suse labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. 
For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx145.postini.com [74.125.245.145]) by kanga.kvack.org (Postfix) with SMTP id 6D8A96B0044 for ; Fri, 2 Nov 2012 15:45:12 -0400 (EDT) Received: by mail-ee0-f41.google.com with SMTP id c4so2533452eek.14 for ; Fri, 02 Nov 2012 12:45:10 -0700 (PDT) Message-ID: <509422C3.1000803@suse.cz> Date: Fri, 02 Nov 2012 20:45:07 +0100 From: Jiri Slaby MIME-Version: 1.0 Subject: Re: kswapd0: excessive CPU usage References: <507688CC.9000104@suse.cz> <106695.1349963080@turing-police.cc.vt.edu> <5076E700.2030909@suse.cz> <118079.1349978211@turing-police.cc.vt.edu> <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> In-Reply-To: <5093A631.5020209@suse.cz> Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Mel Gorman Cc: Zdenek Kabelac , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton On 11/02/2012 11:53 AM, Jiri Slaby wrote: > On 11/02/2012 11:44 AM, Zdenek Kabelac wrote: >>>> Yes, applying this instead of the revert fixes the issue as well. >> >> I've applied this patch on 3.7.0-rc3 kernel - and I still see excessive >> CPU usage - mainly after suspend/resume >> >> Here is just simple kswapd backtrace from running kernel: > > Yup, this is what we were seeing with the former patch only too. Try to > apply the other one too: > https://patchwork.kernel.org/patch/1673231/ > > For me I would say, it is fixed by the two patches now. I won't be able > to report later, since I'm leaving to a conference tomorrow. Damn it. It recurred right now, with both patches applied. 
This happened after I started a java program that consumed some more memory. Although there are still 2 gigs free, kswapd is spinning: [] __cond_resched+0x2a/0x40 [] shrink_slab+0x1c0/0x2d0 [] kswapd+0x66d/0xb60 [] kthread+0xc0/0xd0 [] ret_from_fork+0x7c/0xb0 [] 0xffffffffffffffff -- js suse labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx115.postini.com [74.125.245.115]) by kanga.kvack.org (Postfix) with SMTP id 989266B004D for ; Sun, 4 Nov 2012 06:26:40 -0500 (EST) Message-ID: <509650EA.5060508@redhat.com> Date: Sun, 04 Nov 2012 12:26:34 +0100 From: Zdenek Kabelac MIME-Version: 1.0 Subject: Re: kswapd0: excessive CPU usage References: <507688CC.9000104@suse.cz> <106695.1349963080@turing-police.cc.vt.edu> <5076E700.2030909@suse.cz> <118079.1349978211@turing-police.cc.vt.edu> <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> In-Reply-To: <509422C3.1000803@suse.cz> Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Jiri Slaby Cc: Mel Gorman , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton On 2.11.2012 20:45, Jiri Slaby wrote: > On 11/02/2012 11:53 AM, Jiri Slaby wrote: >> On 11/02/2012 11:44 AM, Zdenek Kabelac wrote: >>>>> Yes, applying this instead of the revert fixes the issue as well.
>>> >>> I've applied this patch on 3.7.0-rc3 kernel - and I still see excessive >>> CPU usage - mainly after suspend/resume >>> >>> Here is just simple kswapd backtrace from running kernel: >> >> Yup, this is what we were seeing with the former patch only too. Try to >> apply the other one too: >> https://patchwork.kernel.org/patch/1673231/ >> >> For me I would say, it is fixed by the two patches now. I won't be able >> to report later, since I'm leaving to a conference tomorrow. > > Damn it. It recurred right now, with both patches applied. This happened > after I started a java program that consumed some more memory. Although there > are still 2 gigs free, kswapd is spinning: > [] __cond_resched+0x2a/0x40 > [] shrink_slab+0x1c0/0x2d0 > [] kswapd+0x66d/0xb60 > [] kthread+0xc0/0xd0 > [] ret_from_fork+0x7c/0xb0 > [] 0xffffffffffffffff > Yep - I wanted to report again myself and noticed your reply. Yes - I now have both patches installed too - and I still observe kswapd eating my CPU. It seems (at least for me) that a prior suspend and resume is a way to trigger it more frequently. However, there is a change in behaviour - while before kswapd was running almost indefinitely, now the CPU spikes last for minutes (i.e. uptime ~2 days - kswapd has over 32 minutes of CPU time). My machine has 4GB and no swap (disabled). firefox (22mins), thunderbird (3mins) and pidgin (0.5min) are the 3 most memory- and CPU-hungry apps at the moment. Zdenek -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx117.postini.com [74.125.245.117]) by kanga.kvack.org (Postfix) with SMTP id 9E4086B002B for ; Sun, 4 Nov 2012 11:34:17 -0500 (EST) Message-ID: <5096999F.1040405@redhat.com> Date: Sun, 04 Nov 2012 11:36:47 -0500 From: Rik van Riel MIME-Version: 1.0 Subject: Re: kswapd0: excessive CPU usage References: <5076E700.2030909@suse.cz> <118079.1349978211@turing-police.cc.vt.edu> <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <508E5FD3.1060105@leemhuis.info> <20121030191843.GH3888@suse.de> In-Reply-To: <20121030191843.GH3888@suse.de> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Mel Gorman Cc: Thorsten Leemhuis , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton On 10/30/2012 03:18 PM, Mel Gorman wrote: > restart: > - wake_all_kswapd(order, zonelist, high_zoneidx, > + /* > + * kswapd is woken except when this is a THP request and compaction > + * is deferred. If we are backing off reclaim/compaction then kswapd > + * should not be awake aggressively reclaiming with no consumers of > + * the freed pages > + */ > + if (!(is_thp_alloc(gfp_mask, order) && > + compaction_deferred(preferred_zone, order))) > + wake_all_kswapd(order, zonelist, high_zoneidx, > zone_idx(preferred_zone)); What is special about thp allocations here? Surely other large allocations that keep failing should get the same treatment, of not waking up kswapd if compaction is deferred? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx129.postini.com [74.125.245.129]) by kanga.kvack.org (Postfix) with SMTP id 970CC6B002B for ; Mon, 5 Nov 2012 09:24:54 -0500 (EST) Date: Mon, 5 Nov 2012 14:24:49 +0000 From: Mel Gorman Subject: [PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" Message-ID: <20121105142449.GI8218@suse.de> References: <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <509422C3.1000803@suse.cz> Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton Cc: Zdenek Kabelac , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, Rik van Riel , Jiri Slaby , LKML Jiri Slaby reported the following: (It's an effective revert of "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures".) Given kswapd had hours of runtime in ps/top output yesterday in the morning and after the revert it's now 2 minutes in sum for the last 24h, I would say, it's gone. The intention of the patch in question was to compensate for the loss of lumpy reclaim. Part of the reason lumpy reclaim worked is that it aggressively reclaimed pages, and this patch was meant to be a sane compromise. When compaction fails, it gets deferred, and both compaction and reclaim/compaction are deferred to avoid excessive reclaim. However, since commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up each time and continues reclaiming, which was not taken into account when the patch was developed.
Attempts to address the problem ended up just changing the shape of the problem instead of fixing it. The release window gets closer and while a THP allocation failing is not a major problem, kswapd chewing up a lot of CPU is. This patch reverts "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" and will be revisited in the future. Signed-off-by: Mel Gorman --- mm/vmscan.c | 25 ------------------------- 1 file changed, 25 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 2624edc..e081ee8 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1760,28 +1760,6 @@ static bool in_reclaim_compaction(struct scan_control *sc) return false; } -#ifdef CONFIG_COMPACTION -/* - * If compaction is deferred for sc->order then scale the number of pages - * reclaimed based on the number of consecutive allocation failures - */ -static unsigned long scale_for_compaction(unsigned long pages_for_compaction, - struct lruvec *lruvec, struct scan_control *sc) -{ - struct zone *zone = lruvec_zone(lruvec); - - if (zone->compact_order_failed <= sc->order) - pages_for_compaction <<= zone->compact_defer_shift; - return pages_for_compaction; -} -#else -static unsigned long scale_for_compaction(unsigned long pages_for_compaction, - struct lruvec *lruvec, struct scan_control *sc) -{ - return pages_for_compaction; -} -#endif - /* * Reclaim/compaction is used for high-order allocation requests. It reclaims * order-0 pages before compacting the zone. 
should_continue_reclaim() returns @@ -1829,9 +1807,6 @@ static inline bool should_continue_reclaim(struct lruvec *lruvec, * inactive lists are large enough, continue reclaiming */ pages_for_compaction = (2UL << sc->order); - - pages_for_compaction = scale_for_compaction(pages_for_compaction, - lruvec, sc); inactive_lru_pages = get_lru_size(lruvec, LRU_INACTIVE_FILE); if (nr_swap_pages > 0) inactive_lru_pages += get_lru_size(lruvec, LRU_INACTIVE_ANON); From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx129.postini.com [74.125.245.129]) by kanga.kvack.org (Postfix) with SMTP id DB0186B0044 for ; Tue, 6 Nov 2012 05:15:57 -0500 (EST) Date: Tue, 6 Nov 2012 11:15:54 +0100 From: Johannes Hirte Subject: Re: [PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" Message-ID: <20121106111554.1896c3f3@fem.tu-ilmenau.de> In-Reply-To: <20121105142449.GI8218@suse.de> References: <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <20121105142449.GI8218@suse.de> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Mel Gorman Cc: Andrew Morton , Zdenek Kabelac , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, Rik van Riel , Jiri Slaby , LKML On Mon, 5 Nov 2012 14:24:49 +0000, Mel Gorman wrote: > Jiri Slaby reported the following: > > (It's an effective revert of "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures".)
Given > kswapd had hours of runtime in ps/top output yesterday in the morning > and after the revert it's now 2 minutes in sum for the last > 24h, I would say, it's gone. > > The intention of the patch in question was to compensate for the loss > of lumpy reclaim. Part of the reason lumpy reclaim worked is because > it aggressively reclaimed pages and this patch was meant to be a sane > compromise. > > When compaction fails, it gets deferred and both compaction and > reclaim/compaction is deferred avoid excessive reclaim. However, since > commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up each > time and continues reclaiming which was not taken into account when > the patch was developed. > > Attempts to address the problem ended up just changing the shape of > the problem instead of fixing it. The release window gets closer and > while a THP allocation failing is not a major problem, kswapd chewing > up a lot of CPU is. This patch reverts "mm: vmscan: scale number of > pages reclaimed by reclaim/compaction based on failures" and will be > revisited in the future. 
> > Signed-off-by: Mel Gorman > --- > mm/vmscan.c | 25 ------------------------- > 1 file changed, 25 deletions(-) > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 2624edc..e081ee8 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1760,28 +1760,6 @@ static bool in_reclaim_compaction(struct > scan_control *sc) return false; > } > > -#ifdef CONFIG_COMPACTION > -/* > - * If compaction is deferred for sc->order then scale the number of > pages > - * reclaimed based on the number of consecutive allocation failures > - */ > -static unsigned long scale_for_compaction(unsigned long > pages_for_compaction, > - struct lruvec *lruvec, struct scan_control > *sc) -{ > - struct zone *zone = lruvec_zone(lruvec); > - > - if (zone->compact_order_failed <= sc->order) > - pages_for_compaction <<= zone->compact_defer_shift; > - return pages_for_compaction; > -} > -#else > -static unsigned long scale_for_compaction(unsigned long > pages_for_compaction, > - struct lruvec *lruvec, struct scan_control > *sc) -{ > - return pages_for_compaction; > -} > -#endif > - > /* > * Reclaim/compaction is used for high-order allocation requests. It > reclaims > * order-0 pages before compacting the zone. > should_continue_reclaim() returns @@ -1829,9 +1807,6 @@ static inline > bool should_continue_reclaim(struct lruvec *lruvec, > * inactive lists are large enough, continue reclaiming > */ > pages_for_compaction = (2UL << sc->order); > - > - pages_for_compaction = > scale_for_compaction(pages_for_compaction, > - lruvec, sc); > inactive_lru_pages = get_lru_size(lruvec, LRU_INACTIVE_FILE); > if (nr_swap_pages > 0) > inactive_lru_pages += get_lru_size(lruvec, > LRU_INACTIVE_ANON); -- Even with this patch I see kswapd0 very often on top. Much more than with kernel 3.6. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx124.postini.com [74.125.245.124]) by kanga.kvack.org (Postfix) with SMTP id 21C236B002B for ; Thu, 8 Nov 2012 23:22:14 -0500 (EST) Received: from /spool/local by e34.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Thu, 8 Nov 2012 21:22:13 -0700 Received: from d03relay03.boulder.ibm.com (d03relay03.boulder.ibm.com [9.17.195.228]) by d03dlp01.boulder.ibm.com (Postfix) with ESMTP id 7FBF7C40003 for ; Thu, 8 Nov 2012 21:22:07 -0700 (MST) Received: from d03av01.boulder.ibm.com (d03av01.boulder.ibm.com [9.17.195.167]) by d03relay03.boulder.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id qA94M9IQ243780 for ; Thu, 8 Nov 2012 21:22:09 -0700 Received: from d03av01.boulder.ibm.com (loopback [127.0.0.1]) by d03av01.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id qA94M8SS005269 for ; Thu, 8 Nov 2012 21:22:09 -0700 Message-ID: <509C84ED.8090605@linux.vnet.ibm.com> Date: Thu, 08 Nov 2012 22:22:05 -0600 From: Seth Jennings MIME-Version: 1.0 Subject: Re: kswapd0: excessive CPU usage References: <507688CC.9000104@suse.cz> <106695.1349963080@turing-police.cc.vt.edu> <5076E700.2030909@suse.cz> <118079.1349978211@turing-police.cc.vt.edu> <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> In-Reply-To: <509422C3.1000803@suse.cz> Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Jiri Slaby Cc: Mel Gorman , Zdenek Kabelac , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings On 11/02/2012 02:45 PM, Jiri Slaby wrote: > On 
11/02/2012 11:53 AM, Jiri Slaby wrote: >> On 11/02/2012 11:44 AM, Zdenek Kabelac wrote: >>>>> Yes, applying this instead of the revert fixes the issue as well. >>> >>> I've applied this patch on 3.7.0-rc3 kernel - and I still see excessive >>> CPU usage - mainly after suspend/resume >>> >>> Here is just simple kswapd backtrace from running kernel: >> >> Yup, this is what we were seeing with the former patch only too. Try to >> apply the other one too: >> https://patchwork.kernel.org/patch/1673231/ >> >> For me I would say, it is fixed by the two patches now. I won't be able >> to report later, since I'm leaving to a conference tomorrow. > > Damn it. It recurred right now, with both patches applied. After I > started a java program which consumed some more memory. Though there are > still 2 gigs free, kswap is spinning: > [] __cond_resched+0x2a/0x40 > [] shrink_slab+0x1c0/0x2d0 > [] kswapd+0x66d/0xb60 > [] kthread+0xc0/0xd0 > [] ret_from_fork+0x7c/0xb0 > [] 0xffffffffffffffff I'm also hitting this issue in v3.7-rc4. It appears that the last release not affected by this issue was v3.3. Bisecting the changes included for v3.4-rc1 showed that this commit introduced the issue: fe2c2a106663130a5ab45cb0e3414b52df2fff0c is the first bad commit commit fe2c2a106663130a5ab45cb0e3414b52df2fff0c Author: Rik van Riel Date: Wed Mar 21 16:33:51 2012 -0700 vmscan: reclaim at order 0 when compaction is enabled ... This is plausible since the issue seems to be in the kswapd + compaction realm. I've yet to figure out exactly what about this commit results in kswapd spinning. I would be interested to know if someone can confirm this finding. -- Seth -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx172.postini.com [74.125.245.172]) by kanga.kvack.org (Postfix) with SMTP id 924D06B002B for ; Fri, 9 Nov 2012 03:07:51 -0500 (EST) Message-ID: <509CB9D1.6060704@redhat.com> Date: Fri, 09 Nov 2012 09:07:45 +0100 From: Zdenek Kabelac MIME-Version: 1.0 Subject: Re: kswapd0: excessive CPU usage References: <507688CC.9000104@suse.cz> <106695.1349963080@turing-police.cc.vt.edu> <5076E700.2030909@suse.cz> <118079.1349978211@turing-police.cc.vt.edu> <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> In-Reply-To: <509C84ED.8090605@linux.vnet.ibm.com> Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Seth Jennings Cc: Jiri Slaby , Mel Gorman , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings On 9.11.2012 05:22, Seth Jennings wrote: > On 11/02/2012 02:45 PM, Jiri Slaby wrote: >> On 11/02/2012 11:53 AM, Jiri Slaby wrote: >>> On 11/02/2012 11:44 AM, Zdenek Kabelac wrote: >>>>>> Yes, applying this instead of the revert fixes the issue as well. >>>> >>>> I've applied this patch on 3.7.0-rc3 kernel - and I still see excessive >>>> CPU usage - mainly after suspend/resume >>>> >>>> Here is just simple kswapd backtrace from running kernel: >>> >>> Yup, this is what we were seeing with the former patch only too. Try to >>> apply the other one too: >>> https://patchwork.kernel.org/patch/1673231/ >>> >>> For me I would say, it is fixed by the two patches now.
I won't be able >>> to report later, since I'm leaving to a conference tomorrow. >> >> Damn it. It recurred right now, with both patches applied. After I >> started a java program which consumed some more memory. Though there are >> still 2 gigs free, kswap is spinning: >> [] __cond_resched+0x2a/0x40 >> [] shrink_slab+0x1c0/0x2d0 >> [] kswapd+0x66d/0xb60 >> [] kthread+0xc0/0xd0 >> [] ret_from_fork+0x7c/0xb0 >> [] 0xffffffffffffffff > > I'm also hitting this issue in v3.7-rc4. It appears that the last > release not effected by this issue was v3.3. Bisecting the changes > included for v3.4-rc1 showed that this commit introduced the issue: > > fe2c2a106663130a5ab45cb0e3414b52df2fff0c is the first bad commit > commit fe2c2a106663130a5ab45cb0e3414b52df2fff0c > Author: Rik van Riel > Date: Wed Mar 21 16:33:51 2012 -0700 > > vmscan: reclaim at order 0 when compaction is enabled > ... > > This is plausible since the issue seems to be in the kswapd + compaction > realm. I've yet to figure out exactly what about this commit results in > kswapd spinning. > > I would be interested if someone can confirm this finding. > > -- > Seth > On my 3.7-rc4 system the problem seems to be effectively solved by the revert patch: https://lkml.org/lkml/2012/11/5/308 i.e. in 2 days of uptime kswapd0 has used 6 seconds of CPU time, which is IMHO OK - I'm not observing any busy loops on the CPU with kswapd0. Zdenek -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx205.postini.com [74.125.245.205]) by kanga.kvack.org (Postfix) with SMTP id ED2606B002B for ; Fri, 9 Nov 2012 03:36:46 -0500 (EST) Date: Fri, 9 Nov 2012 08:36:37 +0000 From: Mel Gorman Subject: Re: [PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" Message-ID: <20121109083637.GD8218@suse.de> References: <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <20121105142449.GI8218@suse.de> <20121106111554.1896c3f3@fem.tu-ilmenau.de> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <20121106111554.1896c3f3@fem.tu-ilmenau.de> Sender: owner-linux-mm@kvack.org List-ID: To: Johannes Hirte Cc: Andrew Morton , Zdenek Kabelac , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, Rik van Riel , Jiri Slaby , LKML On Tue, Nov 06, 2012 at 11:15:54AM +0100, Johannes Hirte wrote: > On Mon, 5 Nov 2012 14:24:49 +0000 > Mel Gorman wrote: > > > Jiri Slaby reported the following: > > > > (It's an effective revert of "mm: vmscan: scale number of > > pages reclaimed by reclaim/compaction based on failures".) Given > > kswapd had hours of runtime in ps/top output yesterday in the morning > > and after the revert it's now 2 minutes in sum for the last > > 24h, I would say, it's gone. > > > > The intention of the patch in question was to compensate for the loss > > of lumpy reclaim. Part of the reason lumpy reclaim worked is because > > it aggressively reclaimed pages and this patch was meant to be a sane > > compromise. > > > > When compaction fails, it gets deferred and both compaction and > > reclaim/compaction is deferred avoid excessive reclaim.
However, since > > commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up each > > time and continues reclaiming which was not taken into account when > > the patch was developed. > > > > Attempts to address the problem ended up just changing the shape of > > the problem instead of fixing it. The release window gets closer and > > while a THP allocation failing is not a major problem, kswapd chewing > > up a lot of CPU is. This patch reverts "mm: vmscan: scale number of > > pages reclaimed by reclaim/compaction based on failures" and will be > > revisited in the future. > > > > Signed-off-by: Mel Gorman > > --- > > mm/vmscan.c | 25 ------------------------- > > 1 file changed, 25 deletions(-) > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > index 2624edc..e081ee8 100644 > > --- a/mm/vmscan.c > > +++ b/mm/vmscan.c > > @@ -1760,28 +1760,6 @@ static bool in_reclaim_compaction(struct > > scan_control *sc) return false; > > } > > > > -#ifdef CONFIG_COMPACTION > > -/* > > - * If compaction is deferred for sc->order then scale the number of > > pages > > - * reclaimed based on the number of consecutive allocation failures > > - */ > > -static unsigned long scale_for_compaction(unsigned long > > pages_for_compaction, > > - struct lruvec *lruvec, struct scan_control > > *sc) -{ > > - struct zone *zone = lruvec_zone(lruvec); > > - > > - if (zone->compact_order_failed <= sc->order) > > - pages_for_compaction <<= zone->compact_defer_shift; > > - return pages_for_compaction; > > -} > > -#else > > -static unsigned long scale_for_compaction(unsigned long > > pages_for_compaction, > > - struct lruvec *lruvec, struct scan_control > > *sc) -{ > > - return pages_for_compaction; > > -} > > -#endif > > - > > /* > > * Reclaim/compaction is used for high-order allocation requests. It > > reclaims > > * order-0 pages before compacting the zone. 
> > should_continue_reclaim() returns @@ -1829,9 +1807,6 @@ static inline > > bool should_continue_reclaim(struct lruvec *lruvec, > > * inactive lists are large enough, continue reclaiming > > */ > > pages_for_compaction = (2UL << sc->order); > > - > > - pages_for_compaction = > > scale_for_compaction(pages_for_compaction, > > - lruvec, sc); > > inactive_lru_pages = get_lru_size(lruvec, LRU_INACTIVE_FILE); > > if (nr_swap_pages > 0) > > inactive_lru_pages += get_lru_size(lruvec, > > LRU_INACTIVE_ANON); -- > > Even with this patch I see kswapd0 very often on top. Much more than > with kernel 3.6. How severe is the CPU usage? The higher usage can be explained by "mm: remove __GFP_NO_KSWAPD" which allows kswapd to compact memory to reduce the amount of time processes spend in compaction but will result in the CPU cost being incurred by kswapd. Is it really high like the bug was reporting with high usage over long periods of time or do you just see it using 2-6% of CPU for short periods? Thanks. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx124.postini.com [74.125.245.124]) by kanga.kvack.org (Postfix) with SMTP id 4BA726B002B for ; Fri, 9 Nov 2012 03:40:53 -0500 (EST) Date: Fri, 9 Nov 2012 08:40:48 +0000 From: Mel Gorman Subject: Re: kswapd0: excessive CPU usage Message-ID: <20121109084048.GE8218@suse.de> References: <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <509C84ED.8090605@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: To: Seth Jennings Cc: Jiri Slaby , Zdenek Kabelac , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings On Thu, Nov 08, 2012 at 10:22:05PM -0600, Seth Jennings wrote: > On 11/02/2012 02:45 PM, Jiri Slaby wrote: > > On 11/02/2012 11:53 AM, Jiri Slaby wrote: > >> On 11/02/2012 11:44 AM, Zdenek Kabelac wrote: > >>>>> Yes, applying this instead of the revert fixes the issue as well. > >>> > >>> I've applied this patch on 3.7.0-rc3 kernel - and I still see excessive > >>> CPU usage - mainly after suspend/resume > >>> > >>> Here is just simple kswapd backtrace from running kernel: > >> > >> Yup, this is what we were seeing with the former patch only too. Try to > >> apply the other one too: > >> https://patchwork.kernel.org/patch/1673231/ > >> > >> For me I would say, it is fixed by the two patches now. I won't be able > >> to report later, since I'm leaving to a conference tomorrow. > > > > Damn it. It recurred right now, with both patches applied. After I > > started a java program which consumed some more memory. 
Though there are > > still 2 gigs free, kswap is spinning: > > [] __cond_resched+0x2a/0x40 > > [] shrink_slab+0x1c0/0x2d0 > > [] kswapd+0x66d/0xb60 > > [] kthread+0xc0/0xd0 > > [] ret_from_fork+0x7c/0xb0 > > [] 0xffffffffffffffff > > I'm also hitting this issue in v3.7-rc4. It appears that the last > release not effected by this issue was v3.3. Bisecting the changes > included for v3.4-rc1 showed that this commit introduced the issue: > > fe2c2a106663130a5ab45cb0e3414b52df2fff0c is the first bad commit > commit fe2c2a106663130a5ab45cb0e3414b52df2fff0c > Author: Rik van Riel > Date: Wed Mar 21 16:33:51 2012 -0700 > > vmscan: reclaim at order 0 when compaction is enabled > ... > > This is plausible since the issue seems to be in the kswapd + compaction > realm. I've yet to figure out exactly what about this commit results in > kswapd spinning. > > I would be interested if someone can confirm this finding. > I cannot confirm the actual finding as I don't see the same sort of problems. However, this does make sense and was more or less expected. Reclaiming at order-0 would have forced compaction to be used more instead of lumpy reclaim (lumpy reclaim had less CPU usage but caused greater system disruption that is harder to measure). Shortly after, lumpy reclaim was removed entirely, so now larger amounts of CPU time are spent compacting memory that previously would have been reclaimed. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx120.postini.com [74.125.245.120]) by kanga.kvack.org (Postfix) with SMTP id 10C3E6B004D for ; Fri, 9 Nov 2012 04:06:40 -0500 (EST) Date: Fri, 9 Nov 2012 09:06:35 +0000 From: Mel Gorman Subject: Re: kswapd0: excessive CPU usage Message-ID: <20121109090635.GG8218@suse.de> References: <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <509CB9D1.6060704@redhat.com> Sender: owner-linux-mm@kvack.org List-ID: To: Zdenek Kabelac Cc: Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings On Fri, Nov 09, 2012 at 09:07:45AM +0100, Zdenek Kabelac wrote: > >fe2c2a106663130a5ab45cb0e3414b52df2fff0c is the first bad commit > >commit fe2c2a106663130a5ab45cb0e3414b52df2fff0c > >Author: Rik van Riel > >Date: Wed Mar 21 16:33:51 2012 -0700 > > > > vmscan: reclaim at order 0 when compaction is enabled > >... > > > >This is plausible since the issue seems to be in the kswapd + compaction > >realm. I've yet to figure out exactly what about this commit results in > >kswapd spinning. > > > >I would be interested if someone can confirm this finding. > > > >-- > >Seth > > > > > On my system 3.7-rc4 the problem seems to be effectively solved by > revert patch: https://lkml.org/lkml/2012/11/5/308 > Ok, while there is still a question on whether it's enough I think it's sensible to at least start with the obvious one. Thanks very much. 
-- Mel Gorman SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx170.postini.com [74.125.245.170]) by kanga.kvack.org (Postfix) with SMTP id 6EB906B005D for ; Fri, 9 Nov 2012 04:13:03 -0500 (EST) Date: Fri, 9 Nov 2012 09:12:58 +0000 From: Mel Gorman Subject: Re: [PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" Message-ID: <20121109091258.GH8218@suse.de> References: <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <20121105142449.GI8218@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <20121105142449.GI8218@suse.de> Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton Cc: Zdenek Kabelac , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, Rik van Riel , Jiri Slaby , LKML On Mon, Nov 05, 2012 at 02:24:49PM +0000, Mel Gorman wrote: > Jiri Slaby reported the following: > > (It's an effective revert of "mm: vmscan: scale number of pages > reclaimed by reclaim/compaction based on failures".) Given kswapd > had hours of runtime in ps/top output yesterday in the morning > and after the revert it's now 2 minutes in sum for the last 24h, > I would say, it's gone. > > The intention of the patch in question was to compensate for the loss > of lumpy reclaim. Part of the reason lumpy reclaim worked is because > it aggressively reclaimed pages and this patch was meant to be a sane > compromise.
> > When compaction fails, it gets deferred and both compaction and > reclaim/compaction is deferred avoid excessive reclaim. However, since > commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up each time > and continues reclaiming which was not taken into account when the patch > was developed. > > Attempts to address the problem ended up just changing the shape of the > problem instead of fixing it. The release window gets closer and while a > THP allocation failing is not a major problem, kswapd chewing up a lot of > CPU is. This patch reverts "mm: vmscan: scale number of pages reclaimed > by reclaim/compaction based on failures" and will be revisited in the future. > > Signed-off-by: Mel Gorman Andrew, can you pick up this patch please and drop mm-vmscan-scale-number-of-pages-reclaimed-by-reclaim-compaction-only-in-direct-reclaim.patch ? There are mixed reports on how much it helps but it comes down to "this fixes a problem" versus "kswapd is still showing higher usage". I think the higher kswapd usage is explained by the removal of __GFP_NO_KSWAPD and so while higher usage is bad, it is not necessarily unjustified. Ideally it would have been proven that having kswapd doing the work reduced application stalls in direct reclaim but unfortunately I do not have concrete evidence of that at this time. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx125.postini.com [74.125.245.125]) by kanga.kvack.org (Postfix) with SMTP id 585E96B002B for ; Sun, 11 Nov 2012 04:13:22 -0500 (EST) Message-ID: <509F6C2A.9060502@redhat.com> Date: Sun, 11 Nov 2012 10:13:14 +0100 From: Zdenek Kabelac MIME-Version: 1.0 Subject: Re: kswapd0: excessive CPU usage References: <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> In-Reply-To: <20121109090635.GG8218@suse.de> Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Mel Gorman Cc: Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings On 9.11.2012 10:06, Mel Gorman wrote: > On Fri, Nov 09, 2012 at 09:07:45AM +0100, Zdenek Kabelac wrote: >>> fe2c2a106663130a5ab45cb0e3414b52df2fff0c is the first bad commit >>> commit fe2c2a106663130a5ab45cb0e3414b52df2fff0c >>> Author: Rik van Riel >>> Date: Wed Mar 21 16:33:51 2012 -0700 >>> >>> vmscan: reclaim at order 0 when compaction is enabled >>> ... >>> >>> This is plausible since the issue seems to be in the kswapd + compaction >>> realm. I've yet to figure out exactly what about this commit results in >>> kswapd spinning. >>> >>> I would be interested if someone can confirm this finding. >>> >>> -- >>> Seth >>> >> >> >> On my system 3.7-rc4 the problem seems to be effectively solved by >> revert patch: https://lkml.org/lkml/2012/11/5/308 >> > > Ok, while there is still a question on whether it's enough I think it's > sensible to at least start with the obvious one.
> Hmm, so it just took longer to hit the problem and observe kswapd0 spinning on my CPU again - it's not as endless as before - but it still easily eats minutes. It helps to close Firefox or TB (memory-hungry apps) so that kswapd0 stops soon, and then restart those apps. (And I still have >1GB of cached memory.) kswapd0 R running task 0 30 2 0x00000000 ffff8801331efae8 0000000000000082 0000000000000018 0000000000000246 ffff880135b9a340 ffff8801331effd8 ffff8801331effd8 ffff8801331effd8 ffff880055dfa340 ffff880135b9a340 00000000331efad8 ffff8801331ee000 Call Trace: [] preempt_schedule+0x42/0x60 [] _raw_spin_unlock+0x55/0x60 [] put_super+0x31/0x40 [] drop_super+0x22/0x30 [] prune_super+0x149/0x1b0 [] shrink_slab+0xba/0x510 [] ? mem_cgroup_iter+0x17a/0x2e0 [] ? mem_cgroup_iter+0xca/0x2e0 [] balance_pgdat+0x629/0x7f0 [] kswapd+0x174/0x620 [] ? __init_waitqueue_head+0x60/0x60 [] ? balance_pgdat+0x7f0/0x7f0 [] kthread+0xdb/0xe0 [] ? kthread_create_on_node+0x140/0x140 [] ret_from_fork+0x7c/0xb0 [] ?
kthread_create_on_node+0x140/0x140 runnable tasks: task PID tree-key switches prio exec-runtime sum-exec sum-sleep ---------------------------------------------------------------------------------------------------------- kswapd0 30 8689943.729790 36266 120 8689943.729790 201495.640629 56609485.489414 / kworker/0:1 14790 8689937.729790 16969 120 8689937.729790 374.385996 150405.181652 / R bash 14855 821.749268 50 120 821.749268 24.027535 5252.291128 /autogroup-304 Mem-Info: DMA per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 CPU 1: hi: 0, btch: 1 usd: 0 DMA32 per-cpu: CPU 0: hi: 186, btch: 31 usd: 146 CPU 1: hi: 186, btch: 31 usd: 135 Normal per-cpu: CPU 0: hi: 186, btch: 31 usd: 131 CPU 1: hi: 186, btch: 31 usd: 132 active_anon:726521 inactive_anon:26442 isolated_anon:0 active_file:77765 inactive_file:76890 isolated_file:0 unevictable:12 dirty:4 writeback:0 unstable:0 free:40261 slab_reclaimable:12414 slab_unreclaimable:9694 mapped:26382 shmem:162712 pagetables:6618 bounce:0 free_cma:0 DMA free:15676kB min:272kB low:340kB high:408kB active_anon:208kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:208kB slab_reclaimable:8kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes lowmem_reserve[]: 0 2951 3836 3836 DMA32 free:126072kB min:51776kB low:64720kB high:77664kB active_anon:2175104kB inactive_anon:98976kB active_file:296252kB inactive_file:297648kB unevictable:48kB isolated(anon):0kB isolated(file):0kB present:3021960kB mlocked:48kB dirty:12kB writeback:0kB mapped:77664kB shmem:620388kB slab_reclaimable:19128kB slab_unreclaimable:6292kB kernel_stack:624kB pagetables:8900kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no lowmem_reserve[]: 0 0 885 885 Normal free:19296kB min:15532kB low:19412kB high:23296kB active_anon:730772kB inactive_anon:6792kB active_file:14808kB inactive_file:9912kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:906664kB mlocked:0kB dirty:4kB writeback:0kB mapped:27864kB shmem:30252kB slab_reclaimable:30520kB slab_unreclaimable:32476kB kernel_stack:2496kB pagetables:17572kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 DMA: 1*4kB 1*8kB 3*16kB 2*32kB 3*64kB 2*128kB 3*256kB 2*512kB 3*1024kB 3*2048kB 1*4096kB = 15676kB DMA32: 730*4kB 328*8kB 223*16kB 123*32kB 182*64kB 96*128kB 172*256kB 56*512kB 12*1024kB 1*2048kB 1*4096kB = 128120kB Normal: 600*4kB 384*8kB 164*16kB 122*32kB 40*64kB 7*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 19296kB 317367 total pagecache pages 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 0kB Total swap = 0kB 1032176 pages RAM 42789 pages reserved 642501 pages shared 869271 pages non-shared -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
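The per-order free counts in the sysrq+m output above can be cross-checked by hand. The sketch below is a minimal C model, not kernel code. It sums the Normal zone's buddy lists and then applies a simplified version of the 3.x-era __zone_watermark_ok() test. Two simplifications are assumptions: ALLOC_HIGH, ALLOC_HARDER and CMA handling are left out, and MAX_ORDER is taken to be 11. The zone's min/low/high watermarks of 15532kB, 19412kB and 23296kB correspond to 3883, 4853 and 5824 pages of 4kB.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_ORDERS 11 /* orders 0..10, i.e. 4kB..4096kB blocks (assumed MAX_ORDER) */

/* Free-block counts per order for the Normal zone, copied from the
 * "Normal: 600*4kB 384*8kB ..." line of the dump above. */
static const long normal_nr_free[NR_ORDERS] = {
	600, 384, 164, 122, 40, 7, 1, 1, 1, 1, 0
};

/* Total free pages held in the buddy lists: sum of nr_free << order. */
static long buddy_free_pages(const long nr_free[NR_ORDERS])
{
	long pages = 0;

	for (int o = 0; o < NR_ORDERS; o++)
		pages += nr_free[o] << o;
	return pages;
}

/*
 * Simplified model of the 3.x-era __zone_watermark_ok(): mark and
 * lowmem_reserve are in pages; alloc_flags and CMA are ignored.
 */
static bool watermark_ok(const long nr_free[NR_ORDERS], int order,
			 long mark, long lowmem_reserve)
{
	long free_pages = buddy_free_pages(nr_free);
	long min = mark;

	assert(order >= 0 && order < NR_ORDERS);
	free_pages -= (1L << order) - 1;
	if (free_pages <= min + lowmem_reserve)
		return false;
	for (int o = 0; o < order; o++) {
		/* Blocks of order o cannot satisfy a request of this order. */
		free_pages -= nr_free[o] << o;
		/* Require fewer free pages at each higher order. */
		min >>= 1;
		if (free_pages <= min)
			return false;
	}
	return true;
}
```

With these inputs the model gives the following results:

- The Normal zone holds 4824 free pages (19296kB, matching the dump).
- Order 0 passes the check at the min watermark but fails at the low and high watermarks.
- An order-9 (2MB) request fails even at min, although one 2048kB block is free. After lower-order pages are discounted, that single block leaves 1 page, which is below the scaled-down min of 7 pages.

Treat this as an illustration of the arithmetic, not as a diagnosis of the report.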
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx103.postini.com [74.125.245.103]) by kanga.kvack.org (Postfix) with SMTP id CEE2D6B004D for ; Mon, 12 Nov 2012 06:37:36 -0500 (EST)
Date: Mon, 12 Nov 2012 11:37:31 +0000
From: Mel Gorman
Subject: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD"
Message-ID: <20121112113731.GS8218@suse.de>
References: <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
In-Reply-To: <509F6C2A.9060502@redhat.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Zdenek Kabelac
Cc: Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings

With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction
based on failures" reverted, Zdenek Kabelac reported the following:

    Hmm, so it's just took longer to hit the problem and observe
    kswapd0 spinning on my CPU again - it's not as endless like before -
    but still it easily eats minutes - it helps to turn off Firefox
    or TB (memory hungry apps) so kswapd0 stops soon - and restart
    those apps again. (And I still have like >1GB of cached memory)

    kswapd0 R running task 0 30 2 0x00000000
     ffff8801331efae8 0000000000000082 0000000000000018 0000000000000246
     ffff880135b9a340 ffff8801331effd8 ffff8801331effd8 ffff8801331effd8
     ffff880055dfa340 ffff880135b9a340 00000000331efad8 ffff8801331ee000
    Call Trace:
     [] preempt_schedule+0x42/0x60
     [] _raw_spin_unlock+0x55/0x60
     [] put_super+0x31/0x40
     [] drop_super+0x22/0x30
     [] prune_super+0x149/0x1b0
     [] shrink_slab+0xba/0x510

The sysrq+m indicates the system has no swap so it'll never reclaim
anonymous pages as part of reclaim/compaction. That is one part of the
problem but not the root cause as file-backed pages could also be reclaimed.

The likely underlying problem is that kswapd is woken up or kept awake
for each THP allocation request in the page allocator slow path.

If compaction fails for the requesting process then compaction will be
deferred for a time and direct reclaim is avoided. However, if there
is a storm of THP requests that are simply rejected, it will still
be the case that kswapd is awake for a prolonged period of time
as pgdat->kswapd_max_order is updated each time. This is noticed by
the main kswapd() loop and it will not call kswapd_try_to_sleep().
Instead it will loop, shrinking a small number of pages and calling
shrink_slab() on each iteration.

The temptation is to supply a patch that checks if kswapd was woken for
THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not
backed up by proper testing. As 3.7 is very close to release and this is
not a bug we should release with, a safer path is to revert "mm: remove
__GFP_NO_KSWAPD" for now and revisit it with the view to ironing out the
balance_pgdat() logic in general.

Signed-off-by: Mel Gorman
---
 drivers/mtd/mtdcore.c           | 6 ++++--
 include/linux/gfp.h             | 5 ++++-
 include/trace/events/gfpflags.h | 1 +
 mm/page_alloc.c                 | 7 ++++---
 4 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index 374c46d..ec794a7 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -1077,7 +1077,8 @@ EXPORT_SYMBOL_GPL(mtd_writev);
  * until the request succeeds or until the allocation size falls below
  * the system page size. This attempts to make sure it does not adversely
  * impact system performance, so when allocating more than one page, we
- * ask the memory allocator to avoid re-trying.
+ * ask the memory allocator to avoid re-trying, swapping, writing back
+ * or performing I/O.
  *
  * Note, this function also makes sure that the allocated buffer is aligned to
  * the MTD device's min. I/O unit, i.e. the "mtd->writesize" value.
@@ -1091,7 +1092,8 @@ EXPORT_SYMBOL_GPL(mtd_writev);
  */
 void *mtd_kmalloc_up_to(const struct mtd_info *mtd, size_t *size)
 {
-	gfp_t flags = __GFP_NOWARN | __GFP_WAIT | __GFP_NORETRY;
+	gfp_t flags = __GFP_NOWARN | __GFP_WAIT |
+		       __GFP_NORETRY | __GFP_NO_KSWAPD;
 	size_t min_alloc = max_t(size_t, mtd->writesize, PAGE_SIZE);
 	void *kbuf;
 
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 02c1c971..d0a7967 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -31,6 +31,7 @@ struct vm_area_struct;
 #define ___GFP_THISNODE 0x40000u
 #define ___GFP_RECLAIMABLE 0x80000u
 #define ___GFP_NOTRACK 0x200000u
+#define ___GFP_NO_KSWAPD 0x400000u
 #define ___GFP_OTHER_NODE 0x800000u
 #define ___GFP_WRITE 0x1000000u
 
@@ -85,6 +86,7 @@ struct vm_area_struct;
 #define __GFP_RECLAIMABLE ((__force gfp_t)___GFP_RECLAIMABLE) /* Page is reclaimable */
 #define __GFP_NOTRACK ((__force gfp_t)___GFP_NOTRACK) /* Don't track with kmemcheck */
 
+#define __GFP_NO_KSWAPD ((__force gfp_t)___GFP_NO_KSWAPD)
 #define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE) /* On behalf of other node */
 #define __GFP_WRITE ((__force gfp_t)___GFP_WRITE) /* Allocator intends to dirty page */
 
@@ -114,7 +116,8 @@ struct vm_area_struct;
 			 __GFP_MOVABLE)
 #define GFP_IOFS (__GFP_IO | __GFP_FS)
 #define GFP_TRANSHUGE (GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
-			 __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN)
+			 __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN | \
+			 __GFP_NO_KSWAPD)
 
 #ifdef CONFIG_NUMA
 #define GFP_THISNODE (__GFP_THISNODE | __GFP_NOWARN | __GFP_NORETRY)
diff --git a/include/trace/events/gfpflags.h b/include/trace/events/gfpflags.h
index 9391706..d6fd8e5 100644
--- a/include/trace/events/gfpflags.h
+++ b/include/trace/events/gfpflags.h
@@ -36,6 +36,7 @@
 	{(unsigned long)__GFP_RECLAIMABLE, "GFP_RECLAIMABLE"}, \
 	{(unsigned long)__GFP_MOVABLE, "GFP_MOVABLE"}, \
 	{(unsigned long)__GFP_NOTRACK, "GFP_NOTRACK"}, \
+	{(unsigned long)__GFP_NO_KSWAPD, "GFP_NO_KSWAPD"}, \
 	{(unsigned long)__GFP_OTHER_NODE, "GFP_OTHER_NODE"} \
 	) : "GFP_NOWAIT"
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bb90971..7228260 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2416,8 +2416,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		goto nopage;
 
 restart:
-	wake_all_kswapd(order, zonelist, high_zoneidx,
-					zone_idx(preferred_zone));
+	if (!(gfp_mask & __GFP_NO_KSWAPD))
+		wake_all_kswapd(order, zonelist, high_zoneidx,
+						zone_idx(preferred_zone));
 
 	/*
 	 * OK, we're below the kswapd watermark and have kicked background
@@ -2494,7 +2495,7 @@ rebalance:
 	 * system then fail the allocation instead of entering direct reclaim.
 	 */
 	if ((deferred_compaction || contended_compaction) &&
-	    (gfp_mask & (__GFP_MOVABLE|__GFP_REPEAT)) == __GFP_MOVABLE)
+	    (gfp_mask & __GFP_NO_KSWAPD))
 		goto nopage;
 
 	/* Try direct reclaim and then allocating */
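The commit message above describes the failure mode as follows: every THP wakeup raises pgdat->kswapd_max_order, so the main kswapd() loop never reaches kswapd_try_to_sleep(). A toy model can illustrate this. Everything in it is hypothetical: toy_pgdat, toy_wakeup_kswapd() and toy_kswapd_passes() are invented for this sketch, and balance_pgdat() is reduced to a comment. The real loop also tracks classzone_idx and the order that balance_pgdat() returns.

```c
#include <assert.h>

#define THP_ORDER 9 /* order of a 2MB THP with 4kB pages (assumed) */

/* Hypothetical stand-in for the one pgdat field this model needs. */
struct toy_pgdat {
	int kswapd_max_order; /* highest order requested since last check */
};

/* Like wakeup_kswapd(), this only ever raises the requested order. */
static void toy_wakeup_kswapd(struct toy_pgdat *pgdat, int order)
{
	if (pgdat->kswapd_max_order < order)
		pgdat->kswapd_max_order = order;
}

/*
 * Run at most max_passes balancing passes. thp_requests[i] is the number
 * of THP allocations that entered the slow path (and woke kswapd) while
 * pass i was running. Returns the number of passes completed before the
 * toy kswapd would sleep, or max_passes if it never gets to sleep.
 */
static int toy_kswapd_passes(const int *thp_requests, int max_passes)
{
	struct toy_pgdat pgdat = { .kswapd_max_order = THP_ORDER };

	assert(max_passes >= 0);
	for (int pass = 0; pass < max_passes; pass++) {
		/* Consume the pending request before balancing. */
		pgdat.kswapd_max_order = 0;

		/* balance_pgdat() would reclaim and call shrink_slab() here. */
		for (int i = 0; i < thp_requests[pass]; i++)
			toy_wakeup_kswapd(&pgdat, THP_ORDER);

		/* Sleep only if nobody asked for more work during the pass. */
		if (pgdat.kswapd_max_order == 0)
			return pass + 1;
	}
	return max_passes;
}
```

On a quiet system the toy kswapd sleeps after one pass. A steady trickle of rejected THP requests during every pass keeps it running until the pass limit, which matches the behaviour reported in this thread.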
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx111.postini.com [74.125.245.111]) by kanga.kvack.org (Postfix) with SMTP id 010FA6B004D for ; Mon, 12 Nov 2012 07:20:01 -0500 (EST)
Date: Mon, 12 Nov 2012 12:19:57 +0000
From: Mel Gorman
Subject: Re: kswapd0: excessive CPU usage
Message-ID: <20121112121956.GT8218@suse.de>
References: <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
In-Reply-To: <509F6C2A.9060502@redhat.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Zdenek Kabelac
Cc: Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings

On Sun, Nov 11, 2012 at 10:13:14AM +0100, Zdenek Kabelac wrote:
> Hmm, so it's just took longer to hit the problem and observe kswapd0
> spinning on my CPU again - it's not as endless like before - but
> still it easily eats minutes - it helps to turn off Firefox or TB
> (memory hungry apps) so kswapd0 stops soon - and restart those apps
> again.
> (And I still have like >1GB of cached memory)
>

I posted a "safe" patch that I believe explains why you are seeing what
you are seeing. It does mean that there will still be some stalls due to
THP because kswapd is not helping and it's avoiding the problem rather
than trying to deal with it.

Hence, I'm also going to post this patch even though I have not tested
it myself. If you find it fixes the problem then it would be a
preferable patch to the revert. It still is the case that the
balance_pgdat() logic is in sore need of a rethink as it's pretty
twisted right now.

Thanks

---8<---
mm: Avoid waking kswapd for THP allocations when compaction is deferred or contended

With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction
based on failures" reverted, Zdenek Kabelac reported the following:

    Hmm, so it's just took longer to hit the problem and observe
    kswapd0 spinning on my CPU again - it's not as endless like before -
    but still it easily eats minutes - it helps to turn off Firefox
    or TB (memory hungry apps) so kswapd0 stops soon - and restart
    those apps again. (And I still have like >1GB of cached memory)

    kswapd0 R running task 0 30 2 0x00000000
     ffff8801331efae8 0000000000000082 0000000000000018 0000000000000246
     ffff880135b9a340 ffff8801331effd8 ffff8801331effd8 ffff8801331effd8
     ffff880055dfa340 ffff880135b9a340 00000000331efad8 ffff8801331ee000
    Call Trace:
     [] preempt_schedule+0x42/0x60
     [] _raw_spin_unlock+0x55/0x60
     [] put_super+0x31/0x40
     [] drop_super+0x22/0x30
     [] prune_super+0x149/0x1b0
     [] shrink_slab+0xba/0x510

The sysrq+m indicates the system has no swap so it'll never reclaim
anonymous pages as part of reclaim/compaction. That is one part of the
problem but not the root cause as file-backed pages could also be reclaimed.

The likely underlying problem is that kswapd is woken up or kept awake
for each THP allocation request in the page allocator slow path.

If compaction fails for the requesting process then compaction will be
deferred for a time and direct reclaim is avoided. However, if there
is a storm of THP requests that are simply rejected, it will still
be the case that kswapd is awake for a prolonged period of time
as pgdat->kswapd_max_order is updated each time. This is noticed by
the main kswapd() loop and it will not call kswapd_try_to_sleep().
Instead it will loop, shrinking a small number of pages and calling
shrink_slab() on each iteration.

This patch defers when kswapd gets woken up for THP allocations. For !THP
allocations, kswapd is always woken up. For THP allocations, kswapd is
woken up iff the process is willing to enter into direct reclaim/compaction.

Signed-off-by: Mel Gorman

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bb90971..0b469b4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2378,6 +2378,15 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
 	return !!(gfp_to_alloc_flags(gfp_mask) & ALLOC_NO_WATERMARKS);
 }
 
+/* Returns true if the allocation is likely for THP */
+static bool is_thp_alloc(gfp_t gfp_mask, unsigned int order)
+{
+	if (order == pageblock_order &&
+	    (gfp_mask & (__GFP_MOVABLE|__GFP_REPEAT)) == __GFP_MOVABLE)
+		return true;
+	return false;
+}
+
 static inline struct page *
 __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	struct zonelist *zonelist, enum zone_type high_zoneidx,
@@ -2416,7 +2425,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		goto nopage;
 
 restart:
-	wake_all_kswapd(order, zonelist, high_zoneidx,
+	/* The decision whether to wake kswapd for THP is made later */
+	if (!is_thp_alloc(gfp_mask, order))
+		wake_all_kswapd(order, zonelist, high_zoneidx,
 						zone_idx(preferred_zone));
 
 	/*
@@ -2487,15 +2498,21 @@ rebalance:
 		goto got_pg;
 	sync_migration = true;
 
-	/*
-	 * If compaction is deferred for high-order allocations, it is because
-	 * sync compaction recently failed. In this is the case and the caller
-	 * requested a movable allocation that does not heavily disrupt the
-	 * system then fail the allocation instead of entering direct reclaim.
-	 */
-	if ((deferred_compaction || contended_compaction) &&
-	    (gfp_mask & (__GFP_MOVABLE|__GFP_REPEAT)) == __GFP_MOVABLE)
-		goto nopage;
+	if (is_thp_alloc(gfp_mask, order)) {
+		/*
+		 * If compaction is deferred for high-order allocations, it is
+		 * because sync compaction recently failed. In this is the case
+		 * and the caller requested a movable allocation that does not
+		 * heavily disrupt the system then fail the allocation instead
+		 * of entering direct reclaim.
+		 */
+		if (deferred_compaction || contended_compaction)
+			goto nopage;
+
+		/* If process is willing to reclaim/compact then wake kswapd */
+		wake_all_kswapd(order, zonelist, high_zoneidx,
+					zone_idx(preferred_zone));
+	}
 
 	/* Try direct reclaim and then allocating */
 	page = __alloc_pages_direct_reclaim(gfp_mask, order,

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx142.postini.com [74.125.245.142]) by kanga.kvack.org (Postfix) with SMTP id DA2EC6B005D for ; Mon, 12 Nov 2012 08:13:28 -0500 (EST)
Message-ID: <50A0F5F0.6090400@redhat.com>
Date: Mon, 12 Nov 2012 14:13:20 +0100
From: Zdenek Kabelac
MIME-Version: 1.0
Subject: Re: kswapd0: excessive CPU usage
References: <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112121956.GT8218@suse.de>
In-Reply-To: <20121112121956.GT8218@suse.de>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Mel Gorman
Cc: Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings

On 12.11.2012 13:19, Mel Gorman wrote:
> On Sun, Nov 11, 2012 at 10:13:14AM +0100, Zdenek Kabelac wrote:
>> Hmm, so it's just took longer to hit the problem and observe kswapd0
>> spinning on my CPU again - it's not as endless like before - but
>> still it easily eats minutes - it helps to turn off Firefox or TB
>> (memory hungry apps) so kswapd0 stops soon - and restart those apps
>> again.
>> (And I still have like >1GB of cached memory)
>>
>
> I posted a "safe" patch that I believe explains why you are seeing what
> you are seeing. It does mean that there will still be some stalls due to
> THP because kswapd is not helping and it's avoiding the problem rather
> than trying to deal with it.
>
> Hence, I'm also going to post this patch even though I have not tested
> it myself. If you find it fixes the problem then it would be a
> preferable patch to the revert. It still is the case that the
> balance_pgdat() logic is in sort need of a rethink as it's pretty
> twisted right now.
>

Should I apply them all together for 3.7-rc5?

1) https://lkml.org/lkml/2012/11/5/308
2) https://lkml.org/lkml/2012/11/12/113
3) https://lkml.org/lkml/2012/11/12/151

Zdenek

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx118.postini.com [74.125.245.118]) by kanga.kvack.org (Postfix) with SMTP id CD57D6B005A for ; Mon, 12 Nov 2012 08:31:44 -0500 (EST)
Date: Mon, 12 Nov 2012 13:31:39 +0000
From: Mel Gorman
Subject: Re: kswapd0: excessive CPU usage
Message-ID: <20121112133139.GU8218@suse.de>
References: <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112121956.GT8218@suse.de> <50A0F5F0.6090400@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
In-Reply-To: <50A0F5F0.6090400@redhat.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Zdenek Kabelac
Cc: Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings

On Mon, Nov 12, 2012 at 02:13:20PM +0100, Zdenek Kabelac wrote:
> On 12.11.2012 13:19, Mel Gorman wrote:
> >On Sun, Nov 11, 2012 at 10:13:14AM +0100, Zdenek Kabelac wrote:
> >>Hmm, so it's just took longer to hit the problem and observe kswapd0
> >>spinning on my CPU again - it's not as endless like before - but
> >>still it easily eats minutes - it helps to turn off Firefox or TB
> >>(memory hungry apps) so kswapd0 stops soon - and restart those apps
> >>again.
> >>(And I still have like >1GB of cached memory)
> >>
> >
> >I posted a "safe" patch that I believe explains why you are seeing what
> >you are seeing. It does mean that there will still be some stalls due to
> >THP because kswapd is not helping and it's avoiding the problem rather
> >than trying to deal with it.
> >
> >Hence, I'm also going to post this patch even though I have not tested
> >it myself. If you find it fixes the problem then it would be a
> >preferable patch to the revert. It still is the case that the
> >balance_pgdat() logic is in sort need of a rethink as it's pretty
> >twisted right now.
> >
>
>
> Should I apply them all together for 3.7-rc5 ?
>
> 1) https://lkml.org/lkml/2012/11/5/308
> 2) https://lkml.org/lkml/2012/11/12/113
> 3) https://lkml.org/lkml/2012/11/12/151
>

Not all together. Test either 1+2 or 1+3. 1+2 is the safer choice but
does nothing about THP stalls. 1+3 is a riskier version but depends on
me being correct about the root cause of the problem you are seeing.

If both 1+2 and 1+3 work for you, I'd choose 1+3 for merging. If you only
have the time to test one combination then it would be preferred that you
test the safe option of 1+2.

Thanks.

-- 
Mel Gorman
SUSE Labs
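The two candidate fixes differ mainly in when a THP allocation wakes kswapd: the __GFP_NO_KSWAPD revert and the "avoid waking kswapd for THP" patch. The sketch below is a simplified C model, not kernel code, and rests on these assumptions:

- The flag values are copied from 3.7-era include/linux/gfp.h. Only their distinctness matters here.
- PAGEBLOCK_ORDER = 9 assumes x86-64 with 4kB pages.
- revert_wakes_kswapd() and avoid_wake_wakes_kswapd() are hypothetical names that summarise the two patches.
- Paths that leave the slow path earlier, such as atomic allocations, are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values as in 3.7-era include/linux/gfp.h (assumed for illustration). */
#define GFP_MOVABLE     0x08u
#define GFP_REPEAT      0x400u
#define GFP_NO_KSWAPD   0x400000u
#define PAGEBLOCK_ORDER 9 /* x86-64, 4kB pages, THP enabled (assumed) */

static_assert((GFP_MOVABLE & GFP_REPEAT) == 0, "flags must be distinct bits");

/* Same test as is_thp_alloc() in the "avoid waking kswapd" patch. */
static bool is_thp_alloc(unsigned int gfp_mask, unsigned int order)
{
	return order == PAGEBLOCK_ORDER &&
	       (gfp_mask & (GFP_MOVABLE | GFP_REPEAT)) == GFP_MOVABLE;
}

/* Revert: GFP_TRANSHUGE carries __GFP_NO_KSWAPD, so THP never wakes kswapd. */
static bool revert_wakes_kswapd(unsigned int gfp_mask)
{
	return !(gfp_mask & GFP_NO_KSWAPD);
}

/*
 * Alternative patch: non-THP requests wake kswapd at "restart:"; THP
 * requests wake it only when compaction is neither deferred nor contended,
 * i.e. when the caller is going on to direct reclaim/compaction.
 */
static bool avoid_wake_wakes_kswapd(unsigned int gfp_mask, unsigned int order,
				    bool deferred, bool contended)
{
	if (!is_thp_alloc(gfp_mask, order))
		return true;
	return !(deferred || contended);
}
```

Under the revert, a THP request never wakes kswapd, which fits Mel's remark that 1+2 "does nothing about THP stalls". Under the alternative, a THP request still wakes kswapd when it is prepared to reclaim and compact. It does not wake kswapd for the storm of requests rejected because compaction is deferred.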
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx192.postini.com [74.125.245.192]) by kanga.kvack.org (Postfix) with SMTP id 6B8F96B006E for ; Mon, 12 Nov 2012 09:50:44 -0500 (EST)
Message-ID: <50A10CBA.7000200@redhat.com>
Date: Mon, 12 Nov 2012 15:50:34 +0100
From: Zdenek Kabelac
MIME-Version: 1.0
Subject: Re: kswapd0: excessive CPU usage
References: <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112121956.GT8218@suse.de> <50A0F5F0.6090400@redhat.com> <20121112133139.GU8218@suse.de>
In-Reply-To: <20121112133139.GU8218@suse.de>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Mel Gorman
Cc: Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings

On 12.11.2012 14:31, Mel Gorman wrote:
> On Mon, Nov 12, 2012 at 02:13:20PM +0100, Zdenek Kabelac wrote:
>> On 12.11.2012 13:19, Mel Gorman wrote:
>>> On Sun, Nov 11, 2012 at 10:13:14AM +0100, Zdenek Kabelac wrote:
>>>> Hmm, so it's just took longer to hit the problem and observe kswapd0
>>>> spinning on my CPU again - it's not as endless like before - but
>>>> still it easily eats minutes - it helps to turn off Firefox or TB
>>>> (memory hungry apps) so kswapd0 stops soon - and restart those apps
>>>> again.
>>>> (And I still have like >1GB of cached memory)
>>>>
>>>
>>> I posted a "safe" patch that I believe explains why you are seeing what
>>> you are seeing. It does mean that there will still be some stalls due to
>>> THP because kswapd is not helping and it's avoiding the problem rather
>>> than trying to deal with it.
>>>
>>> Hence, I'm also going to post this patch even though I have not tested
>>> it myself. If you find it fixes the problem then it would be a
>>> preferable patch to the revert. It still is the case that the
>>> balance_pgdat() logic is in sort need of a rethink as it's pretty
>>> twisted right now.
>>>
>>
>>
>> Should I apply them all together for 3.7-rc5 ?
>>
>> 1) https://lkml.org/lkml/2012/11/5/308
>> 2) https://lkml.org/lkml/2012/11/12/113
>> 3) https://lkml.org/lkml/2012/11/12/151
>>
>
> Not all together. Test either 1+2 or 1+3. 1+2 is the safer choice but
> does nothing about THP stalls. 1+3 is a riskier version but depends on
> me being correct on what the root cause of the problem you see it.
>
> If both 1+2 and 1+3 work for you, I'd choose 1+3 for merging. If you only
> have the time to test one combination then it would be preferred that you
> test the safe option of 1+2.
>
>

I'll go with 1+2 for a couple of days - the issue is - I've no idea how it
gets suddenly triggered - it seemed to be running fine for 2-3 days even
with just 1) - but then kswapd0 started to occupy the CPU for minutes.
It looks like some intensive workload in Firefox (Flash) may lead to that.

Anyway, it's hard to tell quickly whether it helped.

Zdenek
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx201.postini.com [74.125.245.201]) by kanga.kvack.org (Postfix) with SMTP id 1419C6B00AB for ; Wed, 14 Nov 2012 16:44:21 -0500 (EST)
Date: Wed, 14 Nov 2012 22:43:40 +0100
From: Johannes Hirte
Subject: Re: [PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures"
Message-ID: <20121114224340.5f7cee78@fem.tu-ilmenau.de>
In-Reply-To: <20121109083637.GD8218@suse.de>
References: <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <20121105142449.GI8218@suse.de> <20121106111554.1896c3f3@fem.tu-ilmenau.de> <20121109083637.GD8218@suse.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Mel Gorman
Cc: Andrew Morton , Zdenek Kabelac , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, Rik van Riel , Jiri Slaby , LKML

On Fri, 9 Nov 2012 08:36:37 +0000, Mel Gorman wrote:

> On Tue, Nov 06, 2012 at 11:15:54AM +0100, Johannes Hirte wrote:
> > On Mon, 5 Nov 2012 14:24:49 +0000,
> > Mel Gorman wrote:
> > 
> > > Jiri Slaby reported the following:
> > > 
> > > (It's an effective revert of "mm: vmscan: scale number of
> > > pages reclaimed by reclaim/compaction based on failures".) Given
> > > kswapd had hours of runtime in ps/top output yesterday in the
> > > morning and after the revert it's now 2 minutes in sum for the
> > > last 24h, I would say, it's gone.
> > > 
> > > The intention of the patch in question was to compensate for the
> > > loss of lumpy reclaim. Part of the reason lumpy reclaim worked is
> > > because it aggressively reclaimed pages and this patch was meant
> > > to be a sane compromise.
> > > 
> > > When compaction fails, it gets deferred and both compaction and
> > > reclaim/compaction is deferred avoid excessive reclaim. However,
> > > since commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is
> > > woken up each time and continues reclaiming which was not taken
> > > into account when the patch was developed.
> > > 
> > > Attempts to address the problem ended up just changing the shape
> > > of the problem instead of fixing it. The release window gets
> > > closer and while a THP allocation failing is not a major problem,
> > > kswapd chewing up a lot of CPU is. This patch reverts "mm:
> > > vmscan: scale number of pages reclaimed by reclaim/compaction
> > > based on failures" and will be revisited in the future.
> > > 
> > > Signed-off-by: Mel Gorman
> > > ---
> > >  mm/vmscan.c | 25 -------------------------
> > >  1 file changed, 25 deletions(-)
> > > 
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index 2624edc..e081ee8 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -1760,28 +1760,6 @@ static bool in_reclaim_compaction(struct scan_control *sc)
> > >  	return false;
> > >  }
> > >  
> > > -#ifdef CONFIG_COMPACTION
> > > -/*
> > > - * If compaction is deferred for sc->order then scale the number of pages
> > > - * reclaimed based on the number of consecutive allocation failures
> > > - */
> > > -static unsigned long scale_for_compaction(unsigned long pages_for_compaction,
> > > -			struct lruvec *lruvec, struct scan_control *sc)
> > > -{
> > > -	struct zone *zone = lruvec_zone(lruvec);
> > > -
> > > -	if (zone->compact_order_failed <= sc->order)
> > > -		pages_for_compaction <<= zone->compact_defer_shift;
> > > -	return pages_for_compaction;
> > > -}
> > > -#else
> > > -static unsigned long scale_for_compaction(unsigned long pages_for_compaction,
> > > -			struct lruvec *lruvec, struct scan_control *sc)
> > > -{
> > > -	return pages_for_compaction;
> > > -}
> > > -#endif
> > > -
> > >  /*
> > >   * Reclaim/compaction is used for high-order allocation requests. It reclaims
> > >   * order-0 pages before compacting the zone. should_continue_reclaim() returns
> > > @@ -1829,9 +1807,6 @@ static inline bool should_continue_reclaim(struct lruvec *lruvec,
> > >  	 * inactive lists are large enough, continue reclaiming
> > >  	 */
> > >  	pages_for_compaction = (2UL << sc->order);
> > > -
> > > -	pages_for_compaction = scale_for_compaction(pages_for_compaction,
> > > -						    lruvec, sc);
> > >  	inactive_lru_pages = get_lru_size(lruvec, LRU_INACTIVE_FILE);
> > >  	if (nr_swap_pages > 0)
> > >  		inactive_lru_pages += get_lru_size(lruvec, LRU_INACTIVE_ANON);
> > > -- 
> > 
> > Even with this patch I see kswapd0 very often on top. Much more than
> > with kernel 3.6.
> 
> How severe is the CPU usage? The higher usage can be explained by "mm:
> remove __GFP_NO_KSWAPD" which allows kswapd to compact memory to
> reduce the amount of time processes spend in compaction but will
> result in the CPU cost being incurred by kswapd.
> 
> Is it really high like the bug was reporting with high usage over long
> periods of time or do you just see it using 2-6% of CPU for short
> periods?

It is really high. During compile jobs (make -j4 on a dual core) I've seen
kswapd0 consuming at least 50% CPU most of the time.
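The helper removed by the quoted revert multiplied the reclaim target used by reclaim/compaction. The following is a standalone C sketch of that arithmetic. It is not the kernel function: the zone fields are passed as plain parameters, and COMPACT_MAX_DEFER_SHIFT = 6 is an assumption taken from 3.x-era include/linux/compaction.h.

```c
#include <assert.h>

#define COMPACT_MAX_DEFER_SHIFT 6 /* 3.x-era include/linux/compaction.h (assumed) */

/* Baseline target from should_continue_reclaim(): 2UL << order pages. */
static unsigned long base_pages_for_compaction(unsigned int order)
{
	return 2UL << order;
}

/*
 * Standalone version of the reverted scale_for_compaction(): if compaction
 * recently failed at this order or a lower one, shift the target left by
 * the zone's defer shift.
 */
static unsigned long scaled_pages_for_compaction(unsigned int order,
						 int compact_order_failed,
						 unsigned int compact_defer_shift)
{
	unsigned long pages = base_pages_for_compaction(order);

	assert(compact_defer_shift <= COMPACT_MAX_DEFER_SHIFT);
	if (compact_order_failed <= (int)order)
		pages <<= compact_defer_shift;
	return pages;
}
```

For an order-9 (THP) request the unscaled target is 1024 pages (4MB with 4kB pages). At the maximum defer shift of 6 it becomes 65536 pages (256MB). That shows how much more reclaim could be requested once compaction had been deferred repeatedly.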
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx145.postini.com [74.125.245.145]) by kanga.kvack.org (Postfix) with SMTP id 7B0A36B0072 for ; Fri, 16 Nov 2012 14:14:48 -0500 (EST)
Received: by mail-vb0-f41.google.com with SMTP id v13so3877781vbk.14 for ; Fri, 16 Nov 2012 11:14:47 -0800 (PST)
MIME-Version: 1.0
In-Reply-To: <20121112113731.GS8218@suse.de>
References: <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112113731.GS8218@suse.de>
Date: Fri, 16 Nov 2012 14:14:47 -0500
Message-ID:
Subject: Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD"
From: Josh Boyer
Content-Type: text/plain; charset=ISO-8859-1
Sender: owner-linux-mm@kvack.org
List-ID:
To: Mel Gorman
Cc: Zdenek Kabelac , Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings

On Mon, Nov 12, 2012 at 6:37 AM, Mel Gorman wrote:
> With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction
> based on failures" reverted, Zdenek Kabelac reported the following
>
> Hmm, so it's just took longer to hit the problem and observe
> kswapd0 spinning on my CPU again - it's not as endless like before -
> but still it easily eats minutes - it helps to turn off Firefox
> or TB (memory hungry apps) so kswapd0 stops soon - and restart
> those apps again.
(And I still have like >1GB of cached memory) > > kswapd0 R running task 0 30 2 0x00000000 > ffff8801331efae8 0000000000000082 0000000000000018 0000000000000246 > ffff880135b9a340 ffff8801331effd8 ffff8801331effd8 ffff8801331effd8 > ffff880055dfa340 ffff880135b9a340 00000000331efad8 ffff8801331ee000 > Call Trace: > [] preempt_schedule+0x42/0x60 > [] _raw_spin_unlock+0x55/0x60 > [] put_super+0x31/0x40 > [] drop_super+0x22/0x30 > [] prune_super+0x149/0x1b0 > [] shrink_slab+0xba/0x510 > > The sysrq+m indicates the system has no swap so it'll never reclaim > anonymous pages as part of reclaim/compaction. That is one part of the > problem but not the root cause as file-backed pages could also be reclaimed. > > The likely underlying problem is that kswapd is woken up or kept awake > for each THP allocation request in the page allocator slow path. > > If compaction fails for the requesting process then compaction will be > deferred for a time and direct reclaim is avoided. However, if there > is a storm of THP requests that are simply rejected, it will still > be the case that kswapd is awake for a prolonged period of time > as pgdat->kswapd_max_order is updated each time. This is noticed by > the main kswapd() loop and it will not call kswapd_try_to_sleep(). > Instead it will loop, shrinking a small number of pages and calling > shrink_slab() on each iteration. > > The temptation is to supply a patch that checks if kswapd was woken for > THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not > backed up by proper testing. As 3.7 is very close to release and this is > not a bug we should release with, a safer path is to revert "mm: remove > __GFP_NO_KSWAPD" for now and revisit it with the view to ironing out the > balance_pgdat() logic in general. > > Signed-off-by: Mel Gorman Does anyone know if this is queued to go into 3.7 somewhere? I looked a bit and can't find it in a tree. We have a few reports of Fedora rawhide users hitting this.
josh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx204.postini.com [74.125.245.204]) by kanga.kvack.org (Postfix) with SMTP id 3E0926B0074 for ; Fri, 16 Nov 2012 14:51:31 -0500 (EST) Date: Fri, 16 Nov 2012 11:51:24 -0800 From: Andrew Morton Subject: Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" Message-Id: <20121116115124.c2981abc.akpm@linux-foundation.org> In-Reply-To: References: <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112113731.GS8218@suse.de> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Josh Boyer Cc: Mel Gorman , Zdenek Kabelac , Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Rik van Riel , Robert Jennings On Fri, 16 Nov 2012 14:14:47 -0500 Josh Boyer wrote: > > The temptation is to supply a patch that checks if kswapd was woken for > > THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not > > backed up by proper testing. As 3.7 is very close to release and this is > > not a bug we should release with, a safer path is to revert "mm: remove > > __GFP_NO_KSWAPD" for now and revisit it with the view to ironing out the > > balance_pgdat() logic in general. > > > > Signed-off-by: Mel Gorman > > Does anyone know if this is queued to go into 3.7 somewhere? I looked > a bit and can't find it in a tree. We have a few reports of Fedora > rawhide users hitting this. Still thinking about it.
We're reverting quite a lot of material lately. mm-revert-mm-vmscan-scale-number-of-pages-reclaimed-by-reclaim-compaction-based-on-failures.patch and revert-mm-fix-up-zone-present-pages.patch are queued for 3.7. I'll toss this one in there as well, but I can't say I'm feeling terribly confident. How is Valdis's machine nowadays? From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx205.postini.com [74.125.245.205]) by kanga.kvack.org (Postfix) with SMTP id 24DDE6B0078 for ; Fri, 16 Nov 2012 15:06:22 -0500 (EST) Date: Fri, 16 Nov 2012 20:06:17 +0000 From: Mel Gorman Subject: Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" Message-ID: <20121116200616.GK8218@suse.de> References: <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112113731.GS8218@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Josh Boyer Cc: Zdenek Kabelac , Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings On Fri, Nov 16, 2012 at 02:14:47PM -0500, Josh Boyer wrote: > On Mon, Nov 12, 2012 at 6:37 AM, Mel Gorman wrote: > > With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction > > based on failures" reverted, Zdenek Kabelac reported the following > > > > Hmm, so it just took longer to hit the problem and observe > > kswapd0 spinning on my CPU again - it's not as endless as before - > > but still it easily eats minutes - it helps to turn off Firefox > > or TB (memory 
hungry apps) so kswapd0 stops soon - and restart > > those apps again. (And I still have like >1GB of cached memory) > > > > kswapd0 R running task 0 30 2 0x00000000 > > ffff8801331efae8 0000000000000082 0000000000000018 0000000000000246 > > ffff880135b9a340 ffff8801331effd8 ffff8801331effd8 ffff8801331effd8 > > ffff880055dfa340 ffff880135b9a340 00000000331efad8 ffff8801331ee000 > > Call Trace: > > [] preempt_schedule+0x42/0x60 > > [] _raw_spin_unlock+0x55/0x60 > > [] put_super+0x31/0x40 > > [] drop_super+0x22/0x30 > > [] prune_super+0x149/0x1b0 > > [] shrink_slab+0xba/0x510 > > > > The sysrq+m indicates the system has no swap so it'll never reclaim > > anonymous pages as part of reclaim/compaction. That is one part of the > > problem but not the root cause as file-backed pages could also be reclaimed. > > > > The likely underlying problem is that kswapd is woken up or kept awake > > for each THP allocation request in the page allocator slow path. > > > > If compaction fails for the requesting process then compaction will be > > deferred for a time and direct reclaim is avoided. However, if there > > is a storm of THP requests that are simply rejected, it will still > > be the case that kswapd is awake for a prolonged period of time > > as pgdat->kswapd_max_order is updated each time. This is noticed by > > the main kswapd() loop and it will not call kswapd_try_to_sleep(). > > Instead it will loop, shrinking a small number of pages and calling > > shrink_slab() on each iteration. > > > > The temptation is to supply a patch that checks if kswapd was woken for > > THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not > > backed up by proper testing. As 3.7 is very close to release and this is > > not a bug we should release with, a safer path is to revert "mm: remove > > __GFP_NO_KSWAPD" for now and revisit it with the view to ironing out the > > balance_pgdat() logic in general.
> > > > Signed-off-by: Mel Gorman > > Does anyone know if this is queued to go into 3.7 somewhere? I looked > a bit and can't find it in a tree. We have a few reports of Fedora > rawhide users hitting this. > No, because I was waiting to hear a) whether it worked and b) preferably, whether the alternative "less safe" option worked. This close to release, it might be better to just go with the safe option. -- Mel Gorman SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx160.postini.com [74.125.245.160]) by kanga.kvack.org (Postfix) with SMTP id 1418B6B005D for ; Sun, 18 Nov 2012 14:00:51 -0500 (EST) Message-ID: <50A9304E.3020205@redhat.com> Date: Sun, 18 Nov 2012 20:00:30 +0100 From: Zdenek Kabelac MIME-Version: 1.0 Subject: Re: kswapd0: excessive CPU usage References: <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112121956.GT8218@suse.de> <50A0F5F0.6090400@redhat.com> <20121112133139.GU8218@suse.de> In-Reply-To: <20121112133139.GU8218@suse.de> Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Mel Gorman Cc: Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings On 12.11.2012 14:31, Mel Gorman wrote: > On Mon, Nov 12, 2012 at 02:13:20PM +0100, Zdenek Kabelac wrote: >> On 12.11.2012 13:19, Mel Gorman wrote: >>> On Sun, Nov 11, 2012 at 10:13:14AM +0100, Zdenek Kabelac wrote: >>>> Hmm, so it just took longer to hit the problem and observe kswapd0 >>>> spinning on my
CPU again - it's not as endless as before - but >>>> still it easily eats minutes - it helps to turn off Firefox or TB >>>> (memory hungry apps) so kswapd0 stops soon - and restart those apps >>>> again. >>>> (And I still have like >1GB of cached memory) >>>> >>> >>> I posted a "safe" patch that I believe explains why you are seeing what >>> you are seeing. It does mean that there will still be some stalls due to >>> THP because kswapd is not helping and it's avoiding the problem rather >>> than trying to deal with it. >>> >>> Hence, I'm also going to post this patch even though I have not tested >>> it myself. If you find it fixes the problem then it would be a >>> preferable patch to the revert. It still is the case that the >>> balance_pgdat() logic is in sore need of a rethink as it's pretty >>> twisted right now. >>> >> >> >> Should I apply them all together for 3.7-rc5? >> >> 1) https://lkml.org/lkml/2012/11/5/308 >> 2) https://lkml.org/lkml/2012/11/12/113 >> 3) https://lkml.org/lkml/2012/11/12/151 >> > > Not all together. Test either 1+2 or 1+3. 1+2 is the safer choice but > does nothing about THP stalls. 1+3 is a riskier version but depends on > me being correct about what the root cause of the problem you see is. > > If both 1+2 and 1+3 work for you, I'd choose 1+3 for merging. If you only > have the time to test one combination then it would be preferred that you > test the safe option of 1+2. So I've tested 1+2 for a few days (I rebooted once for another reason), but today this happened to me (with ~2 days of uptime): for some reason my machine ran out of memory, and the OOM killer killed Firefox and then even the whole X session. I'm unsure whether it's related to those 2 patches, but I've never had such an OOM failure before. Should I experiment now with 1+3, or is there something newer to test?
Zdenek X: page allocation failure: order:0, mode:0x200da Pid: 1126, comm: X Not tainted 3.7.0-rc5-00007-g95e21c5 #100 Call Trace: [] warn_alloc_failed+0xe9/0x140 [] __alloc_pages_nodemask+0x7fa/0xa40 [] shmem_getpage_gfp+0x603/0x9d0 [] ? native_sched_clock+0x26/0x90 [] shmem_fault+0x4f/0xa0 [] shm_fault+0x1e/0x20 [] __do_fault+0x73/0x4d0 [] ? generic_file_aio_write+0xb0/0x100 [] handle_pte_fault+0x97/0x9a0 [] ? __lock_is_held+0x5f/0x90 [] ? get_parent_ip+0x11/0x50 rsyslogd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0 rsyslogd cpuset=/ mems_allowed=0 Pid: 571, comm: rsyslogd Not tainted 3.7.0-rc5-00007-g95e21c5 #100 Call Trace: [] dump_header.isra.12+0x78/0x224 [] ? sub_preempt_count+0x79/0xd0 [] ? _raw_spin_unlock_irqrestore+0x42/0x80 [] ? ___ratelimit+0x9e/0x130 [] oom_kill_process+0x1d3/0x330 [] out_of_memory+0x439/0x4a0 [] __alloc_pages_nodemask+0x976/0xa40 [] ? find_get_page+0x5/0x230 [] filemap_fault+0x2d0/0x480 [] __do_fault+0x73/0x4d0 [] handle_pte_fault+0x97/0x9a0 [] ? __lock_is_held+0x5f/0x90 [] ? get_parent_ip+0x11/0x50 [] handle_mm_fault+0x22f/0x2f0 [] __do_page_fault+0x15d/0x4e0 [] ? sub_preempt_count+0x79/0xd0 [] ? _raw_spin_unlock+0x35/0x60 [] ? proc_reg_read+0x8c/0xc0 [] ? error_sti+0x5/0x6 [] ? 
trace_hardirqs_off_thunk+0x3a/0x3c [] do_page_fault+0xe/0x10 [] page_fault+0x22/0x30 Mem-Info: DMA per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 CPU 1: hi: 0, btch: 1 usd: 0 DMA32 per-cpu: CPU 0: hi: 186, btch: 31 usd: 30 CPU 1: hi: 186, btch: 31 usd: 6 Normal per-cpu: CPU 0: hi: 186, btch: 31 usd: 30 CPU 1: hi: 186, btch: 31 usd: 0 active_anon:900420 inactive_anon:28835 isolated_anon:0 active_file:43 inactive_file:21 isolated_file:0 unevictable:4 dirty:34 writeback:2 unstable:0 free:20731 slab_reclaimable:8641 slab_unreclaimable:10446 mapped:18325 shmem:243662 pagetables:7705 bounce:0 free_cma:0 DMA free:12120kB min:272kB low:340kB high:408kB active_anon:2892kB inactive_anon:872kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:1672kB shmem:3596kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes lowmem_reserve[]: 0 2951 3836 3836 DMA32 free:55296kB min:51776kB low:64720kB high:77664kB active_anon:2834992kB inactive_anon:107924kB active_file:92kB inactive_file:52kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3021968kB mlocked:0kB dirty:88kB writeback:0kB mapped:65460kB shmem:943100kB slab_reclaimable:11700kB slab_unreclaimable:8968kB kernel_stack:592kB pagetables:11852kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:180 all_unreclaimable? 
yes lowmem_reserve[]: 0 0 885 885 Normal free:15508kB min:15532kB low:19412kB high:23296kB active_anon:763796kB inactive_anon:6544kB active_file:80kB inactive_file:32kB unevictable:16kB isolated(anon):0kB isolated(file):0kB present:906664kB mlocked:16kB dirty:48kB writeback:52kB mapped:6168kB shmem:27952kB slab_reclaimable:22864kB slab_unreclaimable:32800kB kernel_stack:2568kB pagetables:18968kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:234 all_unreclaimable? yes lowmem_reserve[]: 0 0 0 0 DMA: 2*4kB 2*8kB 2*16kB 1*32kB 2*64kB 1*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 1*4096kB = 12120kB DMA32: 900*4kB 1512*8kB 513*16kB 635*32kB 109*64kB 8*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 55296kB Normal: 452*4kB 363*8kB 225*16kB 139*32kB 30*64kB 4*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 17496kB 243783 total pagecache pages 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 0kB Total swap = 0kB 1032176 pages RAM 42789 pages reserved 553592 pages shared 943414 pages non-shared [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [ 351] 0 351 74685 1679 154 0 0 systemd-journal [ 544] 0 544 5863 107 16 0 0 bluetoothd [ 545] 0 545 88977 725 56 0 0 NetworkManager [ 546] 0 546 30170 158 15 0 0 crond [ 552] 0 552 1879 28 8 0 0 gpm [ 557] 0 557 1092 37 8 0 0 acpid [ 564] 81 564 6361 373 16 0 -900 dbus-daemon [ 566] 0 566 61331 155 22 0 0 rsyslogd [ 567] 498 567 7026 104 19 0 0 avahi-daemon [ 568] 498 568 6994 59 17 0 0 avahi-daemon [ 573] 0 573 1758 33 9 0 0 mcelog [ 578] 0 578 5925 51 16 0 0 atd [ 586] 105 586 121536 4270 56 0 0 polkitd [ 593] 0 593 21967 205 48 0 -900 modem-manager [ 601] 0 601 1087 26 8 0 0 thinkfan [ 619] 0 619 122722 1085 129 0 0 libvirtd [ 630] 32 630 4812 68 13 0 0 rpcbind [ 633] 0 633 20080 199 43 0 -1000 sshd [ 653] 29 653 5905 116 16 0 0 rpc.statd [ 700] 0 700 13173 190 28 0 0 wpa_supplicant [ 719] 0 719 4810 50 14 0 0 rpc.idmapd [ 730] 0 730 28268 36 10 0 0 rpc.rquotad [ 
766] 0 766 6030 153 15 0 0 rpc.mountd [ 806] 99 806 3306 45 11 0 0 dnsmasq [ 985] 0 985 21219 150 46 0 0 login [ 988] 0 988 260408 355 48 0 0 console-kit-dae [ 1053] 11641 1053 28706 241 14 0 0 bash [ 1097] 11641 1097 27972 58 10 0 0 startx [ 1125] 11641 1125 3487 48 13 0 0 xinit [ 1126] 11641 1126 80028 35289 154 0 0 X [ 1138] 11641 1138 142989 930 122 0 0 gnome-session [ 1151] 11641 1151 4013 64 12 0 0 dbus-launch [ 1152] 11641 1152 6069 82 17 0 0 dbus-daemon [ 1154] 11641 1154 85449 162 36 0 0 at-spi-bus-laun [ 1158] 11641 1158 6103 116 17 0 0 dbus-daemon [ 1161] 11641 1161 32328 174 33 0 0 at-spi2-registr [ 1172] 11641 1172 4013 65 13 0 0 dbus-launch [ 1173] 11641 1173 6350 265 18 0 0 dbus-daemon [ 1177] 11641 1177 37416 416 29 0 0 gconfd-2 [ 1184] 11641 1184 117556 1203 44 0 0 gnome-keyring-d [ 1185] 11641 1185 224829 2236 177 0 0 gnome-settings- [ 1194] 0 1194 57227 786 46 0 0 upowerd [ 1226] 11641 1226 77392 190 36 0 0 gvfsd [ 1246] 11641 1246 118201 772 90 0 0 pulseaudio [ 1247] 496 1247 41161 59 17 0 0 rtkit-daemon [ 1252] 11641 1252 29494 205 58 0 0 gconf-helper [ 1253] 106 1253 81296 355 46 0 0 colord [ 1257] 11641 1257 59080 1574 60 0 0 openbox [ 1258] 11641 1258 185569 3216 146 0 0 gnome-panel [ 1264] 11641 1264 64102 229 27 0 0 dconf-service [ 1268] 11641 1268 139203 858 116 0 0 gnome-user-shar [ 1269] 11641 1269 268645 27442 334 0 0 pidgin [ 1270] 11641 1270 142642 1064 117 0 0 bluetooth-apple [ 1271] 11641 1271 193218 1775 175 0 0 nm-applet [ 1272] 11641 1272 220194 1810 138 0 0 gnome-sound-app [ 1285] 11641 1285 80914 632 45 0 0 gvfs-udisks2-vo [ 1287] 0 1287 88101 599 41 0 0 udisksd [ 1295] 11641 1295 177162 14140 150 0 0 wnck-applet [ 1297] 11641 1297 281043 3161 199 0 0 clock-applet [ 1299] 11641 1299 142537 1053 120 0 0 cpufreq-applet [ 1302] 11641 1302 141960 986 113 0 0 notification-ar [ 1340] 11641 1340 190026 6265 144 0 0 gnome-terminal [ 1346] 11641 1346 2123 35 10 0 0 gnome-pty-helpe [ 1347] 11641 1347 28719 253 11 0 0 bash [ 1858] 11641 
1858 10895 101 27 0 0 xfconfd [ 2052] 11641 2052 28720 255 11 0 0 bash [ 6239] 11641 6239 73437 711 88 0 0 kdeinit4 [ 6240] 11641 6240 83952 717 101 0 0 klauncher [ 6242] 11641 6242 126497 1479 172 0 0 kded4 [ 6244] 11641 6244 2977 48 11 0 0 gam_server [10804] 11641 10804 101320 307 47 0 0 gvfsd-http [12175] 0 12175 27197 32 10 0 0 agetty [12249] 11641 12249 28719 252 14 0 0 bash [14862] 0 14862 51773 344 55 0 0 cupsd [14868] 4 14868 18105 158 39 0 0 cups-polld [16728] 11641 16728 28691 244 12 0 0 bash [16975] 0 16975 9109 253 23 0 -1000 systemd-udevd [17618] 0 17618 8245 87 22 0 0 systemd-logind [ 3133] 11641 3133 43721 132 40 0 0 su [ 3136] 0 3136 28564 139 12 0 0 bash [ 3983] 11641 3983 43722 134 41 0 0 su [ 3986] 0 3986 28564 144 13 0 0 bash [16350] 11641 16350 28691 245 14 0 0 bash [31228] 11641 31228 28691 245 11 0 0 bash [31922] 11641 31922 28719 250 13 0 0 bash [ 2340] 11641 2340 28691 245 15 0 0 bash [12586] 38 12586 7851 150 19 0 0 ntpd [32658] 11641 32658 41192 424 35 0 0 mc [32660] 11641 32660 28692 245 13 0 0 bash [29193] 11641 29193 713846 414344 1614 0 0 firefox [10971] 11641 10971 43722 133 43 0 0 su [10974] 0 10974 28564 132 12 0 0 bash [11343] 0 11343 28497 66 11 0 0 ksmtuned [11387] 11641 11387 28719 254 11 0 0 bash [11450] 11641 11450 28691 246 13 0 0 bash [11576] 11641 11576 43722 133 40 0 0 su [11579] 0 11579 28564 141 13 0 0 bash [12106] 11641 12106 28691 244 12 0 0 bash [12141] 11641 12141 43722 132 44 0 0 su [12144] 0 12144 28564 140 11 0 0 bash [12264] 11641 12264 28691 245 11 0 0 bash [12299] 11641 12299 43721 133 40 0 0 su [12302] 0 12302 28564 137 12 0 0 bash [26024] 11641 26024 28691 245 13 0 0 bash [26083] 11641 26083 28691 245 13 0 0 bash [28235] 11641 28235 43721 132 42 0 0 su [28238] 0 28238 28564 143 13 0 0 bash [29460] 11641 29460 43721 132 42 0 0 su [29463] 0 29463 28564 137 12 0 0 bash [29758] 11641 29758 28720 256 12 0 0 bash [29864] 11641 29864 41916 1153 36 0 0 mc [29866] 11641 29866 28728 257 11 0 0 bash [32750] 0 32750 
23164 2994 47 0 0 dhclient [ 323] 0 323 24081 471 48 0 0 sendmail [ 347] 51 347 20347 367 38 0 0 sendmail [ 907] 11641 907 379562 159766 707 0 0 thunderbird [ 6340] 11641 6340 28719 251 12 0 0 bash [ 6790] 11641 6790 80307 620 101 0 0 xfce4-notifyd [ 6844] 0 6844 26669 23 9 0 0 sleep Out of memory: Kill process 29193 (firefox) score 420 or sacrifice child Killed process 29193 (firefox) total-vm:2855384kB, anon-rss:1653868kB, file-rss:3508kB [] handle_mm_fault+0x22f/0x2f0 [] __get_user_pages+0x12a/0x530 [] get_dump_page+0x45/0x60 [] elf_core_dump+0x16bd/0x1960 [] ? elf_core_dump+0x9d6/0x1960 [] ? sub_preempt_count+0x79/0xd0 [] ? mutex_unlock+0xe/0x10 [] ? do_truncate+0x73/0xa0 [] do_coredump+0xa21/0xeb0 [] ? debug_check_no_locks_freed+0xe0/0x170 [] ? trace_hardirqs_off+0xd/0x10 [] get_signal_to_deliver+0x2e1/0x960 [] do_signal+0x3f/0x9a0 [] ? pci_fixup_msi_k8t_onboard_sound+0x7d/0x97 [] ? is_prefetch.isra.15+0x1a6/0x1fd [] ? error_sti+0x5/0x6 [] ? retint_signal+0x11/0x90 [] do_notify_resume+0x80/0xb0 [] retint_signal+0x46/0x90 Mem-Info: DMA per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 CPU 1: hi: 0, btch: 1 usd: 0 DMA32 per-cpu: CPU 0: hi: 186, btch: 31 usd: 0 CPU 1: hi: 186, btch: 31 usd: 30 Normal per-cpu: CPU 0: hi: 186, btch: 31 usd: 0 CPU 1: hi: 186, btch: 31 usd: 30 active_anon:900420 inactive_anon:28835 isolated_anon:0 active_file:8 inactive_file:0 isolated_file:0 unevictable:4 dirty:34 writeback:2 unstable:0 free:20724 slab_reclaimable:8641 slab_unreclaimable:10446 mapped:18325 shmem:243662 pagetables:7705 bounce:0 free_cma:0 DMA free:12120kB min:272kB low:340kB high:408kB active_anon:2892kB inactive_anon:872kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:1672kB shmem:3596kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
yes lowmem_reserve[]: 0 2951 3836 3836 DMA32 free:55404kB min:51776kB low:64720kB high:77664kB active_anon:2834992kB inactive_anon:107924kB active_file:0kB inactive_file:28kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3021968kB mlocked:0kB dirty:0kB writeback:0kB mapped:65460kB shmem:943100kB slab_reclaimable:11700kB slab_unreclaimable:8968kB kernel_stack:592kB pagetables:11852kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:129 all_unreclaimable? yes lowmem_reserve[]: 0 0 885 885 Normal free:15364kB min:15532kB low:19412kB high:23296kB active_anon:763796kB inactive_anon:6544kB active_file:0kB inactive_file:24kB unevictable:16kB isolated(anon):0kB isolated(file):0kB present:906664kB mlocked:16kB dirty:48kB writeback:52kB mapped:6168kB shmem:27952kB slab_reclaimable:22864kB slab_unreclaimable:32800kB kernel_stack:2568kB pagetables:18968kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:379 all_unreclaimable? yes lowmem_reserve[]: 0 0 0 0 DMA: 2*4kB 2*8kB 2*16kB 1*32kB 2*64kB 1*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 1*4096kB = 12120kB DMA32: 896*4kB 1512*8kB 513*16kB 635*32kB 109*64kB 8*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 55280kB Normal: 403*4kB 377*8kB 225*16kB 139*32kB 30*64kB 4*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 17412kB 243733 total pagecache pages rsyslogd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0 rsyslogd cpuset=/ mems_allowed=0 Pid: 571, comm: rsyslogd Not tainted 3.7.0-rc5-00007-g95e21c5 #100 Call Trace: [] dump_header.isra.12+0x78/0x224 [] ? sub_preempt_count+0x79/0xd0 [] ? _raw_spin_unlock_irqrestore+0x42/0x80 [] ? ___ratelimit+0x9e/0x130 [] oom_kill_process+0x1d3/0x330 [] out_of_memory+0x439/0x4a0 [] __alloc_pages_nodemask+0x976/0xa40 [] ? find_get_page+0x5/0x230 [] filemap_fault+0x2d0/0x480 [] __do_fault+0x73/0x4d0 [] handle_pte_fault+0x97/0x9a0 [] ? __lock_is_held+0x5f/0x90 [] ? 
get_parent_ip+0x11/0x50 [] handle_mm_fault+0x22f/0x2f0 [] __do_page_fault+0x15d/0x4e0 [] ? _raw_spin_unlock+0x35/0x60 [] ? proc_reg_read+0x8c/0xc0 [] ? error_sti+0x5/0x6 [] ? trace_hardirqs_off_thunk+0x3a/0x3c [] do_page_fault+0xe/0x10 [] page_fault+0x22/0x30 Mem-Info: DMA per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 CPU 1: hi: 0, btch: 1 usd: 0 DMA32 per-cpu: CPU 0: hi: 186, btch: 31 usd: 0 CPU 1: hi: 186, btch: 31 usd: 30 Normal per-cpu: CPU 0: hi: 186, btch: 31 usd: 1 CPU 1: hi: 186, btch: 31 usd: 46 active_anon:900420 inactive_anon:28835 isolated_anon:0 active_file:0 inactive_file:7 isolated_file:0 unevictable:4 dirty:0 writeback:2 unstable:0 free:20691 slab_reclaimable:8641 slab_unreclaimable:10446 mapped:18325 shmem:243662 pagetables:7705 bounce:0 free_cma:0 DMA free:12120kB min:272kB low:340kB high:408kB active_anon:2892kB inactive_anon:872kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:1672kB shmem:3596kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes lowmem_reserve[]: 0 2951 3836 3836 DMA32 free:55280kB min:51776kB low:64720kB high:77664kB active_anon:2834992kB inactive_anon:107924kB active_file:0kB inactive_file:12kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3021968kB mlocked:0kB dirty:0kB writeback:0kB mapped:65460kB shmem:943100kB slab_reclaimable:11700kB slab_unreclaimable:8968kB kernel_stack:592kB pagetables:11852kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:520 all_unreclaimable? 
yes lowmem_reserve[]: 0 0 885 885 Normal free:15364kB min:15532kB low:19412kB high:23296kB active_anon:763796kB inactive_anon:6544kB active_file:0kB inactive_file:16kB unevictable:16kB isolated(anon):0kB isolated(file):0kB present:906664kB mlocked:16kB dirty:48kB writeback:52kB mapped:6168kB shmem:27952kB slab_reclaimable:22864kB slab_unreclaimable:32800kB kernel_stack:2568kB pagetables:18968kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:571 all_unreclaimable? yes lowmem_reserve[]: 0 0 0 0 DMA: 2*4kB 2*8kB 2*16kB 1*32kB 2*64kB 1*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 1*4096kB = 12120kB DMA32: 896*4kB 1512*8kB 513*16kB 635*32kB 109*64kB 8*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 55280kB Normal: 403*4kB 377*8kB 225*16kB 139*32kB 30*64kB 4*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 17412kB 243733 total pagecache pages 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 0kB Total swap = 0kB 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 0kB Total swap = 0kB 1032176 pages RAM 42789 pages reserved 553579 pages shared 943538 pages non-shared 1032176 pages RAM 42789 pages reserved 553576 pages shared 943549 pages non-shared [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [ 351] 0 351 74685 1682 154 0 0 systemd-journal [ 544] 0 544 5863 107 16 0 0 bluetoothd [ 545] 0 545 88977 725 56 0 0 NetworkManager [ 546] 0 546 30170 158 15 0 0 crond [ 552] 0 552 1879 28 8 0 0 gpm [ 557] 0 557 1092 37 8 0 0 acpid [ 564] 81 564 6361 373 16 0 -900 dbus-daemon [ 566] 0 566 61331 155 22 0 0 rsyslogd [ 567] 498 567 7026 104 19 0 0 avahi-daemon [ 568] 498 568 6994 59 17 0 0 avahi-daemon [ 573] 0 573 1758 33 9 0 0 mcelog [ 578] 0 578 5925 51 16 0 0 atd [ 586] 105 586 121536 4270 56 0 0 polkitd [ 593] 0 593 21967 205 48 0 -900 modem-manager [ 601] 0 601 1087 26 8 0 0 thinkfan [ 619] 0 619 122722 1085 129 0 0 libvirtd [ 630] 32 630 4812 68 13 0 0 rpcbind [ 633] 0 633 20080 199 
43 0 -1000 sshd [ 653] 29 653 5905 116 16 0 0 rpc.statd [ 700] 0 700 13173 190 28 0 0 wpa_supplicant [ 719] 0 719 4810 50 14 0 0 rpc.idmapd [ 730] 0 730 28268 36 10 0 0 rpc.rquotad [ 766] 0 766 6030 153 15 0 0 rpc.mountd [ 806] 99 806 3306 45 11 0 0 dnsmasq [ 985] 0 985 21219 150 46 0 0 login [ 988] 0 988 260408 355 48 0 0 console-kit-dae [ 1053] 11641 1053 28706 241 14 0 0 bash [ 1097] 11641 1097 27972 58 10 0 0 startx [ 1125] 11641 1125 3487 48 13 0 0 xinit [ 1126] 11641 1126 80028 35379 154 0 0 X [ 1138] 11641 1138 142989 930 122 0 0 gnome-session [ 1151] 11641 1151 4013 64 12 0 0 dbus-launch [ 1152] 11641 1152 6069 82 17 0 0 dbus-daemon [ 1154] 11641 1154 85449 162 36 0 0 at-spi-bus-laun [ 1158] 11641 1158 6103 116 17 0 0 dbus-daemon [ 1161] 11641 1161 32328 174 33 0 0 at-spi2-registr [ 1172] 11641 1172 4013 65 13 0 0 dbus-launch [ 1173] 11641 1173 6350 265 18 0 0 dbus-daemon [ 1177] 11641 1177 37416 416 29 0 0 gconfd-2 [ 1184] 11641 1184 117556 1203 44 0 0 gnome-keyring-d [ 1185] 11641 1185 224829 2236 177 0 0 gnome-settings- [ 1194] 0 1194 57227 786 46 0 0 upowerd [ 1226] 11641 1226 77392 190 36 0 0 gvfsd [ 1246] 11641 1246 118201 772 90 0 0 pulseaudio [ 1247] 496 1247 41161 59 17 0 0 rtkit-daemon [ 1252] 11641 1252 29494 205 58 0 0 gconf-helper [ 1253] 106 1253 81296 355 46 0 0 colord [ 1257] 11641 1257 59080 1574 60 0 0 openbox [ 1258] 11641 1258 185569 3216 146 0 0 gnome-panel [ 1264] 11641 1264 64102 229 27 0 0 dconf-service [ 1268] 11641 1268 139203 858 116 0 0 gnome-user-shar [ 1269] 11641 1269 268645 27442 334 0 0 pidgin [ 1270] 11641 1270 142642 1064 117 0 0 bluetooth-apple [ 1271] 11641 1271 193218 1775 175 0 0 nm-applet [ 1272] 11641 1272 220194 1810 138 0 0 gnome-sound-app [ 1285] 11641 1285 80914 632 45 0 0 gvfs-udisks2-vo [ 1287] 0 1287 88101 599 41 0 0 udisksd [ 1295] 11641 1295 177162 14140 150 0 0 wnck-applet [ 1297] 11641 1297 281043 3161 199 0 0 clock-applet [ 1299] 11641 1299 142537 1051 120 0 0 cpufreq-applet [ 1302] 11641 1302 141960 986 
113 0 0 notification-ar [ 1340] 11641 1340 190026 6265 144 0 0 gnome-terminal [ 1346] 11641 1346 2123 35 10 0 0 gnome-pty-helpe [ 1347] 11641 1347 28719 253 11 0 0 bash [ 1858] 11641 1858 10895 101 27 0 0 xfconfd X: page allocation failure: order:0, mode:0x200da Pid: 1126, comm: X Not tainted 3.7.0-rc5-00007-g95e21c5 #100 [ 2052] 11641 2052 28720 255 11 0 0 bash [ 6239] 11641 6239 73437 711 88 0 0 kdeinit4 [ 6240] 11641 6240 83952 717 101 0 0 klauncher Call Trace: [] warn_alloc_failed+0xe9/0x140 [] __alloc_pages_nodemask+0x7fa/0xa40 [] shmem_getpage_gfp+0x603/0x9d0 [] ? native_sched_clock+0x26/0x90 [] shmem_fault+0x4f/0xa0 [] shm_fault+0x1e/0x20 [] __do_fault+0x73/0x4d0 [] ? generic_file_aio_write+0xb0/0x100 [] handle_pte_fault+0x97/0x9a0 [] ? __lock_is_held+0x5f/0x90 [] ? get_parent_ip+0x11/0x50 [] handle_mm_fault+0x22f/0x2f0 [] __get_user_pages+0x12a/0x530 [] get_dump_page+0x45/0x60 [] elf_core_dump+0x16bd/0x1960 [] ? elf_core_dump+0x9d6/0x1960 [] ? sub_preempt_count+0x79/0xd0 [] ? mutex_unlock+0xe/0x10 [] ? do_truncate+0x73/0xa0 [] do_coredump+0xa21/0xeb0 [] ? debug_check_no_locks_freed+0xe0/0x170 [] ? trace_hardirqs_off+0xd/0x10 [] get_signal_to_deliver+0x2e1/0x960 [] do_signal+0x3f/0x9a0 [] ? pci_fixup_msi_k8t_onboard_sound+0x7d/0x97 [] ? is_prefetch.isra.15+0x1a6/0x1fd [] ? error_sti+0x5/0x6 [] ? 
retint_signal+0x11/0x90 [] do_notify_resume+0x80/0xb0 [] retint_signal+0x46/0x90 Mem-Info: DMA per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 CPU 1: hi: 0, btch: 1 usd: 0 DMA32 per-cpu: CPU 0: hi: 186, btch: 31 usd: 0 CPU 1: hi: 186, btch: 31 usd: 0 Normal per-cpu: CPU 0: hi: 186, btch: 31 usd: 1 CPU 1: hi: 186, btch: 31 usd: 14 active_anon:900420 inactive_anon:28978 isolated_anon:0 active_file:22 inactive_file:24 isolated_file:0 unevictable:4 dirty:5 writeback:0 unstable:0 free:20346 slab_reclaimable:8656 slab_unreclaimable:10414 mapped:18437 shmem:243751 pagetables:7717 bounce:0 free_cma:0 DMA free:12120kB min:272kB low:340kB high:408kB active_anon:2892kB inactive_anon:872kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:1672kB shmem:3596kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes lowmem_reserve[]: 0 2951 3836 3836 DMA32 free:55316kB min:51776kB low:64720kB high:77664kB active_anon:2834992kB inactive_anon:108408kB active_file:52kB inactive_file:56kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3021968kB mlocked:0kB dirty:20kB writeback:0kB mapped:65916kB shmem:943452kB slab_reclaimable:11716kB slab_unreclaimable:8904kB kernel_stack:488kB pagetables:11880kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:3103 all_unreclaimable? yes lowmem_reserve[]: 0 0 885 885 Normal free:13948kB min:15532kB low:19412kB high:23296kB active_anon:763796kB inactive_anon:6632kB active_file:36kB inactive_file:40kB unevictable:16kB isolated(anon):0kB isolated(file):0kB present:906664kB mlocked:16kB dirty:0kB writeback:0kB mapped:6160kB shmem:27956kB slab_reclaimable:22908kB slab_unreclaimable:32736kB kernel_stack:2352kB pagetables:18988kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:602 all_unreclaimable? 
yes lowmem_reserve[]: 0 0 0 0 DMA: 2*4kB 2*8kB 2*16kB 1*32kB 2*64kB 1*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 1*4096kB = 12120kB DMA32: 883*4kB 1525*8kB 513*16kB 637*32kB 109*64kB 8*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 55396kB Normal: 269*4kB 255*8kB 227*16kB 141*32kB 30*64kB 4*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 15996kB 243797 total pagecache pages 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 0kB Total swap = 0kB 1032176 pages RAM 42789 pages reserved 553637 pages shared 943817 pages non-shared X: page allocation failure: order:0, mode:0x200da Pid: 1126, comm: X Not tainted 3.7.0-rc5-00007-g95e21c5 #100 Call Trace: [] warn_alloc_failed+0xe9/0x140 [] __alloc_pages_nodemask+0x7fa/0xa40 [] shmem_getpage_gfp+0x603/0x9d0 [] ? native_sched_clock+0x26/0x90 [] shmem_fault+0x4f/0xa0 [] shm_fault+0x1e/0x20 [] __do_fault+0x73/0x4d0 [] handle_pte_fault+0x97/0x9a0 [] ? __lock_is_held+0x5f/0x90 [] ? get_parent_ip+0x11/0x50 [] handle_mm_fault+0x22f/0x2f0 [] __get_user_pages+0x12a/0x530 [] ? _raw_spin_unlock+0x35/0x60 [] get_dump_page+0x45/0x60 [] elf_core_dump+0x16bd/0x1960 [] ? elf_core_dump+0x9d6/0x1960 [] ? sub_preempt_count+0x79/0xd0 [] ? mutex_unlock+0xe/0x10 [] ? do_truncate+0x73/0xa0 [] do_coredump+0xa21/0xeb0 [] ? debug_check_no_locks_freed+0xe0/0x170 [] ? trace_hardirqs_off+0xd/0x10 [] get_signal_to_deliver+0x2e1/0x960 [] do_signal+0x3f/0x9a0 [] ? pci_fixup_msi_k8t_onboard_sound+0x7d/0x97 [] ? is_prefetch.isra.15+0x1a6/0x1fd [] ? error_sti+0x5/0x6 [] ? 
retint_signal+0x11/0x90 [] do_notify_resume+0x80/0xb0 [] retint_signal+0x46/0x90 Mem-Info: DMA per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 CPU 1: hi: 0, btch: 1 usd: 0 DMA32 per-cpu: CPU 0: hi: 186, btch: 31 usd: 0 CPU 1: hi: 186, btch: 31 usd: 0 Normal per-cpu: CPU 0: hi: 186, btch: 31 usd: 1 CPU 1: hi: 186, btch: 31 usd: 24 active_anon:900420 inactive_anon:28978 isolated_anon:0 active_file:22 inactive_file:24 isolated_file:19 unevictable:4 dirty:5 writeback:0 unstable:0 free:20222 slab_reclaimable:8656 slab_unreclaimable:10414 mapped:18437 shmem:243751 pagetables:7717 bounce:0 free_cma:0 DMA free:12120kB min:272kB low:340kB high:408kB active_anon:2892kB inactive_anon:872kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:1672kB shmem:3596kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes lowmem_reserve[]: 0 2951 3836 3836 DMA32 free:55316kB min:51776kB low:64720kB high:77664kB active_anon:2834992kB inactive_anon:108408kB active_file:52kB inactive_file:56kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3021968kB mlocked:0kB dirty:20kB writeback:0kB mapped:65916kB shmem:943452kB slab_reclaimable:11716kB slab_unreclaimable:8904kB kernel_stack:488kB pagetables:11880kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:3940 all_unreclaimable? 
yes [ 6242] 11641 6242 126497 1479 172 0 0 kded4 [ 6244] 11641 6244 2977 48 11 0 0 gam_server [10804] 11641 10804 101320 307 47 0 0 gvfsd-http [12175] 0 12175 27197 32 10 0 0 agetty [12249] 11641 12249 28719 252 14 0 0 bash [14862] 0 14862 51773 344 55 0 0 cupsd [14868] 4 14868 18105 158 39 0 0 cups-polld [16728] 11641 16728 28691 244 12 0 0 bash [16975] 0 16975 9109 253 23 0 -1000 systemd-udevd [17618] 0 17618 8245 87 22 0 0 systemd-logind [ 3133] 11641 3133 43721 132 40 0 0 su [ 3136] 0 3136 28564 139 12 0 0 bash [ 3983] 11641 3983 43722 134 41 0 0 su [ 3986] 0 3986 28564 144 13 0 0 bash [16350] 11641 16350 28691 245 14 0 0 bash [31228] 11641 31228 28691 245 11 0 0 bash [31922] 11641 31922 28719 250 13 0 0 bash [ 2340] 11641 2340 28691 245 15 0 0 bash [12586] 38 12586 7851 150 19 0 0 ntpd [32658] 11641 32658 41192 424 35 0 0 mc [32660] 11641 32660 28692 245 13 0 0 bash [10971] 11641 10971 43722 133 43 0 0 su [10974] 0 10974 28564 132 12 0 0 bash [11343] 0 11343 28497 66 11 0 0 ksmtuned [11387] 11641 11387 28719 254 11 0 0 bash [11450] 11641 11450 28691 246 13 0 0 bash [11576] 11641 11576 43722 133 40 0 0 su [11579] 0 11579 28564 141 13 0 0 bash [12106] 11641 12106 28691 244 12 0 0 bash [12141] 11641 12141 43722 132 44 0 0 su [12144] 0 12144 28564 140 11 0 0 bash [12264] 11641 12264 28691 245 11 0 0 bash [12299] 11641 12299 43721 133 40 0 0 su [12302] 0 12302 28564 137 12 0 0 bash [26024] 11641 26024 28691 245 13 0 0 bash [26083] 11641 26083 28691 245 13 0 0 bash [28235] 11641 28235 43721 132 42 0 0 su [28238] 0 28238 28564 143 13 0 0 bash [29460] 11641 29460 43721 132 42 0 0 su [29463] 0 29463 28564 137 12 0 0 bash [29758] 11641 29758 28720 256 12 0 0 bash [29864] 11641 29864 41916 1153 36 0 0 mc [29866] 11641 29866 28728 257 11 0 0 bash [32750] 0 32750 23164 2994 47 0 0 dhclient [ 323] 0 323 24081 471 48 0 0 sendmail [ 347] 51 347 20347 367 38 0 0 sendmail [ 907] 11641 907 379562 159766 707 0 0 thunderbird [ 6340] 11641 6340 28719 251 12 0 0 bash [ 6790] 11641 
6790 80307 620 101 0 0 xfce4-notifyd [ 6844] 0 6844 26669 23 9 0 0 sleep Out of memory: Kill process 907 (thunderbird) score 162 or sacrifice child Killed process 907 (thunderbird) total-vm:1518248kB, anon-rss:638476kB, file-rss:588kB lowmem_reserve[]: 0 0 885 885 Normal free:12832kB min:15532kB low:19412kB high:23296kB active_anon:763796kB inactive_anon:6632kB active_file:36kB inactive_file:40kB unevictable:16kB isolated(anon):0kB isolated(file):0kB present:906664kB mlocked:16kB dirty:0kB writeback:0kB mapped:6160kB shmem:27956kB slab_reclaimable:22908kB slab_unreclaimable:32736kB kernel_stack:2352kB pagetables:18988kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1742 all_unreclaimable? yes lowmem_reserve[]: 0 0 0 0 DMA: 2*4kB 2*8kB 2*16kB 1*32kB 2*64kB 1*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 1*4096kB = 12120kB DMA32: 883*4kB 1525*8kB 513*16kB 637*32kB 109*64kB 8*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 55396kB Normal: 270*4kB 173*8kB 198*16kB 141*32kB 30*64kB 4*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 14880kB 243797 total pagecache pages 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 0kB Total swap = 0kB 1032176 pages RAM 42789 pages reserved 553659 pages shared 937056 pages non-shared SysRq : Emergency Sync Emergency Sync complete SysRq : Emergency Remount R/O -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx131.postini.com [74.125.245.131]) by kanga.kvack.org (Postfix) with SMTP id 04BD26B005D for ; Sun, 18 Nov 2012 14:08:04 -0500 (EST)
Received: by mail-ea0-f169.google.com with SMTP id a12so1172017eaa.14 for ; Sun, 18 Nov 2012 11:08:03 -0800 (PST)
Message-ID: <50A9320D.4060700@suse.cz>
Date: Sun, 18 Nov 2012 20:07:57 +0100
From: Jiri Slaby
MIME-Version: 1.0
Subject: Re: kswapd0: excessive CPU usage
References: <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112121956.GT8218@suse.de> <50A0F5F0.6090400@redhat.com> <20121112133139.GU8218@suse.de> <50A9304E.3020205@redhat.com>
In-Reply-To: <50A9304E.3020205@redhat.com>
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Zdenek Kabelac
Cc: Mel Gorman , Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings

On 11/18/2012 08:00 PM, Zdenek Kabelac wrote:
> For some reason my machine went out of memory and OOM killed
> firefox and then even the whole X session.
>
> Unsure whether it's related to those 2 patches - but I've never had
> such OOM failure before.

As I wrote, this would be me:
https://lkml.org/lkml/2012/11/15/150

There is no -next tree for Friday which would contain the set already. So for now, it should be enough for you to apply:
https://lkml.org/lkml/2012/11/15/95

Or, alternatively, if you use a brand new systemd, it likes to fork bomb using udev.

thanks,
-- 
js
suse labs

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx173.postini.com [74.125.245.173]) by kanga.kvack.org (Postfix) with SMTP id 5D1086B0072 for ; Mon, 19 Nov 2012 20:44:17 -0500 (EST)
Subject: Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD"
In-Reply-To: Your message of "Fri, 16 Nov 2012 11:51:24 -0800." <20121116115124.c2981abc.akpm@linux-foundation.org>
From: Valdis.Kletnieks@vt.edu
References: <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112113731.GS8218@suse.de> <20121116115124.c2981abc.akpm@linux-foundation.org>
Mime-Version: 1.0
Content-Type: multipart/signed; boundary="==_Exmh_1353375823_1855P"; micalg=pgp-sha1; protocol="application/pgp-signature"
Content-Transfer-Encoding: 7bit
Date: Mon, 19 Nov 2012 20:43:43 -0500
Message-ID: <45635.1353375823@turing-police.cc.vt.edu>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Andrew Morton
Cc: Josh Boyer , Mel Gorman , Zdenek Kabelac , Seth Jennings , Jiri Slaby , Jiri Slaby , linux-mm@kvack.org, LKML , Rik van Riel , Robert Jennings

--==_Exmh_1353375823_1855P
Content-Type: text/plain; charset=us-ascii

On Fri, 16 Nov 2012 11:51:24 -0800, Andrew Morton said:
> On Fri, 16 Nov 2012 14:14:47 -0500
> Josh Boyer wrote:
>
> > > The temptation is to supply a patch that checks if kswapd was woken for
> > > THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not
> > > backed up by proper testing. As 3.7 is very close to release and this is
> > > not a bug we should release with, a safer path is to revert "mm: remove
> > > __GFP_NO_KSWAPD" for now and revisit it with the view to ironing out the
> > > balance_pgdat() logic in general.
> > >
> > > Signed-off-by: Mel Gorman
> >
> > Does anyone know if this is queued to go into 3.7 somewhere? I looked
> > a bit and can't find it in a tree. We have a few reports of Fedora
> > rawhide users hitting this.
>
> Still thinking about it. We're reverting quite a lot of material
> lately.
> mm-revert-mm-vmscan-scale-number-of-pages-reclaimed-by-reclaim-compaction-based-on-failures.patch
> and revert-mm-fix-up-zone-present-pages.patch are queued for 3.7.
>
> I'll toss this one in there as well, but I can't say I'm feeling
> terribly confident. How is Valdis's machine nowadays?

I admit possibly having lost the plot. With the two patches you mention stuck on top of next-20121114, I'm seeing fewer kswapd issues but am still tripping over them on occasion. It seems to be related to uptime - I don't see any for the first few hours, but then they become more frequent. I was seeing quite a few of them yesterday after I had a 30-hour uptime.

I'll stick Mel's Revert "mm: remove __GFP_NO_KSWAPD" patch on this evening and let you know what happens (it might be a day or two before I have definitive results, as my laptop usually gets rebooted twice a day).
--==_Exmh_1353375823_1855P
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
Comment: Exmh version 2.5 07/13/2001

iQIVAwUBUKrgTgdmEQWDXROgAQIlPw/+O72fn1X2bl4WGFjrOWRpJj0rwxAmGh5F
DHQDXO0ddBRnK2myFab16ISrDuU11+tP+ygRgepOYyBZ6sBL6EneIIc0Wzvpih6G
eB0rvgKeWox2xk0LEcghxP8mgV3umAmyD4lrhZrxot4jzmiVqZu/57jmubZjzT0j
eqxLZ+KU23WGHkiRN94kKZehElf+Jw0N9cmKZTB2I5HkIEzx7gvkHzSXD6s112bC
9l4Jq5eToQA+lc12314gr9PWzXGkYlarftXgly23cHUk/m055mG80BOZWXj/hglF
cOmj+EwOWg76+rb7o+L3Z0JIlV4ol0bdXQwlXtx9/ePo0q12ENgXCJLUpVcutMfd
C8cf6RVG1b0OPKDjT60Igq9NBVHTSTB2T0EH0wdBs6knLRDljehzNpQ1TxVEOlVg
bTq2jPN7sa+e+izKdcj27QwAHYZ7A0GxoMwvEIs6efFE2Ps3vci64ZkaJzfgts3Z
3+oSuYciLjzoLzlQ/+xtu3+LkzRZD66WQHi792nW8JRHrGhOJPAN+REyMPrLsu18
gp8umUDkTtqMEUIr9feGnKlSlIFLRMClAyrsTuMC6dvQgykNAHKG32IZYFHJjY9M
HUestGffH807rrmjl8SUFk/EM31gpCCXxdQMVkZNaMdkuJ90G0hW0OzHXfON2++o
15InbL4Up8E=
=frhe
-----END PGP SIGNATURE-----

--==_Exmh_1353375823_1855P--

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx157.postini.com [74.125.245.157]) by kanga.kvack.org (Postfix) with SMTP id 4ABF36B008A for ; Tue, 20 Nov 2012 04:18:29 -0500 (EST)
Message-ID: <50AB4ADB.6090506@parallels.com>
Date: Tue, 20 Nov 2012 13:18:19 +0400
From: Glauber Costa
MIME-Version: 1.0
Subject: Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD"
References: <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112113731.GS8218@suse.de>
In-Reply-To: <20121112113731.GS8218@suse.de>
Content-Type: text/plain; charset="ISO-8859-15"
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Mel Gorman
Cc: Zdenek Kabelac , Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings

On 11/12/2012 03:37 PM, Mel Gorman wrote:
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 02c1c971..d0a7967 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -31,6 +31,7 @@ struct vm_area_struct;
>  #define ___GFP_THISNODE 0x40000u
>  #define ___GFP_RECLAIMABLE 0x80000u
>  #define ___GFP_NOTRACK 0x200000u
> +#define ___GFP_NO_KSWAPD 0x400000u
>  #define ___GFP_OTHER_NODE 0x800000u
>  #define ___GFP_WRITE 0x1000000u

Keep in mind that this bit has been reused in -mm.
If this patch needs to be reverted, we'll need to first change
the definition of __GFP_KMEMCG (and __GFP_BITS_SHIFT as a result), or it
would break things.

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx107.postini.com [74.125.245.107]) by kanga.kvack.org (Postfix) with SMTP id 2EFBA6B006C for ; Tue, 20 Nov 2012 10:38:47 -0500 (EST)
Received: by mail-vc0-f169.google.com with SMTP id fl17so7945489vcb.14 for ; Tue, 20 Nov 2012 07:38:45 -0800 (PST)
MIME-Version: 1.0
In-Reply-To: <20121116200616.GK8218@suse.de>
References: <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112113731.GS8218@suse.de> <20121116200616.GK8218@suse.de>
Date: Tue, 20 Nov 2012 10:38:45 -0500
Message-ID:
Subject: Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD"
From: Josh Boyer
Content-Type: text/plain; charset=ISO-8859-1
Sender: owner-linux-mm@kvack.org
List-ID:
To: Mel Gorman
Cc: Zdenek Kabelac , Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings , Thorsten Leemhuis , bruno@wolff.to

On Fri, Nov 16, 2012 at 3:06 PM, Mel Gorman wrote:
> On Fri, Nov 16, 2012 at 02:14:47PM -0500, Josh Boyer wrote:
>> On Mon, Nov 12, 2012 at 6:37 AM, Mel Gorman wrote:
>> > With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction
>> > based on failures" reverted, Zdenek Kabelac reported the following
>> >
>> > Hmm, so it just took longer to hit the problem and observe
>> > kswapd0 spinning on my CPU again - it's not as endless as before -
>> > but still it easily eats minutes - it helps to turn off Firefox
>> > or TB (memory hungry apps) so kswapd0 stops soon - and restart
>> > those apps again. (And I still have like >1GB of cached memory)
>> >
>> > kswapd0 R running task 0 30 2 0x00000000
>> > ffff8801331efae8 0000000000000082 0000000000000018 0000000000000246
>> > ffff880135b9a340 ffff8801331effd8 ffff8801331effd8 ffff8801331effd8
>> > ffff880055dfa340 ffff880135b9a340 00000000331efad8 ffff8801331ee000
>> > Call Trace:
>> > [] preempt_schedule+0x42/0x60
>> > [] _raw_spin_unlock+0x55/0x60
>> > [] put_super+0x31/0x40
>> > [] drop_super+0x22/0x30
>> > [] prune_super+0x149/0x1b0
>> > [] shrink_slab+0xba/0x510
>> >
>> > The sysrq+m indicates the system has no swap so it'll never reclaim
>> > anonymous pages as part of reclaim/compaction. That is one part of the
>> > problem but not the root cause as file-backed pages could also be reclaimed.
>> >
>> > The likely underlying problem is that kswapd is woken up or kept awake
>> > for each THP allocation request in the page allocator slow path.
>> >
>> > If compaction fails for the requesting process then compaction will be
>> > deferred for a time and direct reclaim is avoided. However, if there
>> > is a storm of THP requests that are simply rejected, it will still
>> > be the case that kswapd is awake for a prolonged period of time
>> > as pgdat->kswapd_max_order is updated each time. This is noticed by
>> > the main kswapd() loop and it will not call kswapd_try_to_sleep().
>> > Instead it will loop, shrinking a small number of pages and calling
>> > shrink_slab() on each iteration.
>> >
>> > The temptation is to supply a patch that checks if kswapd was woken for
>> > THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not
>> > backed up by proper testing. As 3.7 is very close to release and this is
>> > not a bug we should release with, a safer path is to revert "mm: remove
>> > __GFP_NO_KSWAPD" for now and revisit it with the view to ironing out the
>> > balance_pgdat() logic in general.
>> >
>> > Signed-off-by: Mel Gorman
>>
>> Does anyone know if this is queued to go into 3.7 somewhere? I looked
>> a bit and can't find it in a tree. We have a few reports of Fedora
>> rawhide users hitting this.
>>
>
> No, because I was waiting to hear if a) it worked and preferably if the
> alternative "less safe" option worked. This close to release it might be
> better to just go with the safe option.

We've been tracking it in https://bugzilla.redhat.com/show_bug.cgi?id=866988
and people say this revert patch doesn't seem to make the issue go away
fully. Thorsten has created another kernel with the other patch applied
for testing.

At least I think that is the latest status from the bug. Hopefully the
commenters will chime in.

josh

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx173.postini.com [74.125.245.173]) by kanga.kvack.org (Postfix) with SMTP id 694A46B0072 for ; Tue, 20 Nov 2012 11:14:15 -0500 (EST)
Date: Tue, 20 Nov 2012 10:13:15 -0600
From: Bruno Wolff III
Subject: Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD"
Message-ID: <20121120161315.GA8338@wolff.to>
References: <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112113731.GS8218@suse.de> <20121116200616.GK8218@suse.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Disposition: inline
In-Reply-To:
Sender: owner-linux-mm@kvack.org
List-ID:
To: Josh Boyer
Cc: Mel Gorman , Zdenek Kabelac , Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings , Thorsten Leemhuis

On Tue, Nov 20, 2012 at 10:38:45 -0500, Josh Boyer wrote:
>
>We've been tracking it in https://bugzilla.redhat.com/show_bug.cgi?id=866988
>and people say this revert patch doesn't seem to make the issue go away
>fully. Thorsten has created another kernel with the other patch applied
>for testing.
>
>At least I think that is the latest status from the bug. Hopefully the
>commenters will chime in.

I am seeing kswapd0 hogging a CPU right now. I have two rsyncs and an md sync running and a couple of large memory processes (java and firefox) idle. I haven't been seeing this happen as often as previously. Before, doing a yum update alongside an rsync was pretty good at triggering the problem. Now, not so much.

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx107.postini.com [74.125.245.107]) by kanga.kvack.org (Postfix) with SMTP id 388426B005D for ; Tue, 20 Nov 2012 12:43:12 -0500 (EST)
Message-ID: <50ABC128.80706@leemhuis.info>
Date: Tue, 20 Nov 2012 18:43:04 +0100
From: Thorsten Leemhuis
MIME-Version: 1.0
Subject: Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD"
References: <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112113731.GS8218@suse.de> <20121116200616.GK8218@suse.de>
In-Reply-To:
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Josh Boyer
Cc: Mel Gorman , Zdenek Kabelac , Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings , bruno@wolff.to

On 20.11.2012 16:38, Josh Boyer wrote:
> On Fri, Nov 16, 2012 at 3:06 PM, Mel Gorman wrote:
>> On Fri, Nov 16, 2012 at 02:14:47PM -0500, Josh Boyer wrote:
>>> On Mon, Nov 12, 2012 at 6:37 AM, Mel Gorman wrote:
>>>> With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction
>>>> based on failures" reverted, Zdenek Kabelac reported the following
>>>>
>>>> Hmm, so it just took longer to hit the problem and observe
>>>> kswapd0 spinning on my CPU again - it's not as endless as before -
>>>> but still it easily eats minutes - it helps to turn off Firefox
>>>> or TB (memory hungry apps) so kswapd0 stops soon - and restart
>>>> those apps again. (And I still have like >1GB of cached memory)
>>>>
>>>> kswapd0 R running task 0 30 2 0x00000000
>>>> ffff8801331efae8 0000000000000082 0000000000000018 0000000000000246
>>>> ffff880135b9a340 ffff8801331effd8 ffff8801331effd8 ffff8801331effd8
>>>> ffff880055dfa340 ffff880135b9a340 00000000331efad8 ffff8801331ee000
>>>> Call Trace:
>>>> [] preempt_schedule+0x42/0x60
>>>> [] _raw_spin_unlock+0x55/0x60
>>>> [] put_super+0x31/0x40
>>>> [] drop_super+0x22/0x30
>>>> [] prune_super+0x149/0x1b0
>>>> [] shrink_slab+0xba/0x510
>>>>
>>>> The sysrq+m indicates the system has no swap so it'll never reclaim
>>>> anonymous pages as part of reclaim/compaction. That is one part of the
>>>> problem but not the root cause as file-backed pages could also be reclaimed.
>>>>
>>>> The likely underlying problem is that kswapd is woken up or kept awake
>>>> for each THP allocation request in the page allocator slow path.
>>>>
>>>> If compaction fails for the requesting process then compaction will be
>>>> deferred for a time and direct reclaim is avoided. However, if there
>>>> is a storm of THP requests that are simply rejected, it will still
>>>> be the case that kswapd is awake for a prolonged period of time
>>>> as pgdat->kswapd_max_order is updated each time. This is noticed by
>>>> the main kswapd() loop and it will not call kswapd_try_to_sleep().
>>>> Instead it will loop, shrinking a small number of pages and calling
>>>> shrink_slab() on each iteration.
>>>>
>>>> The temptation is to supply a patch that checks if kswapd was woken for
>>>> THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not
>>>> backed up by proper testing. As 3.7 is very close to release and this is
>>>> not a bug we should release with, a safer path is to revert "mm: remove
>>>> __GFP_NO_KSWAPD" for now and revisit it with the view to ironing out the
>>>> balance_pgdat() logic in general.
>>>>
>>>> Signed-off-by: Mel Gorman
>>>
>>> Does anyone know if this is queued to go into 3.7 somewhere? I looked
>>> a bit and can't find it in a tree. We have a few reports of Fedora
>>> rawhide users hitting this.
>>
>> No, because I was waiting to hear if a) it worked and preferably if the
>> alternative "less safe" option worked. This close to release it might be
>> better to just go with the safe option.
>
> We've been tracking it in https://bugzilla.redhat.com/show_bug.cgi?id=866988
> and people say this revert patch doesn't seem to make the issue go away
> fully. Thorsten has created another kernel with the other patch applied
> for testing.
>
> At least I think that is the latest status from the bug. Hopefully the
> commenters will chime in.

The short story from my current point of view is:

* my main machine at home where I initially saw the issue that started this thread seems to be running fine with rc6 and the "safe" patch Mel posted in https://lkml.org/lkml/2012/11/12/113

  Before that I ran an rc5 kernel with the revert that went into rc6 and the "safe" patch -- that worked fine for a few days, too.

* I have a second machine where I started to use 3.7-rc kernels only yesterday (the machine triggered a bug in the radeon driver that seems to be fixed in rc6) which showed symptoms like the ones Zdenek Kabelac mentions in this thread.
  I wasn't able to look closer at it, but simply tried rc6 with the safe patch, which didn't help. I'm now running rc6 with the "riskier" patch from https://lkml.org/lkml/2012/11/12/151 and can't yet tell if it helps. If the problem shows up again I'll try to capture more debugging data via sysrq -- there wasn't any time for that when I was running rc6 with the safe patch, sorry.

Thorsten

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx158.postini.com [74.125.245.158]) by kanga.kvack.org (Postfix) with SMTP id 661896B0070 for ; Tue, 20 Nov 2012 15:18:19 -0500 (EST)
Date: Tue, 20 Nov 2012 12:18:17 -0800
From: Andrew Morton
Subject: Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD"
Message-Id: <20121120121817.cf80b8ad.akpm@linux-foundation.org>
In-Reply-To: <50AB4ADB.6090506@parallels.com>
References: <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112113731.GS8218@suse.de> <50AB4ADB.6090506@parallels.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Glauber Costa
Cc: Mel Gorman , Zdenek Kabelac , Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Rik van Riel , Robert Jennings

On Tue, 20 Nov 2012 13:18:19 +0400
Glauber Costa wrote:

> On 11/12/2012 03:37 PM, Mel Gorman wrote:
> > diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> > index 02c1c971..d0a7967 100644
> > --- a/include/linux/gfp.h
> > +++ b/include/linux/gfp.h
> > @@ -31,6 +31,7 @@ struct vm_area_struct;
> >  #define ___GFP_THISNODE 0x40000u
> >  #define ___GFP_RECLAIMABLE 0x80000u
> >  #define ___GFP_NOTRACK 0x200000u
> > +#define ___GFP_NO_KSWAPD 0x400000u
> >  #define ___GFP_OTHER_NODE 0x800000u
> >  #define ___GFP_WRITE 0x1000000u
>
> Keep in mind that this bit has been reused in -mm.
> If this patch needs to be reverted, we'll need to first change
> the definition of __GFP_KMEMCG (and __GFP_BITS_SHIFT as a result), or it
> would break things.

I presently have

/* Plain integer GFP bitmasks. Do not use this directly. */
#define ___GFP_DMA 0x01u
#define ___GFP_HIGHMEM 0x02u
#define ___GFP_DMA32 0x04u
#define ___GFP_MOVABLE 0x08u
#define ___GFP_WAIT 0x10u
#define ___GFP_HIGH 0x20u
#define ___GFP_IO 0x40u
#define ___GFP_FS 0x80u
#define ___GFP_COLD 0x100u
#define ___GFP_NOWARN 0x200u
#define ___GFP_REPEAT 0x400u
#define ___GFP_NOFAIL 0x800u
#define ___GFP_NORETRY 0x1000u
#define ___GFP_MEMALLOC 0x2000u
#define ___GFP_COMP 0x4000u
#define ___GFP_ZERO 0x8000u
#define ___GFP_NOMEMALLOC 0x10000u
#define ___GFP_HARDWALL 0x20000u
#define ___GFP_THISNODE 0x40000u
#define ___GFP_RECLAIMABLE 0x80000u
#define ___GFP_KMEMCG 0x100000u
#define ___GFP_NOTRACK 0x200000u
#define ___GFP_NO_KSWAPD 0x400000u
#define ___GFP_OTHER_NODE 0x800000u
#define ___GFP_WRITE 0x1000000u

and

#define __GFP_BITS_SHIFT 25 /* Room for N __GFP_FOO bits */
#define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))

Which I think is OK?

I'd forgotten about __GFP_BITS_SHIFT. Should we do this?

--- a/include/linux/gfp.h~a
+++ a/include/linux/gfp.h
@@ -35,6 +35,7 @@ struct vm_area_struct;
 #define ___GFP_NO_KSWAPD 0x400000u
 #define ___GFP_OTHER_NODE 0x800000u
 #define ___GFP_WRITE 0x1000000u
+/* If the above are modified, __GFP_BITS_SHIFT may need updating */
 
 /*
  * GFP bitmasks..
_

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx123.postini.com [74.125.245.123]) by kanga.kvack.org (Postfix) with SMTP id 963BF6B006C for ; Wed, 21 Nov 2012 03:30:28 -0500 (EST)
Message-ID: <50AC911A.3070501@parallels.com>
Date: Wed, 21 Nov 2012 12:30:18 +0400
From: Glauber Costa
MIME-Version: 1.0
Subject: Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD"
References: <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112113731.GS8218@suse.de> <50AB4ADB.6090506@parallels.com> <20121120121817.cf80b8ad.akpm@linux-foundation.org>
In-Reply-To: <20121120121817.cf80b8ad.akpm@linux-foundation.org>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Andrew Morton
Cc: Mel Gorman , Zdenek Kabelac , Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Rik van Riel , Robert Jennings

On 11/21/2012 12:18 AM, Andrew Morton wrote:
> On Tue, 20 Nov 2012 13:18:19 +0400
> Glauber Costa wrote:
>
>> On 11/12/2012 03:37 PM, Mel Gorman wrote:
>>> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
>>> index 02c1c971..d0a7967 100644
>>> --- a/include/linux/gfp.h
>>> +++ b/include/linux/gfp.h
>>> @@ -31,6 +31,7 @@ struct vm_area_struct;
>>>  #define ___GFP_THISNODE 0x40000u
>>>  #define ___GFP_RECLAIMABLE 0x80000u
>>>  #define ___GFP_NOTRACK 0x200000u
>>> +#define ___GFP_NO_KSWAPD 0x400000u
>>>  #define ___GFP_OTHER_NODE 0x800000u
>>>  #define ___GFP_WRITE 0x1000000u
>>
>> Keep in mind that this bit has been reused in -mm.
>> If this patch needs to be reverted, we'll need to first change >> the definition of __GFP_KMEMCG (and __GFP_BITS_SHIFT as a result), or it >> would break things. > > I presently have > > /* Plain integer GFP bitmasks. Do not use this directly. */ > #define ___GFP_DMA 0x01u > #define ___GFP_HIGHMEM 0x02u > #define ___GFP_DMA32 0x04u > #define ___GFP_MOVABLE 0x08u > #define ___GFP_WAIT 0x10u > #define ___GFP_HIGH 0x20u > #define ___GFP_IO 0x40u > #define ___GFP_FS 0x80u > #define ___GFP_COLD 0x100u > #define ___GFP_NOWARN 0x200u > #define ___GFP_REPEAT 0x400u > #define ___GFP_NOFAIL 0x800u > #define ___GFP_NORETRY 0x1000u > #define ___GFP_MEMALLOC 0x2000u > #define ___GFP_COMP 0x4000u > #define ___GFP_ZERO 0x8000u > #define ___GFP_NOMEMALLOC 0x10000u > #define ___GFP_HARDWALL 0x20000u > #define ___GFP_THISNODE 0x40000u > #define ___GFP_RECLAIMABLE 0x80000u > #define ___GFP_KMEMCG 0x100000u > #define ___GFP_NOTRACK 0x200000u > #define ___GFP_NO_KSWAPD 0x400000u > #define ___GFP_OTHER_NODE 0x800000u > #define ___GFP_WRITE 0x1000000u > > and > Hmm, I didn't realize there was also another free bit at 0x100000u. This seems fine. > #define __GFP_BITS_SHIFT 25 /* Room for N __GFP_FOO bits */ > #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1)) > > Which I think is OK? Yes; if we haven't increased the size of the flag space, there is no need to change it. > > I'd forgotten about __GFP_BITS_SHIFT. Should we do this? > > --- a/include/linux/gfp.h~a > +++ a/include/linux/gfp.h > @@ -35,6 +35,7 @@ struct vm_area_struct; > #define ___GFP_NO_KSWAPD 0x400000u > #define ___GFP_OTHER_NODE 0x800000u > #define ___GFP_WRITE 0x1000000u > +/* If the above are modified, __GFP_BITS_SHIFT may need updating */ > This is a very helpful comment. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
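The comment Andrew proposes relies on whoever adds a flag remembering to bump __GFP_BITS_SHIFT. As a minimal userspace sketch (not kernel code, and not something posted in this thread), the same invariant could be checked at compile time. It copies a subset of the flag values from the list quoted above, drops the sparse `(__force gfp_t)` cast, and uses C11 `static_assert` where the kernel would use `BUILD_BUG_ON`; `GFP_HIGHEST_FLAG` and `gfp_flags_valid()` are hypothetical names invented for illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>

/* Subset of the ___GFP_* values from the -mm list quoted above. */
#define ___GFP_KMEMCG     0x100000u
#define ___GFP_NOTRACK    0x200000u
#define ___GFP_NO_KSWAPD  0x400000u
#define ___GFP_OTHER_NODE 0x800000u
#define ___GFP_WRITE      0x1000000u
/* If the above are modified, __GFP_BITS_SHIFT may need updating */

#define __GFP_BITS_SHIFT 25
/* Userspace stand-in: the kernel casts this to (__force gfp_t). */
#define __GFP_BITS_MASK ((1u << __GFP_BITS_SHIFT) - 1)

/* Hypothetical: must name the highest ___GFP_* bit currently in use. */
#define GFP_HIGHEST_FLAG ___GFP_WRITE

/* Compilation fails if the highest flag lies outside the mask, i.e. if a
 * flag was added above bit 24 without raising __GFP_BITS_SHIFT. */
static_assert((GFP_HIGHEST_FLAG & ~__GFP_BITS_MASK) == 0,
              "__GFP_BITS_SHIFT is too small for the highest ___GFP_* flag");

/* Hypothetical runtime check: true if every set bit lies inside the mask. */
static bool gfp_flags_valid(unsigned int flags)
{
    return (flags & ~__GFP_BITS_MASK) == 0;
}
```

With the values above, `gfp_flags_valid(___GFP_NO_KSWAPD | ___GFP_WRITE)` is true, while a hypothetical new flag at `1u << 25` is rejected until __GFP_BITS_SHIFT becomes 26. The assert does not remove the need to keep `GFP_HIGHEST_FLAG` in sync, but it turns a mismatch between that flag and __GFP_BITS_SHIFT into a build failure instead of a silently truncated mask.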
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx196.postini.com [74.125.245.196]) by kanga.kvack.org (Postfix) with SMTP id A74806B00B5 for ; Wed, 21 Nov 2012 10:08:56 -0500 (EST) Date: Wed, 21 Nov 2012 15:08:50 +0000 From: Mel Gorman Subject: Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" Message-ID: <20121121150850.GF8218@suse.de> References: <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112113731.GS8218@suse.de> <20121116200616.GK8218@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Josh Boyer Cc: Zdenek Kabelac , Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings , Thorsten Leemhuis , bruno@wolff.to On Tue, Nov 20, 2012 at 10:38:45AM -0500, Josh Boyer wrote: > On Fri, Nov 16, 2012 at 3:06 PM, Mel Gorman wrote: > > On Fri, Nov 16, 2012 at 02:14:47PM -0500, Josh Boyer wrote: > >> On Mon, Nov 12, 2012 at 6:37 AM, Mel Gorman wrote: > >> > With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction > >> > based on failures" reverted, Zdenek Kabelac reported the following > >> > > >> > Hmm, so it's just took longer to hit the problem and observe > >> > kswapd0 spinning on my CPU again - it's not as endless like before - > >> > but still it easily eats minutes - it helps to turn off Firefox > >> > or TB (memory hungry apps) so kswapd0 stops soon - and restart > >> > those apps again. 
(And I still have like >1GB of cached memory) > >> > > >> > kswapd0 R running task 0 30 2 0x00000000 > >> > ffff8801331efae8 0000000000000082 0000000000000018 0000000000000246 > >> > ffff880135b9a340 ffff8801331effd8 ffff8801331effd8 ffff8801331effd8 > >> > ffff880055dfa340 ffff880135b9a340 00000000331efad8 ffff8801331ee000 > >> > Call Trace: > >> > [] preempt_schedule+0x42/0x60 > >> > [] _raw_spin_unlock+0x55/0x60 > >> > [] put_super+0x31/0x40 > >> > [] drop_super+0x22/0x30 > >> > [] prune_super+0x149/0x1b0 > >> > [] shrink_slab+0xba/0x510 > >> > > >> > The sysrq+m indicates the system has no swap so it'll never reclaim > >> > anonymous pages as part of reclaim/compaction. That is one part of the > >> > problem but not the root cause as file-backed pages could also be reclaimed. > >> > > >> > The likely underlying problem is that kswapd is woken up or kept awake > >> > for each THP allocation request in the page allocator slow path. > >> > > >> > If compaction fails for the requesting process then compaction will be > >> > deferred for a time and direct reclaim is avoided. However, if there > >> > is a storm of THP requests that are simply rejected, it will still > >> > be the case that kswapd is awake for a prolonged period of time > >> > as pgdat->kswapd_max_order is updated each time. This is noticed by > >> > the main kswapd() loop and it will not call kswapd_try_to_sleep(). > >> > Instead it will loop, shrinking a small number of pages and calling > >> > shrink_slab() on each iteration. > >> > > >> > The temptation is to supply a patch that checks if kswapd was woken for > >> > THP and if so ignores pgdat->kswapd_max_order but it'll be a hack and not > >> > backed up by proper testing. As 3.7 is very close to release and this is > >> > not a bug we should release with, a safer path is to revert "mm: remove > >> > __GFP_NO_KSWAPD" for now and revisit it with a view to ironing out the > >> > balance_pgdat() logic in general.
> >> > > >> > Signed-off-by: Mel Gorman > >> > >> Does anyone know if this is queued to go into 3.7 somewhere? I looked > >> a bit and can't find it in a tree. We have a few reports of Fedora > >> rawhide users hitting this. > >> > > > > No, because I was waiting to hear if a) it worked and preferably if the > > alternative "less safe" option worked. This close to release it might be > > better to just go with the safe option. > > We've been tracking it in https://bugzilla.redhat.com/show_bug.cgi?id=866988 > and people say this revert patch doesn't seem to make the issue go away > fully. Thorsten has created another kernel with the other patch applied > for testing. > There is also a potential accounting bug that could be affecting this (https://lkml.org/lkml/2012/11/20/613). NR_FREE_PAGES affects watermark calculations. If it drifts too far, then processes would keep entering direct reclaim and waking kswapd even when there is no need to. -- Mel Gorman SUSE Labs
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx124.postini.com [74.125.245.124]) by kanga.kvack.org (Postfix) with SMTP id 42C236B0071 for ; Fri, 23 Nov 2012 10:20:52 -0500 (EST) Message-ID: <50AF9450.9020803@leemhuis.info> Date: Fri, 23 Nov 2012 16:20:48 +0100 From: Thorsten Leemhuis MIME-Version: 1.0 Subject: Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" References: <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112113731.GS8218@suse.de> <20121116200616.GK8218@suse.de> <50ABC128.80706@leemhuis.info> In-Reply-To: <50ABC128.80706@leemhuis.info> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org List-ID: To: Josh Boyer Cc: Mel Gorman , Zdenek Kabelac , Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings , bruno@wolff.to Thorsten Leemhuis wrote on 20.11.2012 18:43: > On 20.11.2012 16:38, Josh Boyer wrote: > > The short story from my current point of view is: Quick update, in case anybody is interested: > * my main machine at home where I initially saw the issue that started > this thread seems to be running fine with rc6 and the "safe" patch Mel > posted in https://lkml.org/lkml/2012/11/12/113 Before that I ran an rc5 > kernel with the revert that went into rc6 and the "safe" patch -- that > worked fine for a few days, too. On this machine I'm running an rc6 kernel + the fix for the accounting bug (1) that went into mainline ~40 hours ago + the "riskier" patch Mel posted in https://lkml.org/lkml/2012/11/12/151 Up to now everything works fine.
(1) https://lkml.org/lkml/2012/11/21/362 > * I have a second machine where I started to use 3.7-rc kernels only > yesterday (the machine triggered a bug in the radeon driver that seems > to be fixed in rc6) which showed symptoms like the ones Zdenek Kabelac > mentions in this thread. I wasn't able to look closer at it, but simply > tried rc6 with the safe patch, which didn't help. I'm now running rc6 > with the "riskier" patch from https://lkml.org/lkml/2012/11/12/151 > I can't yet tell if it helps. If the problems show up again I'll try to > capture more debugging data via sysrq -- there wasn't any time for that > when I was running rc6 with the safe patch, sorry. This machine is now also behaving fine with the above-mentioned rc6 kernel + the two patches. It seems the accounting bug was the root cause of the problems this machine showed. CU Thorsten
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx137.postini.com [74.125.245.137]) by kanga.kvack.org (Postfix) with SMTP id 4665F6B004D for ; Tue, 27 Nov 2012 06:12:31 -0500 (EST) Date: Tue, 27 Nov 2012 11:12:25 +0000 From: Mel Gorman Subject: Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" Message-ID: <20121127111225.GO8218@suse.de> References: <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112113731.GS8218@suse.de> <20121116200616.GK8218@suse.de> <50ABC128.80706@leemhuis.info> <50AF9450.9020803@leemhuis.info> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <50AF9450.9020803@leemhuis.info> Sender: owner-linux-mm@kvack.org List-ID: To: Thorsten Leemhuis Cc: Josh Boyer , Zdenek Kabelac , Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings , bruno@wolff.to On Fri, Nov 23, 2012 at 04:20:48PM +0100, Thorsten Leemhuis wrote: > Thorsten Leemhuis wrote on 20.11.2012 18:43: > > On 20.11.2012 16:38, Josh Boyer wrote: > > > > The short story from my current point of view is: > > Quick update, in case anybody is interested: > > > * my main machine at home where I initially saw the issue that started > > this thread seems to be running fine with rc6 and the "safe" patch Mel > > posted in https://lkml.org/lkml/2012/11/12/113 Before that I ran an rc5 > > kernel with the revert that went into rc6 and the "safe" patch -- that > > worked fine for a few days, too. > > On this machine I'm running an rc6 kernel + the fix for the accounting > bug (1) that went into mainline ~40 hours ago + the "riskier" patch Mel > posted in https://lkml.org/lkml/2012/11/12/151 > > Up to now everything works fine.
> > (1) https://lkml.org/lkml/2012/11/21/362 > That's good news, thanks for the follow-up. Maybe 3.7 will not be a complete disaster with respect to THP after all this. The riskier patch was not picked up simply because it was riskier and would still be vulnerable to the effectively infinite loop Johannes found in kswapd. It'll all need to be revisited. > > * I have a second machine where I started to use 3.7-rc kernels only > > yesterday (the machine triggered a bug in the radeon driver that seems > > to be fixed in rc6) which showed symptoms like the ones Zdenek Kabelac > > mentions in this thread. I wasn't able to look closer at it, but simply > > tried rc6 with the safe patch, which didn't help. I'm now running rc6 > > with the "riskier" patch from https://lkml.org/lkml/2012/11/12/151 > > I can't yet tell if it helps. If the problems show up again I'll try to > > capture more debugging data via sysrq -- there wasn't any time for that > > when I was running rc6 with the safe patch, sorry. > > This machine is now also behaving fine with the above-mentioned rc6 kernel + > the two patches. It seems the accounting bug was the root cause of the > problems this machine showed. > For some, yes; for others, no. Others are getting stuck within effectively infinite loops in kswapd, and the trigger cases are different although the symptoms look similar. Thanks again. -- Mel Gorman SUSE Labs
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754070Ab2JKIwh (ORCPT ); Thu, 11 Oct 2012 04:52:37 -0400 Received: from mail-wg0-f44.google.com ([74.125.82.44]:57915 "EHLO mail-wg0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751436Ab2JKIwc (ORCPT ); Thu, 11 Oct 2012 04:52:32 -0400 Message-ID: <507688CC.9000104@suse.cz> Date: Thu, 11 Oct 2012 10:52:28 +0200 From: Jiri Slaby User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Thunderbird/17.0a2 MIME-Version: 1.0 To: linux-mm@kvack.org, LKML , Jiri Slaby Subject: kswapd0: wxcessive CPU usage X-Enigmail-Version: 1.5a1pre Content-Type: text/plain; charset=ISO-8859-2 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, with 3.6.0-next-20121008, kswapd0 is spinning my CPU at 100% for 1 minute or so. If I try to suspend to RAM, this trace appears: kswapd0 R running task 0 577 2 0x00000000 0000000000000000 00000000000000c0 cccccccccccccccd ffff8801c4146800 ffff8801c4b15c88 ffffffff8116ee05 0000000000003e32 ffff8801c3a79000 ffff8801c4b15ca8 ffffffff8116fdf8 ffff8801c480f398 ffff8801c3a79000 Call Trace: [] ? put_super+0x25/0x40 [] ? grab_super_passive+0x24/0xa0 [] ? prune_super+0x149/0x1b0 [] ? shrink_slab+0xa1/0x2d0 [] ? kswapd+0x66d/0xb60 [] ? try_to_free_pages+0x180/0x180 [] ? kthread+0xc0/0xd0 [] ? kthread_create_on_node+0x130/0x130 [] ? ret_from_fork+0x7c/0x90 [] ?
kthread_create_on_node+0x130/0x130 # cat /proc/vmstat nr_free_pages 239962 nr_inactive_anon 89825 nr_active_anon 711136 nr_inactive_file 60386 nr_active_file 46668 nr_unevictable 0 nr_mlock 0 nr_anon_pages 500678 nr_mapped 41319 nr_file_pages 319317 nr_dirty 45 nr_writeback 0 nr_slab_reclaimable 21909 nr_slab_unreclaimable 21598 nr_page_table_pages 12131 nr_kernel_stack 491 nr_unstable 0 nr_bounce 0 nr_vmscan_write 1674280 nr_vmscan_immediate_reclaim 301662 nr_writeback_temp 0 nr_isolated_anon 0 nr_isolated_file 0 nr_shmem 212263 nr_dirtied 10620227 nr_written 9260939 nr_anon_transparent_hugepages 172 nr_free_cma 0 nr_dirty_threshold 31459 nr_dirty_background_threshold 15729 pgpgin 31311778 pgpgout 38987552 pswpin 0 pswpout 0 pgalloc_dma 0 pgalloc_dma32 245169455 pgalloc_normal 279685864 pgalloc_movable 0 pgfree 537318727 pgactivate 13126755 pgdeactivate 2482953 pgfault 645947575 pgmajfault 193427 pgrefill_dma 0 pgrefill_dma32 1124272 pgrefill_normal 1998033 pgrefill_movable 0 pgsteal_kswapd_dma 0 pgsteal_kswapd_dma32 2531015 pgsteal_kswapd_normal 3403006 pgsteal_kswapd_movable 0 pgsteal_direct_dma 0 pgsteal_direct_dma32 362488 pgsteal_direct_normal 1134511 pgsteal_direct_movable 0 pgscan_kswapd_dma 0 pgscan_kswapd_dma32 2693620 pgscan_kswapd_normal 5836491 pgscan_kswapd_movable 0 pgscan_direct_dma 0 pgscan_direct_dma32 368374 pgscan_direct_normal 1658486 pgscan_direct_movable 0 pgscan_direct_throttle 0 pginodesteal 258410 slabs_scanned 86459392 kswapd_inodesteal 3907549 kswapd_low_wmark_hit_quickly 15408 kswapd_high_wmark_hit_quickly 23113 kswapd_skip_congestion_wait 10 pageoutrun 2165627235 allocstall 11256 pgrotated 219624 compact_blocks_moved 4862077 compact_pages_moved 1970005 compact_pagemigrate_failed 1726156 compact_stall 21275 compact_fail 6589 compact_success 14686 htlb_buddy_alloc_success 0 htlb_buddy_alloc_fail 0 unevictable_pgs_culled 2799 unevictable_pgs_scanned 0 unevictable_pgs_rescued 22563 unevictable_pgs_mlocked 22563 unevictable_pgs_munlocked 
22563 unevictable_pgs_cleared 0 unevictable_pgs_stranded 0 thp_fault_alloc 18725 thp_fault_fallback 64868 thp_collapse_alloc 9216 thp_collapse_alloc_failed 2031 thp_split 2146 Any ideas what it could be? -- js suse labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758776Ab2JKPeb (ORCPT ); Thu, 11 Oct 2012 11:34:31 -0400 Received: from mail-we0-f174.google.com ([74.125.82.174]:39434 "EHLO mail-we0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755455Ab2JKPe2 (ORCPT ); Thu, 11 Oct 2012 11:34:28 -0400 Message-ID: <5076E700.2030909@suse.cz> Date: Thu, 11 Oct 2012 17:34:24 +0200 From: Jiri Slaby User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Thunderbird/17.0a2 MIME-Version: 1.0 To: Valdis.Kletnieks@vt.edu CC: linux-mm@kvack.org, LKML , Jiri Slaby Subject: Re: kswapd0: wxcessive CPU usage References: <507688CC.9000104@suse.cz> <106695.1349963080@turing-police.cc.vt.edu> In-Reply-To: <106695.1349963080@turing-police.cc.vt.edu> X-Enigmail-Version: 1.5a1pre Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/11/2012 03:44 PM, Valdis.Kletnieks@vt.edu wrote: > So at least we know we're not hallucinating. :) Just a thought... Do you have RAID?
-- js suse labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759009Ab2JKR7k (ORCPT ); Thu, 11 Oct 2012 13:59:40 -0400 Received: from mail-wg0-f42.google.com ([74.125.82.42]:58609 "EHLO mail-wg0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756706Ab2JKR7i (ORCPT ); Thu, 11 Oct 2012 13:59:38 -0400 Message-ID: <50770905.5070904@suse.cz> Date: Thu, 11 Oct 2012 19:59:33 +0200 From: Jiri Slaby User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Thunderbird/17.0a2 MIME-Version: 1.0 To: Valdis.Kletnieks@vt.edu CC: Jiri Slaby , linux-mm@kvack.org, LKML Subject: Re: kswapd0: wxcessive CPU usage References: <507688CC.9000104@suse.cz> <106695.1349963080@turing-police.cc.vt.edu> <5076E700.2030909@suse.cz> <118079.1349978211@turing-police.cc.vt.edu> In-Reply-To: <118079.1349978211@turing-police.cc.vt.edu> X-Enigmail-Version: 1.5a1pre Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/11/2012 07:56 PM, Valdis.Kletnieks@vt.edu wrote: > On Thu, 11 Oct 2012 17:34:24 +0200, Jiri Slaby said: >> On 10/11/2012 03:44 PM, Valdis.Kletnieks@vt.edu wrote: >>> So at least we know we're not hallucinating. :) >> >> Just a thought... Do you have RAID? > > Nope, just a 160G laptop spinning hard drive. Filesystems are ext4 > on LVM on a cryptoLUKS partition on /dev/sda2. OK, maybe it's compaction. Do you have CONFIG_COMPACTION=y?
-- js suse labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756703Ab2JKWIU (ORCPT ); Thu, 11 Oct 2012 18:08:20 -0400 Received: from mail-wg0-f44.google.com ([74.125.82.44]:44734 "EHLO mail-wg0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752632Ab2JKWIR (ORCPT ); Thu, 11 Oct 2012 18:08:17 -0400 Message-ID: <5077434D.7080008@suse.cz> Date: Fri, 12 Oct 2012 00:08:13 +0200 From: Jiri Slaby User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Thunderbird/17.0a2 MIME-Version: 1.0 To: Valdis.Kletnieks@vt.edu CC: Jiri Slaby , linux-mm@kvack.org, LKML Subject: Re: kswapd0: excessive CPU usage References: <507688CC.9000104@suse.cz> <106695.1349963080@turing-police.cc.vt.edu> <5076E700.2030909@suse.cz> <118079.1349978211@turing-police.cc.vt.edu> <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> In-Reply-To: <119175.1349979570@turing-police.cc.vt.edu> X-Enigmail-Version: 1.5a1pre Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/11/2012 08:19 PM, Valdis.Kletnieks@vt.edu wrote: > # zgrep COMPAC /proc/config.gz > CONFIG_COMPACTION=y > > Hope that tells you something useful. It just supports another theory of mine. This seems to fix it for me: --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1830,8 +1830,8 @@ static inline bool should_continue_reclaim(struct lruvec *lruvec, */ pages_for_compaction = (2UL << sc->order); - pages_for_compaction = scale_for_compaction(pages_for_compaction, - lruvec, sc); +/* pages_for_compaction = scale_for_compaction(pages_for_compaction, + lruvec, sc);*/ inactive_lru_pages = get_lru_size(lruvec, LRU_INACTIVE_FILE); if (nr_swap_pages > 0) inactive_lru_pages += get_lru_size(lruvec, LRU_INACTIVE_ANON); And for you?
(It's an effective revert of "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures".) regards, -- js suse labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759433Ab2JKWOP (ORCPT ); Thu, 11 Oct 2012 18:14:15 -0400 Received: from mail.linuxfoundation.org ([140.211.169.12]:46467 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753486Ab2JKWOO (ORCPT ); Thu, 11 Oct 2012 18:14:14 -0400 Date: Thu, 11 Oct 2012 15:14:13 -0700 From: Andrew Morton To: Jiri Slaby Cc: linux-mm@kvack.org, LKML , Jiri Slaby Subject: Re: kswapd0: wxcessive CPU usage Message-Id: <20121011151413.3ab58542.akpm@linux-foundation.org> In-Reply-To: <507688CC.9000104@suse.cz> References: <507688CC.9000104@suse.cz> X-Mailer: Sylpheed 3.0.2 (GTK+ 2.20.1; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 11 Oct 2012 10:52:28 +0200 Jiri Slaby wrote: > with 3.6.0-next-20121008, kswapd0 is spinning my CPU at 100% for 1 > minute or so. If I try to suspend to RAM, this trace appears: > kswapd0 R running task 0 577 2 0x00000000 > 0000000000000000 00000000000000c0 cccccccccccccccd ffff8801c4146800 > ffff8801c4b15c88 ffffffff8116ee05 0000000000003e32 ffff8801c3a79000 > ffff8801c4b15ca8 ffffffff8116fdf8 ffff8801c480f398 ffff8801c3a79000 > Call Trace: > [] ? put_super+0x25/0x40 > [] ? grab_super_passive+0x24/0xa0 > [] ? prune_super+0x149/0x1b0 > [] ? shrink_slab+0xa1/0x2d0 > [] ? kswapd+0x66d/0xb60 > [] ? try_to_free_pages+0x180/0x180 > [] ? kthread+0xc0/0xd0 > [] ? kthread_create_on_node+0x130/0x130 > [] ? ret_from_fork+0x7c/0x90 > [] ? kthread_create_on_node+0x130/0x130 Could you please do a sysrq-T a few times while it's spinning, to confirm that this trace is consistently the culprit? 
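Taking several sysrq-T samples, as Andrew asks, can be scripted. The sketch below is an illustration under stated assumptions, not something posted in the thread: it writes the `t` key to `/proc/sysrq-trigger` (the procfs file that triggers the same action as the keyboard combination; writing it requires root) a given number of times with a pause in between, so the task dumps accumulate in the kernel log for comparison. The helper name `sysrq_sample()` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write the single character `key` to the sysrq
 * trigger file `times` times, sleeping `interval_sec` seconds between
 * writes. With "/proc/sysrq-trigger" and 't', each write asks the
 * kernel to dump the state of every task to the kernel log (root only).
 * Returns how many writes succeeded.
 */
static int sysrq_sample(const char *trigger_path, char key,
                        int times, unsigned int interval_sec)
{
    int ok = 0;

    assert(trigger_path != NULL && times >= 0);
    for (int i = 0; i < times; i++) {
        FILE *f = fopen(trigger_path, "w");

        if (f == NULL) {
            perror(trigger_path);
        } else {
            int put = fputc(key, f);
            /* fclose() flushes the buffered byte, so check it too. */
            int closed = fclose(f);

            if (put != EOF && closed == 0)
                ok++;
        }
        if (i + 1 < times)
            sleep(interval_sec); /* spread the samples over time */
    }
    return ok;
}
```

For example, `sysrq_sample("/proc/sysrq-trigger", 't', 5, 2)` takes five dumps two seconds apart; `dmesg` then shows whether kswapd0's stack keeps landing in `shrink_slab()`. Taking the path as a parameter keeps the helper testable against an ordinary file.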
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759501Ab2JKW0J (ORCPT ); Thu, 11 Oct 2012 18:26:09 -0400 Received: from mail-wi0-f172.google.com ([209.85.212.172]:48631 "EHLO mail-wi0-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752632Ab2JKW0E (ORCPT ); Thu, 11 Oct 2012 18:26:04 -0400 Message-ID: <50774779.8000005@suse.cz> Date: Fri, 12 Oct 2012 00:26:01 +0200 From: Jiri Slaby User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Thunderbird/17.0a2 MIME-Version: 1.0 To: Andrew Morton CC: Jiri Slaby , linux-mm@kvack.org, LKML Subject: Re: kswapd0: wxcessive CPU usage References: <507688CC.9000104@suse.cz> <20121011151413.3ab58542.akpm@linux-foundation.org> In-Reply-To: <20121011151413.3ab58542.akpm@linux-foundation.org> X-Enigmail-Version: 1.5a1pre Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/12/2012 12:14 AM, Andrew Morton wrote: > Could you please do a sysrq-T a few times while it's spinning, to > confirm that this trace is consistently the culprit? For me, yes: shrink_slab is in most of the traces.
-- js suse labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933948Ab2JLMiF (ORCPT ); Fri, 12 Oct 2012 08:38:05 -0400 Received: from mail-wg0-f42.google.com ([74.125.82.42]:44847 "EHLO mail-wg0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932292Ab2JLMiB (ORCPT ); Fri, 12 Oct 2012 08:38:01 -0400 Message-ID: <50780F26.7070007@suse.cz> Date: Fri, 12 Oct 2012 14:37:58 +0200 From: Jiri Slaby User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Thunderbird/17.0a2 MIME-Version: 1.0 To: Valdis.Kletnieks@vt.edu CC: Jiri Slaby , linux-mm@kvack.org, LKML , Mel Gorman , Andrew Morton Subject: Re: kswapd0: excessive CPU usage References: <507688CC.9000104@suse.cz> <106695.1349963080@turing-police.cc.vt.edu> <5076E700.2030909@suse.cz> <118079.1349978211@turing-police.cc.vt.edu> <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> In-Reply-To: <5077434D.7080008@suse.cz> X-Enigmail-Version: 1.5a1pre Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/12/2012 12:08 AM, Jiri Slaby wrote: > (It's an effective revert of "mm: vmscan: scale number of pages > reclaimed by reclaim/compaction based on failures".) Given kswapd had hours of runtime in ps/top output yesterday in the morning and after the revert it's now 2 minutes in sum for the last 24h, I would say, it's gone. Mel, you wrote me it's unlikely the patch, but not impossible in the end. Can you take a look, please? If you need some trace-cmd output or anything, just let us know. This is x86_64, 6G of RAM, no swap. FWIW EXT4, SLUB, COMPACTION all enabled/used. 
thanks, -- js suse labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934074Ab2JLN5d (ORCPT ); Fri, 12 Oct 2012 09:57:33 -0400 Received: from cantor2.suse.de ([195.135.220.15]:46647 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933559Ab2JLN5b (ORCPT ); Fri, 12 Oct 2012 09:57:31 -0400 Date: Fri, 12 Oct 2012 14:57:26 +0100 From: Mel Gorman To: Jiri Slaby Cc: Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton Subject: Re: kswapd0: excessive CPU usage Message-ID: <20121012135726.GY29125@suse.de> References: <507688CC.9000104@suse.cz> <106695.1349963080@turing-police.cc.vt.edu> <5076E700.2030909@suse.cz> <118079.1349978211@turing-police.cc.vt.edu> <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <50780F26.7070007@suse.cz> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Oct 12, 2012 at 02:37:58PM +0200, Jiri Slaby wrote: > On 10/12/2012 12:08 AM, Jiri Slaby wrote: > > (It's an effective revert of "mm: vmscan: scale number of pages > > reclaimed by reclaim/compaction based on failures".) > > Given kswapd had hours of runtime in ps/top output yesterday in the > morning and after the revert it's now 2 minutes in sum for the last 24h, > I would say, it's gone. > > Mel, you wrote me it's unlikely the patch, but not impossible in the > end. Can you take a look, please? If you need some trace-cmd output or > anything, just let us know. > > This is x86_64, 6G of RAM, no swap. FWIW EXT4, SLUB, COMPACTION all > enabled/used. > Can you monitor the behaviour of this patch please? Please keep a particular eye on kswapd activity and the amount of free memory. 
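Keeping an eye on kswapd activity and free memory, as asked here, amounts to sampling /proc/vmstat over time. A minimal sketch under stated assumptions: it parses the `name value` pairs that /proc/vmstat prints one per line, records `nr_free_pages`, and sums the per-zone `pgsteal_kswapd_*` and `pgsteal_direct_*` counters. The struct and function names are hypothetical, and what counts as "climbing aggressively" is left to the reader's judgment.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical snapshot of the counters worth watching here. */
struct vmstat_snapshot {
    unsigned long long nr_free_pages;
    unsigned long long pgsteal_kswapd; /* sum of pgsteal_kswapd_<zone> */
    unsigned long long pgsteal_direct; /* sum of pgsteal_direct_<zone> */
};

/*
 * Parse text in /proc/vmstat format ("name value" pairs) into *out.
 * Returns 0 on success, -1 if no nr_free_pages entry was seen.
 */
static int vmstat_parse(FILE *in, struct vmstat_snapshot *out)
{
    char name[64];
    unsigned long long value;
    int have_free = 0;

    assert(in != NULL && out != NULL);
    memset(out, 0, sizeof(*out));

    while (fscanf(in, "%63s %llu", name, &value) == 2) {
        if (strcmp(name, "nr_free_pages") == 0) {
            out->nr_free_pages = value;
            have_free = 1;
        } else if (strncmp(name, "pgsteal_kswapd_", 15) == 0) {
            out->pgsteal_kswapd += value;
        } else if (strncmp(name, "pgsteal_direct_", 15) == 0) {
            out->pgsteal_direct += value;
        }
    }
    return have_free ? 0 : -1;
}
```

Calling this on `fopen("/proc/vmstat", "r")` every few seconds and differencing successive `pgsteal_kswapd` values gives kswapd's reclaim rate. For the /proc/vmstat dump in Jiri's original report, the kswapd sum is 2531015 + 3403006 = 5934021 pages.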
If free memory is spiking, it might indicate that kswapd is still too aggressive with the loss of the __GFP_NO_KSWAPD flag. One way to tell is to record /proc/vmstat over time and see what the pgsteal_* figures look like. If they are climbing aggressively during what should be normal usage then it might show that kswapd is still too aggressive when asked to reclaim for THP. Thanks very much. ---8<--- mm: vmscan: scale number of pages reclaimed by reclaim/compaction only in direct reclaim Jiri Slaby reported the following: (It's an effective revert of "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures".) Given kswapd had hours of runtime in ps/top output yesterday in the morning and after the revert it's now 2 minutes in sum for the last 24h, I would say, it's gone. The intention of the patch in question was to compensate for the loss of lumpy reclaim. Part of the reason lumpy reclaim worked is that it aggressively reclaimed pages and this patch was meant to be a sane compromise. When compaction fails, it gets deferred and both compaction and reclaim/compaction are deferred to avoid excessive reclaim. However, since commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up each time and continues reclaiming, which was not taken into account when the patch was developed. As it is not taking deferred compaction into account in this path, it scans aggressively before falling out and making the compaction_deferred check in compaction_ready. This patch avoids kswapd scaling pages for reclaim and leaves the aggressive reclaim to the process attempting the THP allocation.
Signed-off-by: Mel Gorman --- mm/vmscan.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 2624edc..2b7edfa 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1763,14 +1763,20 @@ static bool in_reclaim_compaction(struct scan_control *sc) #ifdef CONFIG_COMPACTION /* * If compaction is deferred for sc->order then scale the number of pages - * reclaimed based on the number of consecutive allocation failures + * reclaimed based on the number of consecutive allocation failures. This + * scaling only happens for direct reclaim as it is about to attempt + * compaction. If compaction fails, future allocations will be deferred + * and reclaim avoided. On the other hand, kswapd does not take compaction + * deferral into account so if it scaled, it could scan excessively even + * though allocations are temporarily not being attempted. */ static unsigned long scale_for_compaction(unsigned long pages_for_compaction, struct lruvec *lruvec, struct scan_control *sc) { struct zone *zone = lruvec_zone(lruvec); - if (zone->compact_order_failed <= sc->order) + if (zone->compact_order_failed <= sc->order && + !current_is_kswapd()) pages_for_compaction <<= zone->compact_defer_shift; return pages_for_compaction; } From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751922Ab2JOJyU (ORCPT ); Mon, 15 Oct 2012 05:54:20 -0400 Received: from mail-we0-f174.google.com ([74.125.82.174]:56270 "EHLO mail-we0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751605Ab2JOJyS (ORCPT ); Mon, 15 Oct 2012 05:54:18 -0400 Message-ID: <507BDD45.1070705@suse.cz> Date: Mon, 15 Oct 2012 11:54:13 +0200 From: Jiri Slaby User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Thunderbird/17.0 MIME-Version: 1.0 To: Mel Gorman CC: Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton Subject: Re: kswapd0: excessive CPU usage 
References: <507688CC.9000104@suse.cz> <106695.1349963080@turing-police.cc.vt.edu> <5076E700.2030909@suse.cz> <118079.1349978211@turing-police.cc.vt.edu> <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> In-Reply-To: <20121012135726.GY29125@suse.de> X-Enigmail-Version: 1.5a1pre Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/12/2012 03:57 PM, Mel Gorman wrote: > mm: vmscan: scale number of pages reclaimed by reclaim/compaction only in direct reclaim > > Jiri Slaby reported the following: > > (It's an effective revert of "mm: vmscan: scale number of pages > reclaimed by reclaim/compaction based on failures".) > Given kswapd had hours of runtime in ps/top output yesterday in the > morning and after the revert it's now 2 minutes in sum for the last 24h, > I would say, it's gone. > > The intention of the patch in question was to compensate for the loss of > lumpy reclaim. Part of the reason lumpy reclaim worked is because it > aggressively reclaimed pages and this patch was meant to be a > sane compromise. > > When compaction fails, it gets deferred and both compaction and > reclaim/compaction is deferred avoid excessive reclaim. However, since > commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up each time > and continues reclaiming which was not taken into account when the patch > was developed. > > As it is not taking deferred compaction into account in this path it scans > aggressively before falling out and making the compaction_deferred check in > compaction_ready. This patch avoids kswapd scaling pages for reclaim and > leaves the aggressive reclaim to the process attempting the THP > allocation. 
> > Signed-off-by: Mel Gorman > --- > mm/vmscan.c | 10 ++++++++-- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 2624edc..2b7edfa 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1763,14 +1763,20 @@ static bool in_reclaim_compaction(struct scan_control *sc) > #ifdef CONFIG_COMPACTION > /* > * If compaction is deferred for sc->order then scale the number of pages > - * reclaimed based on the number of consecutive allocation failures > + * reclaimed based on the number of consecutive allocation failures. This > + * scaling only happens for direct reclaim as it is about to attempt > + * compaction. If compaction fails, future allocations will be deferred > + * and reclaim avoided. On the other hand, kswapd does not take compaction > + * deferral into account so if it scaled, it could scan excessively even > + * though allocations are temporarily not being attempted. > */ > static unsigned long scale_for_compaction(unsigned long pages_for_compaction, > struct lruvec *lruvec, struct scan_control *sc) > { > struct zone *zone = lruvec_zone(lruvec); > > - if (zone->compact_order_failed <= sc->order) > + if (zone->compact_order_failed <= sc->order && > + !current_is_kswapd()) > pages_for_compaction <<= zone->compact_defer_shift; > return pages_for_compaction; > } Yes, applying this instead of the revert fixes the issue as well. 
thanks, -- js suse labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752325Ab2JOLJm (ORCPT ); Mon, 15 Oct 2012 07:09:42 -0400 Received: from cantor2.suse.de ([195.135.220.15]:60279 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752262Ab2JOLJk (ORCPT ); Mon, 15 Oct 2012 07:09:40 -0400 Date: Mon, 15 Oct 2012 12:09:37 +0100 From: Mel Gorman To: Jiri Slaby Cc: Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton Subject: Re: kswapd0: excessive CPU usage Message-ID: <20121015110937.GE29125@suse.de> References: <507688CC.9000104@suse.cz> <106695.1349963080@turing-police.cc.vt.edu> <5076E700.2030909@suse.cz> <118079.1349978211@turing-police.cc.vt.edu> <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <507BDD45.1070705@suse.cz> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Oct 15, 2012 at 11:54:13AM +0200, Jiri Slaby wrote: > On 10/12/2012 03:57 PM, Mel Gorman wrote: > > mm: vmscan: scale number of pages reclaimed by reclaim/compaction only in direct reclaim > > > > Jiri Slaby reported the following: > > > > (It's an effective revert of "mm: vmscan: scale number of pages > > reclaimed by reclaim/compaction based on failures".) > > Given kswapd had hours of runtime in ps/top output yesterday in the > > morning and after the revert it's now 2 minutes in sum for the last 24h, > > I would say, it's gone. > > > > The intention of the patch in question was to compensate for the loss of > > lumpy reclaim. 
Part of the reason lumpy reclaim worked is because it > > aggressively reclaimed pages and this patch was meant to be a > > sane compromise. > > > > When compaction fails, it gets deferred and both compaction and > > reclaim/compaction is deferred avoid excessive reclaim. However, since > > commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up each time > > and continues reclaiming which was not taken into account when the patch > > was developed. > > > > As it is not taking deferred compaction into account in this path it scans > > aggressively before falling out and making the compaction_deferred check in > > compaction_ready. This patch avoids kswapd scaling pages for reclaim and > > leaves the aggressive reclaim to the process attempting the THP > > allocation. > > > > Signed-off-by: Mel Gorman > > --- > > mm/vmscan.c | 10 ++++++++-- > > 1 file changed, 8 insertions(+), 2 deletions(-) > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > index 2624edc..2b7edfa 100644 > > --- a/mm/vmscan.c > > +++ b/mm/vmscan.c > > @@ -1763,14 +1763,20 @@ static bool in_reclaim_compaction(struct scan_control *sc) > > #ifdef CONFIG_COMPACTION > > /* > > * If compaction is deferred for sc->order then scale the number of pages > > - * reclaimed based on the number of consecutive allocation failures > > + * reclaimed based on the number of consecutive allocation failures. This > > + * scaling only happens for direct reclaim as it is about to attempt > > + * compaction. If compaction fails, future allocations will be deferred > > + * and reclaim avoided. On the other hand, kswapd does not take compaction > > + * deferral into account so if it scaled, it could scan excessively even > > + * though allocations are temporarily not being attempted. 
> > */ > > static unsigned long scale_for_compaction(unsigned long pages_for_compaction, > > struct lruvec *lruvec, struct scan_control *sc) > > { > > struct zone *zone = lruvec_zone(lruvec); > > > > - if (zone->compact_order_failed <= sc->order) > > + if (zone->compact_order_failed <= sc->order && > > + !current_is_kswapd()) > > pages_for_compaction <<= zone->compact_defer_shift; > > return pages_for_compaction; > > } > > Yes, applying this instead of the revert fixes the issue as well. > Thanks Jiri. -- Mel Gorman SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758536Ab2J2LAY (ORCPT ); Mon, 29 Oct 2012 07:00:24 -0400 Received: from basicbox7.server-home.net ([195.137.212.29]:59698 "EHLO basicbox7.server-home.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753795Ab2J2LAX (ORCPT ); Mon, 29 Oct 2012 07:00:23 -0400 X-Greylist: delayed 495 seconds by postgrey-1.27 at vger.kernel.org; Mon, 29 Oct 2012 07:00:23 EDT Message-ID: <508E5FD3.1060105@leemhuis.info> Date: Mon, 29 Oct 2012 11:52:03 +0100 From: Thorsten Leemhuis User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20121016 Thunderbird/16.0.1 MIME-Version: 1.0 To: Mel Gorman CC: Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton Subject: Re: kswapd0: excessive CPU usage References: <507688CC.9000104@suse.cz> <106695.1349963080@turing-police.cc.vt.edu> <5076E700.2030909@suse.cz> <118079.1349978211@turing-police.cc.vt.edu> <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> In-Reply-To: <20121015110937.GE29125@suse.de> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 
Hi!

On 15.10.2012 13:09, Mel Gorman wrote:
> On Mon, Oct 15, 2012 at 11:54:13AM +0200, Jiri Slaby wrote:
>> On 10/12/2012 03:57 PM, Mel Gorman wrote:
>>> mm: vmscan: scale number of pages reclaimed by reclaim/compaction only in direct reclaim
>>> Jiri Slaby reported the following:
> [...]
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index 2624edc..2b7edfa 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -1763,14 +1763,20 @@ static bool in_reclaim_compaction(struct scan_control *sc)
>>>  #ifdef CONFIG_COMPACTION
>>>  /*
>>>   * If compaction is deferred for sc->order then scale the number of pages
>>> - * reclaimed based on the number of consecutive allocation failures
>>> + * reclaimed based on the number of consecutive allocation failures. This
>>> + * scaling only happens for direct reclaim as it is about to attempt
>>> + * compaction. If compaction fails, future allocations will be deferred
>>> + * and reclaim avoided. On the other hand, kswapd does not take compaction
>>> + * deferral into account so if it scaled, it could scan excessively even
>>> + * though allocations are temporarily not being attempted.
>>>   */
>>>  static unsigned long scale_for_compaction(unsigned long pages_for_compaction,
>>>  			struct lruvec *lruvec, struct scan_control *sc)
>>>  {
>>>  	struct zone *zone = lruvec_zone(lruvec);
>>>
>>> -	if (zone->compact_order_failed <= sc->order)
>>> +	if (zone->compact_order_failed <= sc->order &&
>>> +	    !current_is_kswapd())
>>>  		pages_for_compaction <<= zone->compact_defer_shift;
>>>  	return pages_for_compaction;
>>>  }
>> Yes, applying this instead of the revert fixes the issue as well.

Just wondering, is there a reason why this patch wasn't applied to mainline? Did it simply fall through the cracks? Or am I missing something?

I'm asking because I think I still see the issue on 3.7-rc2-git-checkout-from-friday.
Seems Fedora rawhide users are hitting it, too: https://bugzilla.redhat.com/show_bug.cgi?id=866988 Or are we seeing something different which just looks similar? I can test the patch if it needs further testing, but from the discussion I got the impression that everything is clear and the patch ready for merging. CU knurd From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755530Ab2J3TSu (ORCPT ); Tue, 30 Oct 2012 15:18:50 -0400 Received: from cantor2.suse.de ([195.135.220.15]:41702 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752110Ab2J3TSt (ORCPT ); Tue, 30 Oct 2012 15:18:49 -0400 Date: Tue, 30 Oct 2012 19:18:43 +0000 From: Mel Gorman To: Thorsten Leemhuis Cc: Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton Subject: Re: kswapd0: excessive CPU usage Message-ID: <20121030191843.GH3888@suse.de> References: <5076E700.2030909@suse.cz> <118079.1349978211@turing-police.cc.vt.edu> <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <508E5FD3.1060105@leemhuis.info> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <508E5FD3.1060105@leemhuis.info> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Oct 29, 2012 at 11:52:03AM +0100, Thorsten Leemhuis wrote: > Hi! > > On 15.10.2012 13:09, Mel Gorman wrote: > >On Mon, Oct 15, 2012 at 11:54:13AM +0200, Jiri Slaby wrote: > >>On 10/12/2012 03:57 PM, Mel Gorman wrote: > >>>mm: vmscan: scale number of pages reclaimed by reclaim/compaction only in direct reclaim > >>>Jiri Slaby reported the following: > > [...] 
> >>>diff --git a/mm/vmscan.c b/mm/vmscan.c > >>>index 2624edc..2b7edfa 100644 > >>>--- a/mm/vmscan.c > >>>+++ b/mm/vmscan.c > >>>@@ -1763,14 +1763,20 @@ static bool in_reclaim_compaction(struct scan_control *sc) > >>> #ifdef CONFIG_COMPACTION > >>> /* > >>> * If compaction is deferred for sc->order then scale the number of pages > >>>- * reclaimed based on the number of consecutive allocation failures > >>>+ * reclaimed based on the number of consecutive allocation failures. This > >>>+ * scaling only happens for direct reclaim as it is about to attempt > >>>+ * compaction. If compaction fails, future allocations will be deferred > >>>+ * and reclaim avoided. On the other hand, kswapd does not take compaction > >>>+ * deferral into account so if it scaled, it could scan excessively even > >>>+ * though allocations are temporarily not being attempted. > >>> */ > >>> static unsigned long scale_for_compaction(unsigned long pages_for_compaction, > >>> struct lruvec *lruvec, struct scan_control *sc) > >>> { > >>> struct zone *zone = lruvec_zone(lruvec); > >>> > >>>- if (zone->compact_order_failed <= sc->order) > >>>+ if (zone->compact_order_failed <= sc->order && > >>>+ !current_is_kswapd()) > >>> pages_for_compaction <<= zone->compact_defer_shift; > >>> return pages_for_compaction; > >>> } > >>Yes, applying this instead of the revert fixes the issue as well. > > Just wondering, is there a reason why this patch wasn't applied to > mainline? Did it simply fall through the cracks? Or am I missing > something? > It's because a problem was reported related to the patch (off-list, whoops). I'm waiting to hear if a second patch fixes the problem or not. > I'm asking because I think I stil see the issue on > 3.7-rc2-git-checkout-from-friday. Seems Fedora rawhide users are > hitting it, too: > https://bugzilla.redhat.com/show_bug.cgi?id=866988 > I like the steps to reproduce. Is step 3 profit? > Or are we seeing something different which just looks similar? 
I can
> test the patch if it needs further testing, but from the discussion
> I got the impression that everything is clear and the patch ready
> for merging.

It could be the same issue. Can you test with the "mm: vmscan: scale number of pages reclaimed by reclaim/compaction only in direct reclaim" patch and the following on top please?

Thanks.

---8<---
mm: page_alloc: Do not wake kswapd if the request is for THP but deferred

Since commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd gets woken for every THP request in the slow path. If compaction has been deferred, the waker will not compact or enter direct reclaim on its own behalf, but kswapd is still woken to reclaim free pages that no one may consume. If compaction was deferred because pages and slab were not reclaimable, then kswapd is just consuming cycles for no gain.

This patch avoids waking kswapd if compaction has been deferred. It will still wake kswapd when compaction is running, to reduce the latency of THP allocations.

Signed-off-by: Mel Gorman
---
 mm/page_alloc.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bb90971..e72674c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2378,6 +2378,15 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
 	return !!(gfp_to_alloc_flags(gfp_mask) & ALLOC_NO_WATERMARKS);
 }
 
+/* Returns true if the allocation is likely for THP */
+static bool is_thp_alloc(gfp_t gfp_mask, unsigned int order)
+{
+	if (order == pageblock_order &&
+	    (gfp_mask & (__GFP_MOVABLE|__GFP_REPEAT)) == __GFP_MOVABLE)
+		return true;
+	return false;
+}
+
 static inline struct page *
 __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	struct zonelist *zonelist, enum zone_type high_zoneidx,
@@ -2416,7 +2425,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		goto nopage;
 
 restart:
-	wake_all_kswapd(order, zonelist, high_zoneidx,
+	/*
+	 * kswapd is woken except when this is a THP request and compaction
+	 * is deferred. If we are backing off reclaim/compaction then kswapd
+	 * should not be awake aggressively reclaiming with no consumers of
+	 * the freed pages
+	 */
+	if (!(is_thp_alloc(gfp_mask, order) &&
+	      compaction_deferred(preferred_zone, order)))
+		wake_all_kswapd(order, zonelist, high_zoneidx,
 					zone_idx(preferred_zone));
 
 	/*
@@ -2494,7 +2511,7 @@ rebalance:
 	 * system then fail the allocation instead of entering direct reclaim.
 	 */
 	if ((deferred_compaction || contended_compaction) &&
-	    (gfp_mask & (__GFP_MOVABLE|__GFP_REPEAT)) == __GFP_MOVABLE)
+	    is_thp_alloc(gfp_mask, order))
 		goto nopage;
 
 	/* Try direct reclaim and then allocating */

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1422762Ab2JaLZY (ORCPT ); Wed, 31 Oct 2012 07:25:24 -0400
Received: from basicbox7.server-home.net ([195.137.212.29]:44400 "EHLO basicbox7.server-home.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1422661Ab2JaLZR (ORCPT ); Wed, 31 Oct 2012 07:25:17 -0400
Message-ID: <50910A99.5050707@leemhuis.info>
Date: Wed, 31 Oct 2012 12:25:13 +0100
From: Thorsten Leemhuis
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20121016 Thunderbird/16.0.1
MIME-Version: 1.0
To: Mel Gorman
CC: Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton
Subject: Re: kswapd0: excessive CPU usage
References: <5076E700.2030909@suse.cz> <118079.1349978211@turing-police.cc.vt.edu> <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <508E5FD3.1060105@leemhuis.info> <20121030191843.GH3888@suse.de>
In-Reply-To: <20121030191843.GH3888@suse.de>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 30.10.2012 20:18, Mel Gorman wrote:
> On Mon, Oct 29, 2012 at 11:52:03AM +0100, Thorsten Leemhuis wrote:
>> On 15.10.2012 13:09, Mel Gorman wrote:
>>> On Mon, Oct 15, 2012 at 11:54:13AM +0200, Jiri Slaby wrote:
>>>> On 10/12/2012 03:57 PM, Mel Gorman wrote:
>>>>> mm: vmscan: scale number of pages reclaimed by reclaim/compaction only in direct reclaim
>>>>> Jiri Slaby reported the following:
> [...]
>>>> Yes, applying this instead of the revert fixes the issue as well.
>> Just wondering, is there a reason why this patch wasn't applied to
>> mainline? Did it simply fall through the cracks? Or am I missing
>> something?
> It's because a problem was reported related to the patch (off-list,
> whoops). I'm waiting to hear if a second patch fixes the problem or not.

Anything in particular I should look out for while testing?

>> I'm asking because I think I stil see the issue on
>> 3.7-rc2-git-checkout-from-friday. Seems Fedora rawhide users are
>> hitting it, too:
>> https://bugzilla.redhat.com/show_bug.cgi?id=866988
> I like the steps to reproduce.

One of those cases where the bugzilla bug template was not very helpful or where it was not used as intended (you decide) :-)

> Is step 3 profit?

Yes, but psst, don't tell anyone; step 4 (world domination! for real!) is also hidden to keep that part of the big plan a secret for now ;-)

>> Or are we seeing something different which just looks similar? I can
>> test the patch if it needs further testing, but from the discussion
>> I got the impression that everything is clear and the patch ready
>> for merging.
> It could be the same issue. Can you test with the "mm: vmscan: scale
> number of pages reclaimed by reclaim/compaction only in direct reclaim"
> patch and the following on top please?

Built a vanilla mainline kernel with those two patches and installed it on the machine where I was seeing problems with high kswapd0 load on 3.7-rc3.
Ran it an hour yesterday and a few hours today; seems the patches fix the issue for me as kswapd behaves:

$ LC_ALL=C ps -aux | grep 'kswapd'
root 62 0.0 0.0 0 0 ? S Oct30 0:05 [kswapd0]

So everything is looking fine again so far thx to the two patches -- hopefully it stays that way even after hitting "send" in my mailer in a few seconds.

CU knurd

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1423081Ab2JaPEr (ORCPT ); Wed, 31 Oct 2012 11:04:47 -0400
Received: from cantor2.suse.de ([195.135.220.15]:54706 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757743Ab2JaPEp (ORCPT ); Wed, 31 Oct 2012 11:04:45 -0400
Date: Wed, 31 Oct 2012 15:04:38 +0000
From: Mel Gorman
To: Thorsten Leemhuis
Cc: Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton
Subject: Re: kswapd0: excessive CPU usage
Message-ID: <20121031150438.GK3888@suse.de>
References: <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <508E5FD3.1060105@leemhuis.info> <20121030191843.GH3888@suse.de> <50910A99.5050707@leemhuis.info>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
In-Reply-To: <50910A99.5050707@leemhuis.info>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Oct 31, 2012 at 12:25:13PM +0100, Thorsten Leemhuis wrote:
> On 30.10.2012 20:18, Mel Gorman wrote:
> >On Mon, Oct 29, 2012 at 11:52:03AM +0100, Thorsten Leemhuis wrote:
> >>On 15.10.2012 13:09, Mel Gorman wrote:
> >>>On Mon, Oct 15, 2012 at 11:54:13AM +0200, Jiri Slaby wrote:
> >>>>On 10/12/2012 03:57 PM, Mel Gorman wrote:
> >>>>>mm: vmscan: scale number of pages reclaimed by reclaim/compaction only in direct reclaim
> >>>>>Jiri Slaby reported the following:
> >[...]
> >>>>Yes, applying this instead of the revert fixes the issue as well.
> >>Just wondering, is there a reason why this patch wasn't applied to
> >>mainline? Did it simply fall through the cracks? Or am I missing
> >>something?
> >It's because a problem was reported related to the patch (off-list,
> >whoops). I'm waiting to hear if a second patch fixes the problem or not.
>
> Anything in particular I should look out for while testing?
>

Excessive reclaim, high CPU usage by kswapd, processes getting stuck in isolate_migratepages or isolate_freepages.

> >>I'm asking because I think I stil see the issue on
> >>3.7-rc2-git-checkout-from-friday. Seems Fedora rawhide users are
> >>hitting it, too:
> >>https://bugzilla.redhat.com/show_bug.cgi?id=866988
> >I like the steps to reproduce.
>
> One of those cases where the bugzilla bug template was not very
> helpful or where it was not used as intended (you decide) :-)
>

It wins at entertainment value if nothing else :)

> >Is step 3 profit?
>
> Yes, but psst, don't tell anyone; step 4 (world domination! for
> real!) is also hidden to keep that part of the big plan a secret for
> now ;-)
>

No doubt it's the default private comment #1!

> >>Or are we seeing something different which just looks similar? I can
> >>test the patch if it needs further testing, but from the discussion
> >>I got the impression that everything is clear and the patch ready
> >>for merging.
> >It could be the same issue. Can you test with the "mm: vmscan: scale
> >number of pages reclaimed by reclaim/compaction only in direct reclaim"
> >patch and the following on top please?
>
> Built a vanilla mainline kernel with those two patches and installed
> it on the machine where I was seeing problems high kswapd0 load on
> 3.7-rc3. Ran it an hour yesterday and a few hours today; seems the
> patches fix the issue for me as kswapd behaves:
>
> $ LC_ALL=C ps -aux | grep 'kswapd'
> root 62 0.0 0.0 0 0 ? S Oct30 0:05 [kswapd0]
>
> So everything is looking fine again so far thx to the two patches
> -- hopefully it stays that way even after hitting "send" in my
> mailer in a few seconds.
>

Ok, great. Keep an eye on it please. If Jiri Slaby reports similar success then I'll collapse the two patches together and resend to Andrew.

Thanks.

--
Mel Gorman
SUSE Labs

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756100Ab2KBKoR (ORCPT ); Fri, 2 Nov 2012 06:44:17 -0400
Received: from mx1.redhat.com ([209.132.183.28]:28287 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755185Ab2KBKoP (ORCPT ); Fri, 2 Nov 2012 06:44:15 -0400
Message-ID: <5093A3F4.8090108@redhat.com>
Date: Fri, 02 Nov 2012 11:44:04 +0100
From: Zdenek Kabelac
Organization: Red Hat
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20121016 Thunderbird/16.0.1
MIME-Version: 1.0
To: Mel Gorman
CC: Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton
Subject: Re: kswapd0: excessive CPU usage
References: <507688CC.9000104@suse.cz> <106695.1349963080@turing-police.cc.vt.edu> <5076E700.2030909@suse.cz> <118079.1349978211@turing-police.cc.vt.edu> <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de>
In-Reply-To: <20121015110937.GE29125@suse.de>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 15.10.2012 13:09, Mel Gorman wrote:
> On Mon, Oct 15, 2012 at 11:54:13AM +0200, Jiri Slaby
wrote: >> On 10/12/2012 03:57 PM, Mel Gorman wrote: >>> mm: vmscan: scale number of pages reclaimed by reclaim/compaction only in direct reclaim >>> >>> Jiri Slaby reported the following: >>> >>> (It's an effective revert of "mm: vmscan: scale number of pages >>> reclaimed by reclaim/compaction based on failures".) >>> Given kswapd had hours of runtime in ps/top output yesterday in the >>> morning and after the revert it's now 2 minutes in sum for the last 24h, >>> I would say, it's gone. >>> >>> The intention of the patch in question was to compensate for the loss of >>> lumpy reclaim. Part of the reason lumpy reclaim worked is because it >>> aggressively reclaimed pages and this patch was meant to be a >>> sane compromise. >>> >>> When compaction fails, it gets deferred and both compaction and >>> reclaim/compaction is deferred avoid excessive reclaim. However, since >>> commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up each time >>> and continues reclaiming which was not taken into account when the patch >>> was developed. >>> >>> As it is not taking deferred compaction into account in this path it scans >>> aggressively before falling out and making the compaction_deferred check in >>> compaction_ready. This patch avoids kswapd scaling pages for reclaim and >>> leaves the aggressive reclaim to the process attempting the THP >>> allocation. >>> >>> Signed-off-by: Mel Gorman >>> --- >>> mm/vmscan.c | 10 ++++++++-- >>> 1 file changed, 8 insertions(+), 2 deletions(-) >>> >>> diff --git a/mm/vmscan.c b/mm/vmscan.c >>> index 2624edc..2b7edfa 100644 >>> --- a/mm/vmscan.c >>> +++ b/mm/vmscan.c >>> @@ -1763,14 +1763,20 @@ static bool in_reclaim_compaction(struct scan_control *sc) >>> #ifdef CONFIG_COMPACTION >>> /* >>> * If compaction is deferred for sc->order then scale the number of pages >>> - * reclaimed based on the number of consecutive allocation failures >>> + * reclaimed based on the number of consecutive allocation failures. 
This
>>> + * scaling only happens for direct reclaim as it is about to attempt
>>> + * compaction. If compaction fails, future allocations will be deferred
>>> + * and reclaim avoided. On the other hand, kswapd does not take compaction
>>> + * deferral into account so if it scaled, it could scan excessively even
>>> + * though allocations are temporarily not being attempted.
>>>   */
>>>  static unsigned long scale_for_compaction(unsigned long pages_for_compaction,
>>>  			struct lruvec *lruvec, struct scan_control *sc)
>>>  {
>>>  	struct zone *zone = lruvec_zone(lruvec);
>>>
>>> -	if (zone->compact_order_failed <= sc->order)
>>> +	if (zone->compact_order_failed <= sc->order &&
>>> +	    !current_is_kswapd())
>>>  		pages_for_compaction <<= zone->compact_defer_shift;
>>>  	return pages_for_compaction;
>>>  }
>>
>> Yes, applying this instead of the revert fixes the issue as well.
>>
>

I've applied this patch on a 3.7.0-rc3 kernel and I still see excessive CPU usage, mainly after suspend/resume.

Here is just a simple kswapd backtrace from the running kernel:

kswapd0         R  running task        0    30      2 0x00000000
 ffff8801331ddae8 0000000000000082 ffff880135b8a340 0000000000000008
 ffff880135b8a340 ffff8801331ddfd8 ffff8801331ddfd8 ffff8801331ddfd8
 ffff880071db8000 ffff880135b8a340 0000000000000286 ffff8801331dc000
Call Trace:
 [] preempt_schedule+0x42/0x60
 [] _raw_spin_unlock+0x55/0x60
 [] put_super+0x31/0x40
 [] drop_super+0x22/0x30
 [] prune_super+0x149/0x1b0
 [] shrink_slab+0xba/0x510
 [] ? mem_cgroup_iter+0x17a/0x2e0
 [] ? mem_cgroup_iter+0xca/0x2e0
 [] balance_pgdat+0x629/0x7f0
 [] kswapd+0x174/0x620
 [] ? __init_waitqueue_head+0x60/0x60
 [] ? balance_pgdat+0x7f0/0x7f0
 [] kthread+0xdb/0xe0
 [] ? kthread_create_on_node+0x140/0x140
 [] ret_from_fork+0x7c/0xb0
 [] ? kthread_create_on_node+0x140/0x140

Zdenek

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1761032Ab2KBKxm (ORCPT ); Fri, 2 Nov 2012 06:53:42 -0400
Received: from mail-ee0-f46.google.com ([74.125.83.46]:52860 "EHLO mail-ee0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754632Ab2KBKxl (ORCPT ); Fri, 2 Nov 2012 06:53:41 -0400
Message-ID: <5093A631.5020209@suse.cz>
Date: Fri, 02 Nov 2012 11:53:37 +0100
From: Jiri Slaby
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Thunderbird/17.0
MIME-Version: 1.0
To: Zdenek Kabelac
CC: Mel Gorman , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton
Subject: Re: kswapd0: excessive CPU usage
References: <507688CC.9000104@suse.cz> <106695.1349963080@turing-police.cc.vt.edu> <5076E700.2030909@suse.cz> <118079.1349978211@turing-police.cc.vt.edu> <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com>
In-Reply-To: <5093A3F4.8090108@redhat.com>
X-Enigmail-Version: 1.5a1pre
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/02/2012 11:44 AM, Zdenek Kabelac wrote:
>>> Yes, applying this instead of the revert fixes the issue as well.
>
> I've applied this patch on 3.7.0-rc3 kernel - and I still see excessive
> CPU usage - mainly after suspend/resume
>
> Here is just simple kswapd backtrace from running kernel:

Yup, this is what we were seeing with only the former patch applied, too. Try to apply the other one as well:
https://patchwork.kernel.org/patch/1673231/

For me, I would say it is fixed by the two patches now.
I won't be able to report later, since I'm leaving to a conference tomorrow. > kswapd0 R running task 0 30 2 0x00000000 ... > [] shrink_slab+0xba/0x510 thanks, -- js suse labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752093Ab2KBTpO (ORCPT ); Fri, 2 Nov 2012 15:45:14 -0400 Received: from mail-ee0-f46.google.com ([74.125.83.46]:45963 "EHLO mail-ee0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750982Ab2KBTpM (ORCPT ); Fri, 2 Nov 2012 15:45:12 -0400 Message-ID: <509422C3.1000803@suse.cz> Date: Fri, 02 Nov 2012 20:45:07 +0100 From: Jiri Slaby User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Thunderbird/17.0 MIME-Version: 1.0 To: Mel Gorman CC: Zdenek Kabelac , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton Subject: Re: kswapd0: excessive CPU usage References: <507688CC.9000104@suse.cz> <106695.1349963080@turing-police.cc.vt.edu> <5076E700.2030909@suse.cz> <118079.1349978211@turing-police.cc.vt.edu> <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> In-Reply-To: <5093A631.5020209@suse.cz> X-Enigmail-Version: 1.5a1pre Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 11/02/2012 11:53 AM, Jiri Slaby wrote: > On 11/02/2012 11:44 AM, Zdenek Kabelac wrote: >>>> Yes, applying this instead of the revert fixes the issue as well. 
>> >> I've applied this patch on 3.7.0-rc3 kernel - and I still see excessive >> CPU usage - mainly after suspend/resume >> >> Here is just simple kswapd backtrace from running kernel: > > Yup, this is what we were seeing with the former patch only too. Try to > apply the other one too: > https://patchwork.kernel.org/patch/1673231/ > > For me I would say, it is fixed by the two patches now. I won't be able > to report later, since I'm leaving to a conference tomorrow. Damn it. It recurred right now, with both patches applied. After I started a java program which consumed some more memory. Though there are still 2 gigs free, kswap is spinning: [] __cond_resched+0x2a/0x40 [] shrink_slab+0x1c0/0x2d0 [] kswapd+0x66d/0xb60 [] kthread+0xc0/0xd0 [] ret_from_fork+0x7c/0xb0 [] 0xffffffffffffffff -- js suse labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753332Ab2KDL0s (ORCPT ); Sun, 4 Nov 2012 06:26:48 -0500 Received: from mx1.redhat.com ([209.132.183.28]:10907 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751951Ab2KDL0p (ORCPT ); Sun, 4 Nov 2012 06:26:45 -0500 Message-ID: <509650EA.5060508@redhat.com> Date: Sun, 04 Nov 2012 12:26:34 +0100 From: Zdenek Kabelac Organization: Red Hat User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20121016 Thunderbird/16.0.1 MIME-Version: 1.0 To: Jiri Slaby CC: Mel Gorman , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton Subject: Re: kswapd0: excessive CPU usage References: <507688CC.9000104@suse.cz> <106695.1349963080@turing-police.cc.vt.edu> <5076E700.2030909@suse.cz> <118079.1349978211@turing-police.cc.vt.edu> <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> 
<509422C3.1000803@suse.cz> In-Reply-To: <509422C3.1000803@suse.cz> Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2.11.2012 20:45, Jiri Slaby wrote: > On 11/02/2012 11:53 AM, Jiri Slaby wrote: >> On 11/02/2012 11:44 AM, Zdenek Kabelac wrote: >>>>> Yes, applying this instead of the revert fixes the issue as well. >>> >>> I've applied this patch on 3.7.0-rc3 kernel - and I still see excessive >>> CPU usage - mainly after suspend/resume >>> >>> Here is just simple kswapd backtrace from running kernel: >> >> Yup, this is what we were seeing with the former patch only too. Try to >> apply the other one too: >> https://patchwork.kernel.org/patch/1673231/ >> >> For me I would say, it is fixed by the two patches now. I won't be able >> to report later, since I'm leaving to a conference tomorrow. > > Damn it. It recurred right now, with both patches applied. After I > started a java program which consumed some more memory. Though there are > still 2 gigs free, kswap is spinning: > [] __cond_resched+0x2a/0x40 > [] shrink_slab+0x1c0/0x2d0 > [] kswapd+0x66d/0xb60 > [] kthread+0xc0/0xd0 > [] ret_from_fork+0x7c/0xb0 > [] 0xffffffffffffffff > Yep - I wanted to report this again myself and noticed your reply. Yes - I now have both patches installed too - and I still observe kswapd eating my CPU. It seems (at least for me) that a prior suspend and resume triggers it more frequently. However, there is a change in behaviour - while before kswapd was running almost indefinitely, now the CPU spikes are in the range of minutes (i.e. with ~2 days of uptime, kswapd has accumulated over 32 minutes of CPU time). My machine has 4GB and no swap (disabled); firefox (22 mins), thunderbird (3 mins) and pidgin (0.5 min) are the 3 most memory- and CPU-hungry apps at the moment.
Zdenek From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754377Ab2KDQe0 (ORCPT ); Sun, 4 Nov 2012 11:34:26 -0500 Received: from mx1.redhat.com ([209.132.183.28]:3021 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751711Ab2KDQeZ (ORCPT ); Sun, 4 Nov 2012 11:34:25 -0500 Message-ID: <5096999F.1040405@redhat.com> Date: Sun, 04 Nov 2012 11:36:47 -0500 From: Rik van Riel User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20121009 Thunderbird/16.0 MIME-Version: 1.0 To: Mel Gorman CC: Thorsten Leemhuis , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton Subject: Re: kswapd0: excessive CPU usage References: <5076E700.2030909@suse.cz> <118079.1349978211@turing-police.cc.vt.edu> <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <508E5FD3.1060105@leemhuis.info> <20121030191843.GH3888@suse.de> In-Reply-To: <20121030191843.GH3888@suse.de> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/30/2012 03:18 PM, Mel Gorman wrote: > restart: > - wake_all_kswapd(order, zonelist, high_zoneidx, > + /* > + * kswapd is woken except when this is a THP request and compaction > + * is deferred. If we are backing off reclaim/compaction then kswapd > + * should not be awake aggressively reclaiming with no consumers of > + * the freed pages > + */ > + if (!(is_thp_alloc(gfp_mask, order) && > + compaction_deferred(preferred_zone, order))) > + wake_all_kswapd(order, zonelist, high_zoneidx, > zone_idx(preferred_zone)); What is special about thp allocations here? 
Surely other large allocations that keep failing should get the same treatment, of not waking up kswapd if compaction is deferred? From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754407Ab2KEOYz (ORCPT ); Mon, 5 Nov 2012 09:24:55 -0500 Received: from cantor2.suse.de ([195.135.220.15]:55165 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753217Ab2KEOYx (ORCPT ); Mon, 5 Nov 2012 09:24:53 -0500 Date: Mon, 5 Nov 2012 14:24:49 +0000 From: Mel Gorman To: Andrew Morton Cc: Zdenek Kabelac , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, Rik van Riel , Jiri Slaby , LKML Subject: [PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" Message-ID: <20121105142449.GI8218@suse.de> References: <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <509422C3.1000803@suse.cz> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Jiri Slaby reported the following: (It's an effective revert of "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures".) Given kswapd had hours of runtime in ps/top output yesterday in the morning and after the revert it's now 2 minutes in sum for the last 24h, I would say, it's gone. The intention of the patch in question was to compensate for the loss of lumpy reclaim. Part of the reason lumpy reclaim worked is because it aggressively reclaimed pages and this patch was meant to be a sane compromise. 
When compaction fails, it gets deferred, and both compaction and reclaim/compaction are deferred to avoid excessive reclaim. However, since commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up each time and continues reclaiming, which was not taken into account when the patch was developed. Attempts to address the problem ended up just changing the shape of the problem instead of fixing it. The release window is getting closer, and while a THP allocation failing is not a major problem, kswapd chewing up a lot of CPU is. This patch reverts "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" and will be revisited in the future. Signed-off-by: Mel Gorman --- mm/vmscan.c | 25 ------------------------- 1 file changed, 25 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 2624edc..e081ee8 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1760,28 +1760,6 @@ static bool in_reclaim_compaction(struct scan_control *sc) return false; } -#ifdef CONFIG_COMPACTION -/* - * If compaction is deferred for sc->order then scale the number of pages - * reclaimed based on the number of consecutive allocation failures - */ -static unsigned long scale_for_compaction(unsigned long pages_for_compaction, - struct lruvec *lruvec, struct scan_control *sc) -{ - struct zone *zone = lruvec_zone(lruvec); - - if (zone->compact_order_failed <= sc->order) - pages_for_compaction <<= zone->compact_defer_shift; - return pages_for_compaction; -} -#else -static unsigned long scale_for_compaction(unsigned long pages_for_compaction, - struct lruvec *lruvec, struct scan_control *sc) -{ - return pages_for_compaction; -} -#endif - /* * Reclaim/compaction is used for high-order allocation requests. It reclaims * order-0 pages before compacting the zone. should_continue_reclaim() returns
should_continue_reclaim() returns @@ -1829,9 +1807,6 @@ static inline bool should_continue_reclaim(struct lruvec *lruvec, * inactive lists are large enough, continue reclaiming */ pages_for_compaction = (2UL << sc->order); - - pages_for_compaction = scale_for_compaction(pages_for_compaction, - lruvec, sc); inactive_lru_pages = get_lru_size(lruvec, LRU_INACTIVE_FILE); if (nr_swap_pages > 0) inactive_lru_pages += get_lru_size(lruvec, LRU_INACTIVE_ANON); From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751656Ab2KFKZP (ORCPT ); Tue, 6 Nov 2012 05:25:15 -0500 Received: from mail.fem.tu-ilmenau.de ([141.24.101.79]:47456 "EHLO mail.fem.tu-ilmenau.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750755Ab2KFKZN (ORCPT ); Tue, 6 Nov 2012 05:25:13 -0500 X-Greylist: delayed 556 seconds by postgrey-1.27 at vger.kernel.org; Tue, 06 Nov 2012 05:25:13 EST Date: Tue, 6 Nov 2012 11:15:54 +0100 From: Johannes Hirte To: Mel Gorman Cc: Andrew Morton , Zdenek Kabelac , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, Rik van Riel , Jiri Slaby , LKML Subject: Re: [PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" Message-ID: <20121106111554.1896c3f3@fem.tu-ilmenau.de> In-Reply-To: <20121105142449.GI8218@suse.de> References: <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <20121105142449.GI8218@suse.de> X-Mailer: Claws Mail 3.8.1 (GTK+ 2.24.13; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 5 Nov 2012 14:24:49 +0000
Mel Gorman wrote: > Jiri Slaby reported the following: > > (It's an effective revert of "mm: vmscan: scale number of > pages reclaimed by reclaim/compaction based on failures".) Given > kswapd had hours of runtime in ps/top output yesterday in the morning > and after the revert it's now 2 minutes in sum for the last > 24h, I would say, it's gone. > > The intention of the patch in question was to compensate for the loss > of lumpy reclaim. Part of the reason lumpy reclaim worked is because > it aggressively reclaimed pages and this patch was meant to be a sane > compromise. > > When compaction fails, it gets deferred and both compaction and > reclaim/compaction is deferred avoid excessive reclaim. However, since > commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up each > time and continues reclaiming which was not taken into account when > the patch was developed. > > Attempts to address the problem ended up just changing the shape of > the problem instead of fixing it. The release window gets closer and > while a THP allocation failing is not a major problem, kswapd chewing > up a lot of CPU is. This patch reverts "mm: vmscan: scale number of > pages reclaimed by reclaim/compaction based on failures" and will be > revisited in the future.
> > Signed-off-by: Mel Gorman > --- > mm/vmscan.c | 25 ------------------------- > 1 file changed, 25 deletions(-) > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 2624edc..e081ee8 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1760,28 +1760,6 @@ static bool in_reclaim_compaction(struct > scan_control *sc) return false; > } > > -#ifdef CONFIG_COMPACTION > -/* > - * If compaction is deferred for sc->order then scale the number of > pages > - * reclaimed based on the number of consecutive allocation failures > - */ > -static unsigned long scale_for_compaction(unsigned long > pages_for_compaction, > - struct lruvec *lruvec, struct scan_control > *sc) -{ > - struct zone *zone = lruvec_zone(lruvec); > - > - if (zone->compact_order_failed <= sc->order) > - pages_for_compaction <<= zone->compact_defer_shift; > - return pages_for_compaction; > -} > -#else > -static unsigned long scale_for_compaction(unsigned long > pages_for_compaction, > - struct lruvec *lruvec, struct scan_control > *sc) -{ > - return pages_for_compaction; > -} > -#endif > - > /* > * Reclaim/compaction is used for high-order allocation requests. It > reclaims > * order-0 pages before compacting the zone. > should_continue_reclaim() returns @@ -1829,9 +1807,6 @@ static inline > bool should_continue_reclaim(struct lruvec *lruvec, > * inactive lists are large enough, continue reclaiming > */ > pages_for_compaction = (2UL << sc->order); > - > - pages_for_compaction = > scale_for_compaction(pages_for_compaction, > - lruvec, sc); > inactive_lru_pages = get_lru_size(lruvec, LRU_INACTIVE_FILE); > if (nr_swap_pages > 0) > inactive_lru_pages += get_lru_size(lruvec, > LRU_INACTIVE_ANON); -- Even with this patch I see kswapd0 very often on top. Much more than with kernel 3.6. 
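The arithmetic behind the hunk being reverted can be reproduced in user space. The sketch below is a model, not kernel code: `struct zone_model` and both functions are hypothetical stand-ins that copy only the two `struct zone` fields the patch reads. The example values assume 4 KiB pages and a maximum `compact_defer_shift` of 6, which is my understanding of `COMPACT_MAX_DEFER_SHIFT` in kernels of this era.

```c
/*
 * Hypothetical user-space model of the reverted scale_for_compaction()
 * logic. Only the two struct zone fields that the patch reads are copied;
 * this is not the kernel's struct zone.
 */
struct zone_model {
	int compact_order_failed;         /* lowest order at which compaction failed */
	unsigned int compact_defer_shift; /* grows with consecutive compaction failures */
};

/* Baseline reclaim target in should_continue_reclaim(): 2 << order pages. */
unsigned long base_pages_for_compaction(int order)
{
	return 2UL << order;
}

/*
 * Target after the reverted scaling: shifted left by compact_defer_shift
 * (multiplied by 2^shift) once compaction has failed at this order or below.
 */
unsigned long scaled_pages_for_compaction(const struct zone_model *zone, int order)
{
	unsigned long pages = base_pages_for_compaction(order);

	if (zone->compact_order_failed <= order)
		pages <<= zone->compact_defer_shift;
	return pages;
}
```

For an order-9 request (a 2 MiB THP with 4 KiB pages) the baseline is 1024 pages (4 MiB); with `compact_defer_shift` at 6 it becomes 65536 pages (256 MiB). As the comment in the earlier kswapd-exemption patch notes, kswapd does not take compaction deferral into account, so it could keep chasing that inflated target even while allocations were backing off.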
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752570Ab2KIEWP (ORCPT ); Thu, 8 Nov 2012 23:22:15 -0500 Received: from e35.co.us.ibm.com ([32.97.110.153]:45118 "EHLO e35.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751439Ab2KIEWO (ORCPT ); Thu, 8 Nov 2012 23:22:14 -0500 Message-ID: <509C84ED.8090605@linux.vnet.ibm.com> Date: Thu, 08 Nov 2012 22:22:05 -0600 From: Seth Jennings User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20121028 Thunderbird/16.0.2 MIME-Version: 1.0 To: Jiri Slaby CC: Mel Gorman , Zdenek Kabelac , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings Subject: Re: kswapd0: excessive CPU usage References: <507688CC.9000104@suse.cz> <106695.1349963080@turing-police.cc.vt.edu> <5076E700.2030909@suse.cz> <118079.1349978211@turing-police.cc.vt.edu> <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> In-Reply-To: <509422C3.1000803@suse.cz> Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: 7bit X-Content-Scanned: Fidelis XPS MAILER x-cbid: 12110904-4834-0000-0000-00000043D38E Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 11/02/2012 02:45 PM, Jiri Slaby wrote: > On 11/02/2012 11:53 AM, Jiri Slaby wrote: >> On 11/02/2012 11:44 AM, Zdenek Kabelac wrote: >>>>> Yes, applying this instead of the revert fixes the issue as well. 
>>> >>> I've applied this patch on 3.7.0-rc3 kernel - and I still see excessive >>> CPU usage - mainly after suspend/resume >>> >>> Here is just simple kswapd backtrace from running kernel: >> >> Yup, this is what we were seeing with the former patch only too. Try to >> apply the other one too: >> https://patchwork.kernel.org/patch/1673231/ >> >> For me I would say, it is fixed by the two patches now. I won't be able >> to report later, since I'm leaving to a conference tomorrow. > > Damn it. It recurred right now, with both patches applied. After I > started a java program which consumed some more memory. Though there are > still 2 gigs free, kswap is spinning: > [] __cond_resched+0x2a/0x40 > [] shrink_slab+0x1c0/0x2d0 > [] kswapd+0x66d/0xb60 > [] kthread+0xc0/0xd0 > [] ret_from_fork+0x7c/0xb0 > [] 0xffffffffffffffff I'm also hitting this issue in v3.7-rc4. It appears that the last release not affected by this issue was v3.3. Bisecting the changes included for v3.4-rc1 showed that this commit introduced the issue: fe2c2a106663130a5ab45cb0e3414b52df2fff0c is the first bad commit commit fe2c2a106663130a5ab45cb0e3414b52df2fff0c Author: Rik van Riel Date: Wed Mar 21 16:33:51 2012 -0700 vmscan: reclaim at order 0 when compaction is enabled ... This is plausible since the issue seems to be in the kswapd + compaction realm. I've yet to figure out exactly what about this commit results in kswapd spinning. I would be interested to know if someone can confirm this finding.
-- Seth From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751428Ab2KIIIF (ORCPT ); Fri, 9 Nov 2012 03:08:05 -0500 Received: from mx1.redhat.com ([209.132.183.28]:41556 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751357Ab2KIIIC (ORCPT ); Fri, 9 Nov 2012 03:08:02 -0500 Message-ID: <509CB9D1.6060704@redhat.com> Date: Fri, 09 Nov 2012 09:07:45 +0100 From: Zdenek Kabelac Organization: Red Hat User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20121016 Thunderbird/16.0.1 MIME-Version: 1.0 To: Seth Jennings CC: Jiri Slaby , Mel Gorman , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings Subject: Re: kswapd0: excessive CPU usage References: <507688CC.9000104@suse.cz> <106695.1349963080@turing-police.cc.vt.edu> <5076E700.2030909@suse.cz> <118079.1349978211@turing-police.cc.vt.edu> <50770905.5070904@suse.cz> <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> In-Reply-To: <509C84ED.8090605@linux.vnet.ibm.com> Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 9.11.2012 05:22, Seth Jennings wrote: > On 11/02/2012 02:45 PM, Jiri Slaby wrote: >> On 11/02/2012 11:53 AM, Jiri Slaby wrote: >>> On 11/02/2012 11:44 AM, Zdenek Kabelac wrote: >>>>>> Yes, applying this instead of the revert fixes the issue as well.
>>>> >>>> I've applied this patch on 3.7.0-rc3 kernel - and I still see excessive >>>> CPU usage - mainly after suspend/resume >>>> >>>> Here is just simple kswapd backtrace from running kernel: >>> >>> Yup, this is what we were seeing with the former patch only too. Try to >>> apply the other one too: >>> https://patchwork.kernel.org/patch/1673231/ >>> >>> For me I would say, it is fixed by the two patches now. I won't be able >>> to report later, since I'm leaving to a conference tomorrow. >> >> Damn it. It recurred right now, with both patches applied. After I >> started a java program which consumed some more memory. Though there are >> still 2 gigs free, kswap is spinning: >> [] __cond_resched+0x2a/0x40 >> [] shrink_slab+0x1c0/0x2d0 >> [] kswapd+0x66d/0xb60 >> [] kthread+0xc0/0xd0 >> [] ret_from_fork+0x7c/0xb0 >> [] 0xffffffffffffffff > > I'm also hitting this issue in v3.7-rc4. It appears that the last > release not effected by this issue was v3.3. Bisecting the changes > included for v3.4-rc1 showed that this commit introduced the issue: > > fe2c2a106663130a5ab45cb0e3414b52df2fff0c is the first bad commit > commit fe2c2a106663130a5ab45cb0e3414b52df2fff0c > Author: Rik van Riel > Date: Wed Mar 21 16:33:51 2012 -0700 > > vmscan: reclaim at order 0 when compaction is enabled > ... > > This is plausible since the issue seems to be in the kswapd + compaction > realm. I've yet to figure out exactly what about this commit results in > kswapd spinning. > > I would be interested if someone can confirm this finding. > > -- > Seth > On my system (3.7-rc4), the problem seems to be effectively solved by the revert patch: https://lkml.org/lkml/2012/11/5/308 i.e. over 2 days of uptime kswapd0 has used 6 seconds of CPU time, which is IMHO OK - I'm not observing any busy loops on the CPU with kswapd0.
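The cumulative CPU times reported in this thread are easier to compare as a share of uptime. A minimal sketch using Zdenek's two figures (over 32 minutes in roughly 2 days with both patches, 6 seconds in 2 days with the revert); the helper and macro names are hypothetical, and the 2-day uptime is approximate.

```c
/* Average share of wall-clock time a task spent on CPU, in percent. */
double cpu_share_percent(double cpu_seconds, double uptime_seconds)
{
	return 100.0 * cpu_seconds / uptime_seconds;
}

/* Approximate uptime from the reports: about two days, in seconds. */
#define TWO_DAYS_SECONDS (2.0 * 24.0 * 60.0 * 60.0)
```

`cpu_share_percent(32.0 * 60.0, TWO_DAYS_SECONDS)` is about 1.1%, while `cpu_share_percent(6.0, TWO_DAYS_SECONDS)` is about 0.0035%, a roughly 320-fold drop. An average like this hides the minutes-long spikes described earlier, so it complements rather than replaces watching top.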
Zdenek From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751416Ab2KIIgr (ORCPT ); Fri, 9 Nov 2012 03:36:47 -0500 Received: from cantor2.suse.de ([195.135.220.15]:33274 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750942Ab2KIIgq (ORCPT ); Fri, 9 Nov 2012 03:36:46 -0500 Date: Fri, 9 Nov 2012 08:36:37 +0000 From: Mel Gorman To: Johannes Hirte Cc: Andrew Morton , Zdenek Kabelac , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, Rik van Riel , Jiri Slaby , LKML Subject: Re: [PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" Message-ID: <20121109083637.GD8218@suse.de> References: <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <20121105142449.GI8218@suse.de> <20121106111554.1896c3f3@fem.tu-ilmenau.de> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <20121106111554.1896c3f3@fem.tu-ilmenau.de> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Nov 06, 2012 at 11:15:54AM +0100, Johannes Hirte wrote: > On Mon, 5 Nov 2012 14:24:49 +0000 > Mel Gorman wrote: > > > Jiri Slaby reported the following: > > > > (It's an effective revert of "mm: vmscan: scale number of > > pages reclaimed by reclaim/compaction based on failures".) Given > > kswapd had hours of runtime in ps/top output yesterday in the morning > > and after the revert it's now 2 minutes in sum for the last > > 24h, I would say, it's gone. > > > > The intention of the patch in question was to compensate for the loss > > of lumpy reclaim.
Part of the reason lumpy reclaim worked is because > > it aggressively reclaimed pages and this patch was meant to be a sane > > compromise. > > > > When compaction fails, it gets deferred and both compaction and > > reclaim/compaction is deferred avoid excessive reclaim. However, since > > commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up each > > time and continues reclaiming which was not taken into account when > > the patch was developed. > > > > Attempts to address the problem ended up just changing the shape of > > the problem instead of fixing it. The release window gets closer and > > while a THP allocation failing is not a major problem, kswapd chewing > > up a lot of CPU is. This patch reverts "mm: vmscan: scale number of > > pages reclaimed by reclaim/compaction based on failures" and will be > > revisited in the future. > > > > Signed-off-by: Mel Gorman > > --- > > mm/vmscan.c | 25 ------------------------- > > 1 file changed, 25 deletions(-) > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > index 2624edc..e081ee8 100644 > > --- a/mm/vmscan.c > > +++ b/mm/vmscan.c > > @@ -1760,28 +1760,6 @@ static bool in_reclaim_compaction(struct > > scan_control *sc) return false; > > } > > > > -#ifdef CONFIG_COMPACTION > > -/* > > - * If compaction is deferred for sc->order then scale the number of > > pages > > - * reclaimed based on the number of consecutive allocation failures > > - */ > > -static unsigned long scale_for_compaction(unsigned long > > pages_for_compaction, > > - struct lruvec *lruvec, struct scan_control > > *sc) -{ > > - struct zone *zone = lruvec_zone(lruvec); > > - > > - if (zone->compact_order_failed <= sc->order) > > - pages_for_compaction <<= zone->compact_defer_shift; > > - return pages_for_compaction; > > -} > > -#else > > -static unsigned long scale_for_compaction(unsigned long > > pages_for_compaction, > > - struct lruvec *lruvec, struct scan_control > > *sc) -{ > > - return pages_for_compaction; > > -} > > -#endif > > - > 
> /* > > * Reclaim/compaction is used for high-order allocation requests. It > > reclaims > > * order-0 pages before compacting the zone. > > should_continue_reclaim() returns @@ -1829,9 +1807,6 @@ static inline > > bool should_continue_reclaim(struct lruvec *lruvec, > > * inactive lists are large enough, continue reclaiming > > */ > > pages_for_compaction = (2UL << sc->order); > > - > > - pages_for_compaction = > > scale_for_compaction(pages_for_compaction, > > - lruvec, sc); > > inactive_lru_pages = get_lru_size(lruvec, LRU_INACTIVE_FILE); > > if (nr_swap_pages > 0) > > inactive_lru_pages += get_lru_size(lruvec, > > LRU_INACTIVE_ANON); -- > > Even with this patch I see kswapd0 very often on top. Much more than > with kernel 3.6. How severe is the CPU usage? The higher usage can be explained by "mm: remove __GFP_NO_KSWAPD" which allows kswapd to compact memory to reduce the amount of time processes spend in compaction but will result in the CPU cost being incurred by kswapd. Is it really high like the bug was reporting with high usage over long periods of time or do you just see it using 2-6% of CPU for short periods? Thanks. 
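One way to answer the severity question with numbers rather than impressions from top is to sample kswapd's cumulative CPU time twice and compare. A minimal sketch, assuming Linux `/proc/<pid>/stat` with `utime` and `stime` as fields 14 and 15 in clock ticks, as documented in proc(5); the function names are hypothetical and error handling is minimal.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Return utime + stime (fields 14 and 15, in clock ticks) parsed from one
 * /proc/<pid>/stat line, or -1 if the line is malformed.
 */
long long stat_cpu_ticks(const char *line)
{
	/* comm (field 2) may contain spaces or ')', so start after the last ')'. */
	const char *p = strrchr(line, ')');
	long long ticks = 0;
	int field = 2;

	if (p == NULL)
		return -1;
	p++;
	while (field < 15) {
		while (*p == ' ')
			p++;
		if (*p == '\0')
			return -1; /* line ended before field 15 */
		field++;
		if (field >= 14) {
			char *end;
			long long v = strtoll(p, &end, 10);

			if (end == p)
				return -1;
			ticks += v;
			p = end;
		} else {
			while (*p != '\0' && *p != ' ')
				p++; /* skip fields 3..13 */
		}
	}
	return ticks;
}

/* Cumulative CPU seconds consumed by @pid, or -1.0 on error. */
double pid_cpu_seconds(int pid)
{
	char path[64], buf[1024];
	long long ticks;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/stat", pid);
	f = fopen(path, "r");
	if (f == NULL)
		return -1.0;
	if (fgets(buf, sizeof(buf), f) == NULL) {
		fclose(f);
		return -1.0;
	}
	fclose(f);
	ticks = stat_cpu_ticks(buf);
	if (ticks < 0)
		return -1.0;
	return (double)ticks / (double)sysconf(_SC_CLK_TCK);
}
```

Calling `pid_cpu_seconds()` for kswapd0's PID (577 and 30 in the traces above; it differs per boot) a few seconds apart and dividing the difference by the interval gives its current CPU share. That separates sustained spinning from short 2-6% bursts.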
-- Mel Gorman SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751492Ab2KIIkz (ORCPT ); Fri, 9 Nov 2012 03:40:55 -0500 Received: from cantor2.suse.de ([195.135.220.15]:33405 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750711Ab2KIIkw (ORCPT ); Fri, 9 Nov 2012 03:40:52 -0500 Date: Fri, 9 Nov 2012 08:40:48 +0000 From: Mel Gorman To: Seth Jennings Cc: Jiri Slaby , Zdenek Kabelac , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings Subject: Re: kswapd0: excessive CPU usage Message-ID: <20121109084048.GE8218@suse.de> References: <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <509C84ED.8090605@linux.vnet.ibm.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Nov 08, 2012 at 10:22:05PM -0600, Seth Jennings wrote: > On 11/02/2012 02:45 PM, Jiri Slaby wrote: > > On 11/02/2012 11:53 AM, Jiri Slaby wrote: > >> On 11/02/2012 11:44 AM, Zdenek Kabelac wrote: > >>>>> Yes, applying this instead of the revert fixes the issue as well. > >>> > >>> I've applied this patch on 3.7.0-rc3 kernel - and I still see excessive > >>> CPU usage - mainly after suspend/resume > >>> > >>> Here is just simple kswapd backtrace from running kernel: > >> > >> Yup, this is what we were seeing with the former patch only too. 
Try to > >> apply the other one too: > >> https://patchwork.kernel.org/patch/1673231/ > >> > >> For me I would say, it is fixed by the two patches now. I won't be able > >> to report later, since I'm leaving to a conference tomorrow. > > > > Damn it. It recurred right now, with both patches applied. After I > > started a java program which consumed some more memory. Though there are > > still 2 gigs free, kswap is spinning: > > [] __cond_resched+0x2a/0x40 > > [] shrink_slab+0x1c0/0x2d0 > > [] kswapd+0x66d/0xb60 > > [] kthread+0xc0/0xd0 > > [] ret_from_fork+0x7c/0xb0 > > [] 0xffffffffffffffff > > I'm also hitting this issue in v3.7-rc4. It appears that the last > release not effected by this issue was v3.3. Bisecting the changes > included for v3.4-rc1 showed that this commit introduced the issue: > > fe2c2a106663130a5ab45cb0e3414b52df2fff0c is the first bad commit > commit fe2c2a106663130a5ab45cb0e3414b52df2fff0c > Author: Rik van Riel > Date: Wed Mar 21 16:33:51 2012 -0700 > > vmscan: reclaim at order 0 when compaction is enabled > ... > > This is plausible since the issue seems to be in the kswapd + compaction > realm. I've yet to figure out exactly what about this commit results in > kswapd spinning. > > I would be interested if someone can confirm this finding. > I cannot confirm the actual finding as I don't see the same sort of problems. However, this does make sense and was more or less expected. Reclaiming at order-0 would have forced compaction to be used more instead of lumpy reclaim (less CPU usage but greater system disruption that is harder to measure). Shortly after, lumpy reclaim was removed entirely, so now larger amounts of CPU time are spent compacting memory that previously would have been reclaimed.
-- Mel Gorman SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751869Ab2KIJGn (ORCPT ); Fri, 9 Nov 2012 04:06:43 -0500 Received: from cantor2.suse.de ([195.135.220.15]:34351 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751228Ab2KIJGj (ORCPT ); Fri, 9 Nov 2012 04:06:39 -0500 Date: Fri, 9 Nov 2012 09:06:35 +0000 From: Mel Gorman To: Zdenek Kabelac Cc: Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings Subject: Re: kswapd0: excessive CPU usage Message-ID: <20121109090635.GG8218@suse.de> References: <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <509CB9D1.6060704@redhat.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Nov 09, 2012 at 09:07:45AM +0100, Zdenek Kabelac wrote: > >fe2c2a106663130a5ab45cb0e3414b52df2fff0c is the first bad commit > >commit fe2c2a106663130a5ab45cb0e3414b52df2fff0c > >Author: Rik van Riel > >Date: Wed Mar 21 16:33:51 2012 -0700 > > > > vmscan: reclaim at order 0 when compaction is enabled > >... > > > >This is plausible since the issue seems to be in the kswapd + compaction > >realm. I've yet to figure out exactly what about this commit results in > >kswapd spinning. > > > >I would be interested if someone can confirm this finding. 
> > > >-- > >Seth > > > > > On my system 3.7-rc4 the problem seems to be effectively solved by > revert patch: https://lkml.org/lkml/2012/11/5/308 > Ok, while there is still a question on whether it's enough I think it's sensible to at least start with the obvious one. Thanks very much. -- Mel Gorman SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751735Ab2KIJNI (ORCPT ); Fri, 9 Nov 2012 04:13:08 -0500 Received: from cantor2.suse.de ([195.135.220.15]:34595 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750907Ab2KIJND (ORCPT ); Fri, 9 Nov 2012 04:13:03 -0500 Date: Fri, 9 Nov 2012 09:12:58 +0000 From: Mel Gorman To: Andrew Morton Cc: Zdenek Kabelac , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, Rik van Riel , Jiri Slaby , LKML Subject: Re: [PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" Message-ID: <20121109091258.GH8218@suse.de> References: <119175.1349979570@turing-police.cc.vt.edu> <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <20121105142449.GI8218@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <20121105142449.GI8218@suse.de> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Nov 05, 2012 at 02:24:49PM +0000, Mel Gorman wrote: > Jiri Slaby reported the following: > > (It's an effective revert of "mm: vmscan: scale number of pages > reclaimed by reclaim/compaction based on failures".) 
Given kswapd > had hours of runtime in ps/top output yesterday in the morning > and after the revert it's now 2 minutes in sum for the last 24h, > I would say, it's gone. > > The intention of the patch in question was to compensate for the loss > of lumpy reclaim. Part of the reason lumpy reclaim worked is because > it aggressively reclaimed pages and this patch was meant to be a sane > compromise. > > When compaction fails, it gets deferred and both compaction and > reclaim/compaction is deferred avoid excessive reclaim. However, since > commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up each time > and continues reclaiming which was not taken into account when the patch > was developed. > > Attempts to address the problem ended up just changing the shape of the > problem instead of fixing it. The release window gets closer and while a > THP allocation failing is not a major problem, kswapd chewing up a lot of > CPU is. This patch reverts "mm: vmscan: scale number of pages reclaimed > by reclaim/compaction based on failures" and will be revisited in the future. > > Signed-off-by: Mel Gorman Andrew, can you pick up this patch please and drop mm-vmscan-scale-number-of-pages-reclaimed-by-reclaim-compaction-only-in-direct-reclaim.patch ? There are mixed reports on how much it helps but it comes down to "this fixes a problem" versus "kswapd is still showing higher usage". I think the higher kswapd usage is explained by the removal of __GFP_NO_KSWAPD and so while higher usage is bad, it is not necessarily unjustified. Ideally it would have been proven that having kswapd doing the work reduced application stalls in direct reclaim but unfortunately I do not have concrete evidence of that at this time. 
-- Mel Gorman SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751733Ab2KKJNc (ORCPT ); Sun, 11 Nov 2012 04:13:32 -0500 Received: from mx1.redhat.com ([209.132.183.28]:30510 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750875Ab2KKJN2 (ORCPT ); Sun, 11 Nov 2012 04:13:28 -0500 Message-ID: <509F6C2A.9060502@redhat.com> Date: Sun, 11 Nov 2012 10:13:14 +0100 From: Zdenek Kabelac Organization: Red Hat User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20121016 Thunderbird/16.0.1 MIME-Version: 1.0 To: Mel Gorman CC: Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings Subject: Re: kswapd0: excessive CPU usage References: <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> In-Reply-To: <20121109090635.GG8218@suse.de> Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 9.11.2012 10:06, Mel Gorman wrote: > On Fri, Nov 09, 2012 at 09:07:45AM +0100, Zdenek Kabelac wrote: >>> fe2c2a106663130a5ab45cb0e3414b52df2fff0c is the first bad commit >>> commit fe2c2a106663130a5ab45cb0e3414b52df2fff0c >>> Author: Rik van Riel >>> Date: Wed Mar 21 16:33:51 2012 -0700 >>> >>> vmscan: reclaim at order 0 when compaction is enabled >>> ... >>> >>> This is plausible since the issue seems to be in the kswapd + compaction >>> realm. I've yet to figure out exactly what about this commit results in >>> kswapd spinning.
>>> >>> I would be interested if someone can confirm this finding. >>> >>> -- >>> Seth >>> >> >> >> On my system 3.7-rc4 the problem seems to be effectively solved by >> revert patch: https://lkml.org/lkml/2012/11/5/308 >> > > Ok, while there is still a question on whether it's enough I think it's > sensible to at least start with the obvious one. > Hmm, so it's just took longer to hit the problem and observe kswapd0 spinning on my CPU again - it's not as endless like before - but still it easily eats minutes - it helps to turn off Firefox or TB (memory hungry apps) so kswapd0 stops soon - and restart those apps again. (And I still have like >1GB of cached memory) kswapd0 R running task 0 30 2 0x00000000 ffff8801331efae8 0000000000000082 0000000000000018 0000000000000246 ffff880135b9a340 ffff8801331effd8 ffff8801331effd8 ffff8801331effd8 ffff880055dfa340 ffff880135b9a340 00000000331efad8 ffff8801331ee000 Call Trace: [] preempt_schedule+0x42/0x60 [] _raw_spin_unlock+0x55/0x60 [] put_super+0x31/0x40 [] drop_super+0x22/0x30 [] prune_super+0x149/0x1b0 [] shrink_slab+0xba/0x510 [] ? mem_cgroup_iter+0x17a/0x2e0 [] ? mem_cgroup_iter+0xca/0x2e0 [] balance_pgdat+0x629/0x7f0 [] kswapd+0x174/0x620 [] ? __init_waitqueue_head+0x60/0x60 [] ? balance_pgdat+0x7f0/0x7f0 [] kthread+0xdb/0xe0 [] ? kthread_create_on_node+0x140/0x140 [] ret_from_fork+0x7c/0xb0 [] ? 
kthread_create_on_node+0x140/0x140 runnable tasks: task PID tree-key switches prio exec-runtime sum-exec sum-sleep ---------------------------------------------------------------------------------------------------------- kswapd0 30 8689943.729790 36266 120 8689943.729790 201495.640629 56609485.489414 / kworker/0:1 14790 8689937.729790 16969 120 8689937.729790 374.385996 150405.181652 / R bash 14855 821.749268 50 120 821.749268 24.027535 5252.291128 /autogroup-304 Mem-Info: DMA per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 CPU 1: hi: 0, btch: 1 usd: 0 DMA32 per-cpu: CPU 0: hi: 186, btch: 31 usd: 146 CPU 1: hi: 186, btch: 31 usd: 135 Normal per-cpu: CPU 0: hi: 186, btch: 31 usd: 131 CPU 1: hi: 186, btch: 31 usd: 132 active_anon:726521 inactive_anon:26442 isolated_anon:0 active_file:77765 inactive_file:76890 isolated_file:0 unevictable:12 dirty:4 writeback:0 unstable:0 free:40261 slab_reclaimable:12414 slab_unreclaimable:9694 mapped:26382 shmem:162712 pagetables:6618 bounce:0 free_cma:0 DMA free:15676kB min:272kB low:340kB high:408kB active_anon:208kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:208kB slab_reclaimable:8kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes lowmem_reserve[]: 0 2951 3836 3836 DMA32 free:126072kB min:51776kB low:64720kB high:77664kB active_anon:2175104kB inactive_anon:98976kB active_file:296252kB inactive_file:297648kB unevictable:48kB isolated(anon):0kB isolated(file):0kB present:3021960kB mlocked:48kB dirty:12kB writeback:0kB mapped:77664kB shmem:620388kB slab_reclaimable:19128kB slab_unreclaimable:6292kB kernel_stack:624kB pagetables:8900kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no lowmem_reserve[]: 0 0 885 885 Normal free:19296kB min:15532kB low:19412kB high:23296kB active_anon:730772kB inactive_anon:6792kB active_file:14808kB inactive_file:9912kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:906664kB mlocked:0kB dirty:4kB writeback:0kB mapped:27864kB shmem:30252kB slab_reclaimable:30520kB slab_unreclaimable:32476kB kernel_stack:2496kB pagetables:17572kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 DMA: 1*4kB 1*8kB 3*16kB 2*32kB 3*64kB 2*128kB 3*256kB 2*512kB 3*1024kB 3*2048kB 1*4096kB = 15676kB DMA32: 730*4kB 328*8kB 223*16kB 123*32kB 182*64kB 96*128kB 172*256kB 56*512kB 12*1024kB 1*2048kB 1*4096kB = 128120kB Normal: 600*4kB 384*8kB 164*16kB 122*32kB 40*64kB 7*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 19296kB 317367 total pagecache pages 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 0kB Total swap = 0kB 1032176 pages RAM 42789 pages reserved 642501 pages shared 869271 pages non-shared From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752762Ab2KLLhh (ORCPT ); Mon, 12 Nov 2012 06:37:37 -0500 Received: from cantor2.suse.de ([195.135.220.15]:59076 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752467Ab2KLLhg (ORCPT ); Mon, 12 Nov 2012 06:37:36 -0500 Date: Mon, 12 Nov 2012 11:37:31 +0000 From: Mel Gorman To: Zdenek Kabelac Cc: Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings Subject: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" Message-ID: <20121112113731.GS8218@suse.de> References: <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> 
<509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <509F6C2A.9060502@redhat.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" reverted, Zdenek Kabelac reported the following: Hmm, so it's just took longer to hit the problem and observe kswapd0 spinning on my CPU again - it's not as endless like before - but still it easily eats minutes - it helps to turn off Firefox or TB (memory hungry apps) so kswapd0 stops soon - and restart those apps again. (And I still have like >1GB of cached memory) kswapd0 R running task 0 30 2 0x00000000 ffff8801331efae8 0000000000000082 0000000000000018 0000000000000246 ffff880135b9a340 ffff8801331effd8 ffff8801331effd8 ffff8801331effd8 ffff880055dfa340 ffff880135b9a340 00000000331efad8 ffff8801331ee000 Call Trace: [] preempt_schedule+0x42/0x60 [] _raw_spin_unlock+0x55/0x60 [] put_super+0x31/0x40 [] drop_super+0x22/0x30 [] prune_super+0x149/0x1b0 [] shrink_slab+0xba/0x510 The sysrq+m output indicates the system has no swap, so it'll never reclaim anonymous pages as part of reclaim/compaction. That is one part of the problem but not the root cause, as file-backed pages could also be reclaimed. The likely underlying problem is that kswapd is woken up or kept awake for each THP allocation request in the page allocator slow path. If compaction fails for the requesting process, then compaction will be deferred for a time and direct reclaim is avoided. However, if there is a storm of THP requests that are simply rejected, it will still be the case that kswapd is awake for a prolonged period of time, as pgdat->kswapd_max_order is updated each time.
This is noticed by the main kswapd() loop, and it will not call kswapd_try_to_sleep(). Instead it will loop, shrinking a small number of pages and calling shrink_slab() on each iteration. The temptation is to supply a patch that checks if kswapd was woken for THP and, if so, ignores pgdat->kswapd_max_order, but it'll be a hack and not backed up by proper testing. As 3.7 is very close to release and this is not a bug we should release with, a safer path is to revert "mm: remove __GFP_NO_KSWAPD" for now and revisit it with a view to ironing out the balance_pgdat() logic in general. Signed-off-by: Mel Gorman --- drivers/mtd/mtdcore.c | 6 ++++-- include/linux/gfp.h | 5 ++++- include/trace/events/gfpflags.h | 1 + mm/page_alloc.c | 7 ++++--- 4 files changed, 13 insertions(+), 6 deletions(-) diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c index 374c46d..ec794a7 100644 --- a/drivers/mtd/mtdcore.c +++ b/drivers/mtd/mtdcore.c @@ -1077,7 +1077,8 @@ EXPORT_SYMBOL_GPL(mtd_writev); * until the request succeeds or until the allocation size falls below * the system page size. This attempts to make sure it does not adversely * impact system performance, so when allocating more than one page, we - * ask the memory allocator to avoid re-trying. + * ask the memory allocator to avoid re-trying, swapping, writing back + * or performing I/O. * * Note, this function also makes sure that the allocated buffer is aligned to * the MTD device's min. I/O unit, i.e. the "mtd->writesize" value.
@@ -1091,7 +1092,8 @@ EXPORT_SYMBOL_GPL(mtd_writev); */ void *mtd_kmalloc_up_to(const struct mtd_info *mtd, size_t *size) { - gfp_t flags = __GFP_NOWARN | __GFP_WAIT | __GFP_NORETRY; + gfp_t flags = __GFP_NOWARN | __GFP_WAIT | + __GFP_NORETRY | __GFP_NO_KSWAPD; size_t min_alloc = max_t(size_t, mtd->writesize, PAGE_SIZE); void *kbuf; diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 02c1c971..d0a7967 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -31,6 +31,7 @@ struct vm_area_struct; #define ___GFP_THISNODE 0x40000u #define ___GFP_RECLAIMABLE 0x80000u #define ___GFP_NOTRACK 0x200000u +#define ___GFP_NO_KSWAPD 0x400000u #define ___GFP_OTHER_NODE 0x800000u #define ___GFP_WRITE 0x1000000u @@ -85,6 +86,7 @@ struct vm_area_struct; #define __GFP_RECLAIMABLE ((__force gfp_t)___GFP_RECLAIMABLE) /* Page is reclaimable */ #define __GFP_NOTRACK ((__force gfp_t)___GFP_NOTRACK) /* Don't track with kmemcheck */ +#define __GFP_NO_KSWAPD ((__force gfp_t)___GFP_NO_KSWAPD) #define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE) /* On behalf of other node */ #define __GFP_WRITE ((__force gfp_t)___GFP_WRITE) /* Allocator intends to dirty page */ @@ -114,7 +116,8 @@ struct vm_area_struct; __GFP_MOVABLE) #define GFP_IOFS (__GFP_IO | __GFP_FS) #define GFP_TRANSHUGE (GFP_HIGHUSER_MOVABLE | __GFP_COMP | \ - __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN) + __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN | \ + __GFP_NO_KSWAPD) #ifdef CONFIG_NUMA #define GFP_THISNODE (__GFP_THISNODE | __GFP_NOWARN | __GFP_NORETRY) diff --git a/include/trace/events/gfpflags.h b/include/trace/events/gfpflags.h index 9391706..d6fd8e5 100644 --- a/include/trace/events/gfpflags.h +++ b/include/trace/events/gfpflags.h @@ -36,6 +36,7 @@ {(unsigned long)__GFP_RECLAIMABLE, "GFP_RECLAIMABLE"}, \ {(unsigned long)__GFP_MOVABLE, "GFP_MOVABLE"}, \ {(unsigned long)__GFP_NOTRACK, "GFP_NOTRACK"}, \ + {(unsigned long)__GFP_NO_KSWAPD, "GFP_NO_KSWAPD"}, \ {(unsigned long)__GFP_OTHER_NODE, 
"GFP_OTHER_NODE"} \ ) : "GFP_NOWAIT" diff --git a/mm/page_alloc.c b/mm/page_alloc.c index bb90971..7228260 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2416,8 +2416,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, goto nopage; restart: - wake_all_kswapd(order, zonelist, high_zoneidx, - zone_idx(preferred_zone)); + if (!(gfp_mask & __GFP_NO_KSWAPD)) + wake_all_kswapd(order, zonelist, high_zoneidx, + zone_idx(preferred_zone)); /* * OK, we're below the kswapd watermark and have kicked background @@ -2494,7 +2495,7 @@ rebalance: * system then fail the allocation instead of entering direct reclaim. */ if ((deferred_compaction || contended_compaction) && - (gfp_mask & (__GFP_MOVABLE|__GFP_REPEAT)) == __GFP_MOVABLE) + (gfp_mask & __GFP_NO_KSWAPD)) goto nopage; /* Try direct reclaim and then allocating */ From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752651Ab2KLMUD (ORCPT ); Mon, 12 Nov 2012 07:20:03 -0500 Received: from cantor2.suse.de ([195.135.220.15]:33023 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751731Ab2KLMUB (ORCPT ); Mon, 12 Nov 2012 07:20:01 -0500 Date: Mon, 12 Nov 2012 12:19:57 +0000 From: Mel Gorman To: Zdenek Kabelac Cc: Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings Subject: Re: kswapd0: excessive CPU usage Message-ID: <20121112121956.GT8218@suse.de> References: <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <509F6C2A.9060502@redhat.com> User-Agent: Mutt/1.5.21 
(2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Nov 11, 2012 at 10:13:14AM +0100, Zdenek Kabelac wrote: > Hmm, so it's just took longer to hit the problem and observe kswapd0 > spinning on my CPU again - it's not as endless like before - but > still it easily eats minutes - it helps to turn off Firefox or TB > (memory hungry apps) so kswapd0 stops soon - and restart those apps > again. > (And I still have like >1GB of cached memory) > I posted a "safe" patch that I believe explains why you are seeing what you are seeing. It does mean that there will still be some stalls due to THP because kswapd is not helping, and it's avoiding the problem rather than trying to deal with it. Hence, I'm also going to post this patch even though I have not tested it myself. If you find it fixes the problem, then it would be preferable to the revert. It is still the case that the balance_pgdat() logic is in sore need of a rethink, as it's pretty twisted right now. Thanks ---8<--- mm: Avoid waking kswapd for THP allocations when compaction is deferred or contended With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" reverted, Zdenek Kabelac reported the following: Hmm, so it's just took longer to hit the problem and observe kswapd0 spinning on my CPU again - it's not as endless like before - but still it easily eats minutes - it helps to turn off Firefox or TB (memory hungry apps) so kswapd0 stops soon - and restart those apps again.
(And I still have like >1GB of cached memory) kswapd0 R running task 0 30 2 0x00000000 ffff8801331efae8 0000000000000082 0000000000000018 0000000000000246 ffff880135b9a340 ffff8801331effd8 ffff8801331effd8 ffff8801331effd8 ffff880055dfa340 ffff880135b9a340 00000000331efad8 ffff8801331ee000 Call Trace: [] preempt_schedule+0x42/0x60 [] _raw_spin_unlock+0x55/0x60 [] put_super+0x31/0x40 [] drop_super+0x22/0x30 [] prune_super+0x149/0x1b0 [] shrink_slab+0xba/0x510 The sysrq+m output indicates the system has no swap, so it'll never reclaim anonymous pages as part of reclaim/compaction. That is one part of the problem but not the root cause, as file-backed pages could also be reclaimed. The likely underlying problem is that kswapd is woken up or kept awake for each THP allocation request in the page allocator slow path. If compaction fails for the requesting process, then compaction will be deferred for a time and direct reclaim is avoided. However, if there is a storm of THP requests that are simply rejected, it will still be the case that kswapd is awake for a prolonged period of time, as pgdat->kswapd_max_order is updated each time. This is noticed by the main kswapd() loop, and it will not call kswapd_try_to_sleep(). Instead it will loop, shrinking a small number of pages and calling shrink_slab() on each iteration. This patch defers the decision on when kswapd gets woken up for THP allocations. For !THP allocations, kswapd is always woken up. For THP allocations, kswapd is woken up iff the process is willing to enter into direct reclaim/compaction.
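The wake-up policy described above can be illustrated with a small user-space model. This is a hypothetical sketch, not kernel code: the MODEL_* flag values, struct model_pgdat and every helper are invented for illustration, pageblock_order is assumed to be 9 (4KB base pages, 2MB THPs), and all of kswapd's balancing work is collapsed into a counter. It only shows why a storm of rejected THP requests keeps kswapd awake when every request wakes it, and why gating the THP wake-up behind the deferred/contended compaction check lets kswapd reach its sleep path.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model, not kernel code. Flag values are illustrative
 * stand-ins for __GFP_MOVABLE and __GFP_REPEAT; the real definitions
 * live in include/linux/gfp.h.
 */
#define MODEL_GFP_MOVABLE     0x08u
#define MODEL_GFP_REPEAT      0x400u
/* Assumption: 4KB base pages and 2MB THPs, so pageblock_order == 9. */
#define MODEL_PAGEBLOCK_ORDER 9u
#define MODEL_MAX_ORDER       11u

/* Same test as the patch's is_thp_alloc(), applied to the model flags. */
static bool model_is_thp_alloc(unsigned int gfp_mask, unsigned int order)
{
	return order == MODEL_PAGEBLOCK_ORDER &&
	       (gfp_mask & (MODEL_GFP_MOVABLE | MODEL_GFP_REPEAT)) ==
			MODEL_GFP_MOVABLE;
}

/* Minimal stand-in for the per-node state kswapd looks at. */
struct model_pgdat {
	unsigned int kswapd_max_order; /* highest order requested so far */
	bool wakeup_pending;           /* kswapd was asked to run again */
};

static void model_wake_kswapd(struct model_pgdat *pgdat, unsigned int order)
{
	assert(order < MODEL_MAX_ORDER);
	if (order > pgdat->kswapd_max_order)
		pgdat->kswapd_max_order = order;
	pgdat->wakeup_pending = true;
}

/*
 * One slow-path allocation attempt that ends up failing. With use_patch
 * false, kswapd is woken unconditionally (the "restart:" behaviour).
 * With use_patch true, non-THP requests still wake kswapd, but a THP
 * request wakes it only if it gets past the deferred/contended
 * compaction check, i.e. only if it would enter direct reclaim.
 */
static void model_failed_alloc(struct model_pgdat *pgdat, unsigned int gfp_mask,
			       unsigned int order, bool compaction_deferred,
			       bool use_patch)
{
	if (!use_patch || !model_is_thp_alloc(gfp_mask, order)) {
		model_wake_kswapd(pgdat, order);
		return;
	}
	if (compaction_deferred)
		return; /* models "goto nopage" without waking kswapd */
	model_wake_kswapd(pgdat, order);
}

/*
 * Feeds one rejected THP request per kswapd loop iteration and returns
 * how many iterations kswapd spent balancing (each standing in for a
 * pass that ends in shrink_slab()) instead of going to sleep.
 */
static unsigned int model_kswapd_busy_passes(unsigned int ticks, bool use_patch)
{
	struct model_pgdat pgdat = { 0, false };
	unsigned int busy = 0;

	for (unsigned int t = 0; t < ticks; t++) {
		model_failed_alloc(&pgdat, MODEL_GFP_MOVABLE,
				   MODEL_PAGEBLOCK_ORDER, true, use_patch);
		if (pgdat.wakeup_pending) {
			busy++; /* kswapd stays awake for another pass */
			pgdat.wakeup_pending = false;
			pgdat.kswapd_max_order = 0;
		}
		/* otherwise kswapd would reach kswapd_try_to_sleep() */
	}
	return busy;
}
```

In this model, model_kswapd_busy_passes(1000, false) returns 1000 because every rejected THP request re-arms kswapd, while model_kswapd_busy_passes(1000, true) returns 0. A non-THP request still wakes kswapd under either policy, which matches the "!THP allocations" rule in the text.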
Signed-off-by: Mel Gorman diff --git a/mm/page_alloc.c b/mm/page_alloc.c index bb90971..0b469b4 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2378,6 +2378,15 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask) return !!(gfp_to_alloc_flags(gfp_mask) & ALLOC_NO_WATERMARKS); } +/* Returns true if the allocation is likely for THP */ +static bool is_thp_alloc(gfp_t gfp_mask, unsigned int order) +{ + if (order == pageblock_order && + (gfp_mask & (__GFP_MOVABLE|__GFP_REPEAT)) == __GFP_MOVABLE) + return true; + return false; +} + static inline struct page * __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist, enum zone_type high_zoneidx, @@ -2416,7 +2425,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, goto nopage; restart: - wake_all_kswapd(order, zonelist, high_zoneidx, + /* The decision whether to wake kswapd for THP is made later */ + if (!is_thp_alloc(gfp_mask, order)) + wake_all_kswapd(order, zonelist, high_zoneidx, zone_idx(preferred_zone)); /* @@ -2487,15 +2498,21 @@ rebalance: goto got_pg; sync_migration = true; - /* - * If compaction is deferred for high-order allocations, it is because - * sync compaction recently failed. In this is the case and the caller - * requested a movable allocation that does not heavily disrupt the - * system then fail the allocation instead of entering direct reclaim. - */ - if ((deferred_compaction || contended_compaction) && - (gfp_mask & (__GFP_MOVABLE|__GFP_REPEAT)) == __GFP_MOVABLE) - goto nopage; + if (is_thp_alloc(gfp_mask, order)) { + /* + * If compaction is deferred for high-order allocations, it is + * because sync compaction recently failed. In this is the case + * and the caller requested a movable allocation that does not + * heavily disrupt the system then fail the allocation instead + * of entering direct reclaim. 
+ */ + if (deferred_compaction || contended_compaction) + goto nopage; + + /* If process is willing to reclaim/compact then wake kswapd */ + wake_all_kswapd(order, zonelist, high_zoneidx, + zone_idx(preferred_zone)); + } /* Try direct reclaim and then allocating */ page = __alloc_pages_direct_reclaim(gfp_mask, order, From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752260Ab2KLNNg (ORCPT ); Mon, 12 Nov 2012 08:13:36 -0500 Received: from mx1.redhat.com ([209.132.183.28]:65306 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751283Ab2KLNNf (ORCPT ); Mon, 12 Nov 2012 08:13:35 -0500 Message-ID: <50A0F5F0.6090400@redhat.com> Date: Mon, 12 Nov 2012 14:13:20 +0100 From: Zdenek Kabelac Organization: Red Hat User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20121016 Thunderbird/16.0.1 MIME-Version: 1.0 To: Mel Gorman CC: Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings Subject: Re: kswapd0: excessive CPU usage References: <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112121956.GT8218@suse.de> In-Reply-To: <20121112121956.GT8218@suse.de> Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 12.11.2012 13:19, Mel Gorman wrote: > On Sun, Nov 11, 2012 at 10:13:14AM +0100, Zdenek Kabelac wrote: >> Hmm, so it's just took longer to hit the problem and observe kswapd0 >> spinning on my CPU again - it's not as endless like before - but >> still it easily eats minutes - it
helps to turn off Firefox or TB >> (memory hungry apps) so kswapd0 stops soon - and restart those apps >> again. >> (And I still have like >1GB of cached memory) >> > > I posted a "safe" patch that I believe explains why you are seeing what > you are seeing. It does mean that there will still be some stalls due to > THP because kswapd is not helping and it's avoiding the problem rather > than trying to deal with it. > > Hence, I'm also going to post this patch even though I have not tested > it myself. If you find it fixes the problem then it would be a > preferable patch to the revert. It still is the case that the > balance_pgdat() logic is in sort need of a rethink as it's pretty > twisted right now. > Should I apply them all together for 3.7-rc5 ? 1) https://lkml.org/lkml/2012/11/5/308 2) https://lkml.org/lkml/2012/11/12/113 3) https://lkml.org/lkml/2012/11/12/151 Zdenek From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752701Ab2KLNbo (ORCPT ); Mon, 12 Nov 2012 08:31:44 -0500 Received: from cantor2.suse.de ([195.135.220.15]:36316 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751545Ab2KLNbn (ORCPT ); Mon, 12 Nov 2012 08:31:43 -0500 Date: Mon, 12 Nov 2012 13:31:39 +0000 From: Mel Gorman To: Zdenek Kabelac Cc: Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings Subject: Re: kswapd0: excessive CPU usage Message-ID: <20121112133139.GU8218@suse.de> References: <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112121956.GT8218@suse.de> <50A0F5F0.6090400@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: 
<50A0F5F0.6090400@redhat.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Nov 12, 2012 at 02:13:20PM +0100, Zdenek Kabelac wrote: > On 12.11.2012 13:19, Mel Gorman wrote: > >On Sun, Nov 11, 2012 at 10:13:14AM +0100, Zdenek Kabelac wrote: > >>Hmm, so it's just took longer to hit the problem and observe kswapd0 > >>spinning on my CPU again - it's not as endless like before - but > >>still it easily eats minutes - it helps to turn off Firefox or TB > >>(memory hungry apps) so kswapd0 stops soon - and restart those apps > >>again. > >>(And I still have like >1GB of cached memory) > >> > > > >I posted a "safe" patch that I believe explains why you are seeing what > >you are seeing. It does mean that there will still be some stalls due to > >THP because kswapd is not helping and it's avoiding the problem rather > >than trying to deal with it. > > > >Hence, I'm also going to post this patch even though I have not tested > >it myself. If you find it fixes the problem then it would be a > >preferable patch to the revert. It still is the case that the > >balance_pgdat() logic is in sort need of a rethink as it's pretty > >twisted right now. > > > > > Should I apply them all together for 3.7-rc5 ? > > 1) https://lkml.org/lkml/2012/11/5/308 > 2) https://lkml.org/lkml/2012/11/12/113 > 3) https://lkml.org/lkml/2012/11/12/151 > Not all together. Test either 1+2 or 1+3. 1+2 is the safer choice but does nothing about THP stalls. 1+3 is riskier, as it depends on me being correct about the root cause of the problem you are seeing. If both 1+2 and 1+3 work for you, I'd choose 1+3 for merging. If you only have the time to test one combination, then it would be preferred that you test the safe option of 1+2. Thanks.
-- Mel Gorman SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753557Ab2KLOu5 (ORCPT ); Mon, 12 Nov 2012 09:50:57 -0500 Received: from mx1.redhat.com ([209.132.183.28]:61495 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752804Ab2KLOu4 (ORCPT ); Mon, 12 Nov 2012 09:50:56 -0500 Message-ID: <50A10CBA.7000200@redhat.com> Date: Mon, 12 Nov 2012 15:50:34 +0100 From: Zdenek Kabelac Organization: Red Hat User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20121016 Thunderbird/16.0.1 MIME-Version: 1.0 To: Mel Gorman CC: Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings Subject: Re: kswapd0: excessive CPU usage References: <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112121956.GT8218@suse.de> <50A0F5F0.6090400@redhat.com> <20121112133139.GU8218@suse.de> In-Reply-To: <20121112133139.GU8218@suse.de> Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 12.11.2012 14:31, Mel Gorman wrote: > On Mon, Nov 12, 2012 at 02:13:20PM +0100, Zdenek Kabelac wrote: >> On 12.11.2012 13:19, Mel Gorman wrote: >>> On Sun, Nov 11, 2012 at 10:13:14AM +0100, Zdenek Kabelac wrote: >>>> Hmm, so it's just took longer to hit the problem and observe kswapd0 >>>> spinning on my CPU again - it's not as endless like before - but >>>> still it easily eats minutes - it helps to turn off Firefox or TB >>>> (memory hungry apps) so kswapd0 stops soon - and restart those apps >>>> again.
>>>> (And I still have like >1GB of cached memory) >>>> >>> >>> I posted a "safe" patch that I believe explains why you are seeing what >>> you are seeing. It does mean that there will still be some stalls due to >>> THP because kswapd is not helping and it's avoiding the problem rather >>> than trying to deal with it. >>> >>> Hence, I'm also going to post this patch even though I have not tested >>> it myself. If you find it fixes the problem then it would be a >>> preferable patch to the revert. It still is the case that the >>> balance_pgdat() logic is in sort need of a rethink as it's pretty >>> twisted right now. >>> >> >> >> Should I apply them all together for 3.7-rc5 ? >> >> 1) https://lkml.org/lkml/2012/11/5/308 >> 2) https://lkml.org/lkml/2012/11/12/113 >> 3) https://lkml.org/lkml/2012/11/12/151 >> > > Not all together. Test either 1+2 or 1+3. 1+2 is the safer choice but > does nothing about THP stalls. 1+3 is a riskier version but depends on > me being correct on what the root cause of the problem you see it. > > If both 1+2 and 1+3 work for you, I'd choose 1+3 for merging. If you only > have the time to test one combination then it would be preferred that you > test the safe option of 1+2. > > I'll go with 1+2 for couple days - the issue is - I've no idea how it gets suddenly triggered - it seemed to be running fine for 2-3 days even with just 1) - but then kswapd0 started to occupy CPU for minutes. Looks like some intensive workload on firefox (flash) may lead to that. Anyway it's hard to tell quickly if it helped. 
Zdenek From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933118Ab2KNVoW (ORCPT ); Wed, 14 Nov 2012 16:44:22 -0500 Received: from mail.fem.tu-ilmenau.de ([141.24.101.79]:46840 "EHLO mail.fem.tu-ilmenau.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753480Ab2KNVoV (ORCPT ); Wed, 14 Nov 2012 16:44:21 -0500 Date: Wed, 14 Nov 2012 22:43:40 +0100 From: Johannes Hirte To: Mel Gorman Cc: Andrew Morton , Zdenek Kabelac , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, Rik van Riel , Jiri Slaby , LKML Subject: Re: [PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" Message-ID: <20121114224340.5f7cee78@fem.tu-ilmenau.de> In-Reply-To: <20121109083637.GD8218@suse.de> References: <5077434D.7080008@suse.cz> <50780F26.7070007@suse.cz> <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <20121105142449.GI8218@suse.de> <20121106111554.1896c3f3@fem.tu-ilmenau.de> <20121109083637.GD8218@suse.de> X-Mailer: Claws Mail 3.8.1 (GTK+ 2.24.13; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 9 Nov 2012 08:36:37 +0000, Mel Gorman wrote: > On Tue, Nov 06, 2012 at 11:15:54AM +0100, Johannes Hirte wrote: > > On Mon, 5 Nov 2012 14:24:49 +0000, > > Mel Gorman wrote: > > > > > Jiri Slaby reported the following: > > > > > > (It's an effective revert of "mm: vmscan: scale number of > > > pages reclaimed by reclaim/compaction based on failures".) Given > > > kswapd had hours of runtime in ps/top output yesterday in the > > > morning and after the revert it's now 2 minutes in sum for the > > > last 24h, I would say, it's gone.
> > > > > > The intention of the patch in question was to compensate for the > > > loss of lumpy reclaim. Part of the reason lumpy reclaim worked is > > > because it aggressively reclaimed pages and this patch was meant > > > to be a sane compromise. > > > > > > When compaction fails, it gets deferred and both compaction and > > > reclaim/compaction are deferred to avoid excessive reclaim. However, > > > since commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is > > > woken up each time and continues reclaiming, which was not taken > > > into account when the patch was developed. > > > > > > Attempts to address the problem ended up just changing the shape > > > of the problem instead of fixing it. The release window gets > > > closer and while a THP allocation failing is not a major problem, > > > kswapd chewing up a lot of CPU is. This patch reverts "mm: > > > vmscan: scale number of pages reclaimed by reclaim/compaction > > > based on failures" and will be revisited in the future. > > > > > > Signed-off-by: Mel Gorman > > > --- > > > mm/vmscan.c | 25 ------------------------- > > > 1 file changed, 25 deletions(-) > > >
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index 2624edc..e081ee8 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -1760,28 +1760,6 @@ static bool in_reclaim_compaction(struct scan_control *sc)
> > >  	return false;
> > >  }
> > >  
> > > -#ifdef CONFIG_COMPACTION
> > > -/*
> > > - * If compaction is deferred for sc->order then scale the number of pages
> > > - * reclaimed based on the number of consecutive allocation failures
> > > - */
> > > -static unsigned long scale_for_compaction(unsigned long pages_for_compaction,
> > > -			struct lruvec *lruvec, struct scan_control *sc)
> > > -{
> > > -	struct zone *zone = lruvec_zone(lruvec);
> > > -
> > > -	if (zone->compact_order_failed <= sc->order)
> > > -		pages_for_compaction <<= zone->compact_defer_shift;
> > > -	return pages_for_compaction;
> > > -}
> > > -#else
> > > -static unsigned long scale_for_compaction(unsigned long pages_for_compaction,
> > > -			struct lruvec *lruvec, struct scan_control *sc)
> > > -{
> > > -	return pages_for_compaction;
> > > -}
> > > -#endif
> > > -
> > >  /*
> > >   * Reclaim/compaction is used for high-order allocation requests. It reclaims
> > >   * order-0 pages before compacting the zone. should_continue_reclaim() returns
> > > @@ -1829,9 +1807,6 @@ static inline bool should_continue_reclaim(struct lruvec *lruvec,
> > >  	 * inactive lists are large enough, continue reclaiming
> > >  	 */
> > >  	pages_for_compaction = (2UL << sc->order);
> > > -
> > > -	pages_for_compaction = scale_for_compaction(pages_for_compaction,
> > > -						lruvec, sc);
> > >  	inactive_lru_pages = get_lru_size(lruvec, LRU_INACTIVE_FILE);
> > >  	if (nr_swap_pages > 0)
> > >  		inactive_lru_pages += get_lru_size(lruvec, LRU_INACTIVE_ANON);
> > > --
> > > > Even with this patch I see kswapd0 very often at the top of top, much more often than > > with kernel 3.6. > > How severe is the CPU usage? The higher usage can be explained by "mm: > remove __GFP_NO_KSWAPD" which allows kswapd to compact memory to > reduce the amount of time processes spend in compaction but will > result in the CPU cost being incurred by kswapd. > > Is it really high like the bug was reporting, with high usage over long > periods of time, or do you just see it using 2-6% of CPU for short > periods? It is really high. During compile jobs (make -j4 on a dual-core machine) I've seen kswapd0 consuming at least 50% CPU most of the time.
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753282Ab2KPTOt (ORCPT ); Fri, 16 Nov 2012 14:14:49 -0500 Received: from mail-vb0-f46.google.com ([209.85.212.46]:47601 "EHLO mail-vb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753193Ab2KPTOs (ORCPT ); Fri, 16 Nov 2012 14:14:48 -0500 MIME-Version: 1.0 In-Reply-To: <20121112113731.GS8218@suse.de> References: <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112113731.GS8218@suse.de> Date: Fri, 16 Nov 2012 14:14:47 -0500 Message-ID: Subject: Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" From: Josh Boyer To: Mel Gorman Cc: Zdenek Kabelac , Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Nov 12, 2012 at 6:37 AM, Mel Gorman wrote: > With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction > based on failures" reverted, Zdenek Kabelac reported the following > > Hmm, so it's just took longer to hit the problem and observe > kswapd0 spinning on my CPU again - it's not as endless like before - > but still it easily eats minutes - it helps to turn off Firefox > or TB (memory hungry apps) so kswapd0 stops soon - and restart > those apps again. 
(And I still have like >1GB of cached memory) > > kswapd0 R running task 0 30 2 0x00000000 > ffff8801331efae8 0000000000000082 0000000000000018 0000000000000246 > ffff880135b9a340 ffff8801331effd8 ffff8801331effd8 ffff8801331effd8 > ffff880055dfa340 ffff880135b9a340 00000000331efad8 ffff8801331ee000 > Call Trace: > [] preempt_schedule+0x42/0x60 > [] _raw_spin_unlock+0x55/0x60 > [] put_super+0x31/0x40 > [] drop_super+0x22/0x30 > [] prune_super+0x149/0x1b0 > [] shrink_slab+0xba/0x510 > > The sysrq+m indicates the system has no swap so it'll never reclaim > anonymous pages as part of reclaim/compaction. That is one part of the > problem but not the root cause as file-backed pages could also be reclaimed. > > The likely underlying problem is that kswapd is woken up or kept awake > for each THP allocation request in the page allocator slow path. > > If compaction fails for the requesting process then compaction will be > deferred for a time and direct reclaim is avoided. However, if there > is a storm of THP requests that are simply rejected, it will still > be the case that kswapd is awake for a prolonged period of time > as pgdat->kswapd_max_order is updated each time. This is noticed by > the main kswapd() loop and it will not call kswapd_try_to_sleep(). > Instead it will loop, shrinking a small number of pages and calling > shrink_slab() on each iteration. > > The temptation is to supply a patch that checks if kswapd was woken for > THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not > backed up by proper testing. As 3.7 is very close to release and this is > not a bug we should release with, a safer path is to revert "mm: remove > __GFP_NO_KSWAPD" for now and revisit it with the view to ironing out the > balance_pgdat() logic in general. > > Signed-off-by: Mel Gorman Does anyone know if this is queued to go into 3.7 somewhere? I looked a bit and can't find it in a tree. We have a few reports of Fedora rawhide users hitting this.
josh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751619Ab2KPTvc (ORCPT ); Fri, 16 Nov 2012 14:51:32 -0500 Received: from mail.linuxfoundation.org ([140.211.169.12]:44624 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751145Ab2KPTvb (ORCPT ); Fri, 16 Nov 2012 14:51:31 -0500 Date: Fri, 16 Nov 2012 11:51:24 -0800 From: Andrew Morton To: Josh Boyer Cc: Mel Gorman , Zdenek Kabelac , Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Rik van Riel , Robert Jennings Subject: Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" Message-Id: <20121116115124.c2981abc.akpm@linux-foundation.org> In-Reply-To: References: <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112113731.GS8218@suse.de> X-Mailer: Sylpheed 3.0.2 (GTK+ 2.20.1; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 16 Nov 2012 14:14:47 -0500 Josh Boyer wrote: > > The temptation is to supply a patch that checks if kswapd was woken for > > THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not > > backed up by proper testing. As 3.7 is very close to release and this is > > not a bug we should release with, a safer path is to revert "mm: remove > > __GFP_NO_KSWAPD" for now and revisit it with the view to ironing out the > > balance_pgdat() logic in general. > > > > Signed-off-by: Mel Gorman > > Does anyone know if this is queued to go into 3.7 somewhere? I looked > a bit and can't find it in a tree. 
We have a few reports of Fedora > rawhide users hitting this. Still thinking about it. We're reverting quite a lot of material lately. mm-revert-mm-vmscan-scale-number-of-pages-reclaimed-by-reclaim-compaction-based-on-failures.patch and revert-mm-fix-up-zone-present-pages.patch are queued for 3.7. I'll toss this one in there as well, but I can't say I'm feeling terribly confident. How is Valdis's machine nowadays? From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753514Ab2KPUGW (ORCPT ); Fri, 16 Nov 2012 15:06:22 -0500 Received: from cantor2.suse.de ([195.135.220.15]:56651 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753367Ab2KPUGV (ORCPT ); Fri, 16 Nov 2012 15:06:21 -0500 Date: Fri, 16 Nov 2012 20:06:17 +0000 From: Mel Gorman To: Josh Boyer Cc: Zdenek Kabelac , Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings Subject: Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" Message-ID: <20121116200616.GK8218@suse.de> References: <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112113731.GS8218@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Nov 16, 2012 at 02:14:47PM -0500, Josh Boyer wrote: > On Mon, Nov 12, 2012 at 6:37 AM, Mel Gorman wrote: > > With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction > > based on failures" reverted, Zdenek Kabelac reported the following > > > > Hmm, so it's just took longer to hit the problem and observe > > kswapd0 
spinning on my CPU again - it's not as endless like before - > > but still it easily eats minutes - it helps to turn off Firefox > > or TB (memory hungry apps) so kswapd0 stops soon - and restart > > those apps again. (And I still have like >1GB of cached memory) > > > > kswapd0 R running task 0 30 2 0x00000000 > > ffff8801331efae8 0000000000000082 0000000000000018 0000000000000246 > > ffff880135b9a340 ffff8801331effd8 ffff8801331effd8 ffff8801331effd8 > > ffff880055dfa340 ffff880135b9a340 00000000331efad8 ffff8801331ee000 > > Call Trace: > > [] preempt_schedule+0x42/0x60 > > [] _raw_spin_unlock+0x55/0x60 > > [] put_super+0x31/0x40 > > [] drop_super+0x22/0x30 > > [] prune_super+0x149/0x1b0 > > [] shrink_slab+0xba/0x510 > > > > The sysrq+m indicates the system has no swap so it'll never reclaim > > anonymous pages as part of reclaim/compaction. That is one part of the > > problem but not the root cause as file-backed pages could also be reclaimed. > > > > The likely underlying problem is that kswapd is woken up or kept awake > > for each THP allocation request in the page allocator slow path. > > > > If compaction fails for the requesting process then compaction will be > > deferred for a time and direct reclaim is avoided. However, if there > > is a storm of THP requests that are simply rejected, it will still > > be the case that kswapd is awake for a prolonged period of time > > as pgdat->kswapd_max_order is updated each time. This is noticed by > > the main kswapd() loop and it will not call kswapd_try_to_sleep(). > > Instead it will loop, shrinking a small number of pages and calling > > shrink_slab() on each iteration. > > > > The temptation is to supply a patch that checks if kswapd was woken for > > THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not > > backed up by proper testing.
As 3.7 is very close to release and this is > > not a bug we should release with, a safer path is to revert "mm: remove > > __GFP_NO_KSWAPD" for now and revisit it with the view to ironing out the > > balance_pgdat() logic in general. > > > > Signed-off-by: Mel Gorman > > Does anyone know if this is queued to go into 3.7 somewhere? I looked > a bit and can't find it in a tree. We have a few reports of Fedora > rawhide users hitting this. > No, because I was waiting to hear (a) whether it worked and, preferably, (b) whether the alternative "less safe" option worked. This close to release, it might be better to just go with the safe option. -- Mel Gorman SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752441Ab2KRTBA (ORCPT ); Sun, 18 Nov 2012 14:01:00 -0500 Received: from mx1.redhat.com ([209.132.183.28]:40194 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752300Ab2KRTA6 (ORCPT ); Sun, 18 Nov 2012 14:00:58 -0500 Message-ID: <50A9304E.3020205@redhat.com> Date: Sun, 18 Nov 2012 20:00:30 +0100 From: Zdenek Kabelac Organization: Red Hat User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20121016 Thunderbird/16.0.1 MIME-Version: 1.0 To: Mel Gorman CC: Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings Subject: Re: kswapd0: excessive CPU usage References: <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112121956.GT8218@suse.de> <50A0F5F0.6090400@redhat.com> <20121112133139.GU8218@suse.de> In-Reply-To: <20121112133139.GU8218@suse.de> Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 12.11.2012 14:31, Mel Gorman wrote: > On Mon, Nov 12, 2012 at 02:13:20PM +0100, Zdenek Kabelac wrote: >> On 12.11.2012 13:19, Mel Gorman wrote: >>> On Sun, Nov 11, 2012 at 10:13:14AM +0100, Zdenek Kabelac wrote: >>>> Hmm, so it's just took longer to hit the problem and observe kswapd0 >>>> spinning on my CPU again - it's not as endless like before - but >>>> still it easily eats minutes - it helps to turn off Firefox or TB >>>> (memory hungry apps) so kswapd0 stops soon - and restart those apps >>>> again. >>>> (And I still have like >1GB of cached memory) >>>> >>> >>> I posted a "safe" patch that I believe explains why you are seeing what >>> you are seeing. It does mean that there will still be some stalls due to >>> THP because kswapd is not helping and it's avoiding the problem rather >>> than trying to deal with it. >>> >>> Hence, I'm also going to post this patch even though I have not tested >>> it myself. If you find it fixes the problem then it would be a >>> preferable patch to the revert. It still is the case that the >>> balance_pgdat() logic is in sore need of a rethink as it's pretty >>> twisted right now. >>> >> >> >> Should I apply them all together for 3.7-rc5 ? >> >> 1) https://lkml.org/lkml/2012/11/5/308 >> 2) https://lkml.org/lkml/2012/11/12/113 >> 3) https://lkml.org/lkml/2012/11/12/151 >> > > Not all together. Test either 1+2 or 1+3. 1+2 is the safer choice but > does nothing about THP stalls. 1+3 is a riskier version but depends on > me being correct on what the root cause of the problem you see is. > > If both 1+2 and 1+3 work for you, I'd choose 1+3 for merging. If you only > have the time to test one combination then it would be preferred that you > test the safe option of 1+2.
So I've tested 1+2 for a few days (I rebooted once for another reason), but today this happened to me (with ~2 days of uptime): for some reason my machine went out of memory, and the OOM killer killed Firefox and then even the whole X session. I'm not sure whether it's related to those 2 patches, but I've never had such an OOM failure before. Should I experiment with 1+3 now, or is there something newer to test? Zdenek X: page allocation failure: order:0, mode:0x200da Pid: 1126, comm: X Not tainted 3.7.0-rc5-00007-g95e21c5 #100 Call Trace: [] warn_alloc_failed+0xe9/0x140 [] __alloc_pages_nodemask+0x7fa/0xa40 [] shmem_getpage_gfp+0x603/0x9d0 [] ? native_sched_clock+0x26/0x90 [] shmem_fault+0x4f/0xa0 [] shm_fault+0x1e/0x20 [] __do_fault+0x73/0x4d0 [] ? generic_file_aio_write+0xb0/0x100 [] handle_pte_fault+0x97/0x9a0 [] ? __lock_is_held+0x5f/0x90 [] ? get_parent_ip+0x11/0x50 rsyslogd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0 rsyslogd cpuset=/ mems_allowed=0 Pid: 571, comm: rsyslogd Not tainted 3.7.0-rc5-00007-g95e21c5 #100 Call Trace: [] dump_header.isra.12+0x78/0x224 [] ? sub_preempt_count+0x79/0xd0 [] ? _raw_spin_unlock_irqrestore+0x42/0x80 [] ? ___ratelimit+0x9e/0x130 [] oom_kill_process+0x1d3/0x330 [] out_of_memory+0x439/0x4a0 [] __alloc_pages_nodemask+0x976/0xa40 [] ? find_get_page+0x5/0x230 [] filemap_fault+0x2d0/0x480 [] __do_fault+0x73/0x4d0 [] handle_pte_fault+0x97/0x9a0 [] ? __lock_is_held+0x5f/0x90 [] ? get_parent_ip+0x11/0x50 [] handle_mm_fault+0x22f/0x2f0 [] __do_page_fault+0x15d/0x4e0 [] ? sub_preempt_count+0x79/0xd0 [] ? _raw_spin_unlock+0x35/0x60 [] ? proc_reg_read+0x8c/0xc0 [] ? error_sti+0x5/0x6 [] ?
trace_hardirqs_off_thunk+0x3a/0x3c [] do_page_fault+0xe/0x10 [] page_fault+0x22/0x30 Mem-Info: DMA per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 CPU 1: hi: 0, btch: 1 usd: 0 DMA32 per-cpu: CPU 0: hi: 186, btch: 31 usd: 30 CPU 1: hi: 186, btch: 31 usd: 6 Normal per-cpu: CPU 0: hi: 186, btch: 31 usd: 30 CPU 1: hi: 186, btch: 31 usd: 0 active_anon:900420 inactive_anon:28835 isolated_anon:0 active_file:43 inactive_file:21 isolated_file:0 unevictable:4 dirty:34 writeback:2 unstable:0 free:20731 slab_reclaimable:8641 slab_unreclaimable:10446 mapped:18325 shmem:243662 pagetables:7705 bounce:0 free_cma:0 DMA free:12120kB min:272kB low:340kB high:408kB active_anon:2892kB inactive_anon:872kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:1672kB shmem:3596kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes lowmem_reserve[]: 0 2951 3836 3836 DMA32 free:55296kB min:51776kB low:64720kB high:77664kB active_anon:2834992kB inactive_anon:107924kB active_file:92kB inactive_file:52kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3021968kB mlocked:0kB dirty:88kB writeback:0kB mapped:65460kB shmem:943100kB slab_reclaimable:11700kB slab_unreclaimable:8968kB kernel_stack:592kB pagetables:11852kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:180 all_unreclaimable? 
yes lowmem_reserve[]: 0 0 885 885 Normal free:15508kB min:15532kB low:19412kB high:23296kB active_anon:763796kB inactive_anon:6544kB active_file:80kB inactive_file:32kB unevictable:16kB isolated(anon):0kB isolated(file):0kB present:906664kB mlocked:16kB dirty:48kB writeback:52kB mapped:6168kB shmem:27952kB slab_reclaimable:22864kB slab_unreclaimable:32800kB kernel_stack:2568kB pagetables:18968kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:234 all_unreclaimable? yes lowmem_reserve[]: 0 0 0 0 DMA: 2*4kB 2*8kB 2*16kB 1*32kB 2*64kB 1*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 1*4096kB = 12120kB DMA32: 900*4kB 1512*8kB 513*16kB 635*32kB 109*64kB 8*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 55296kB Normal: 452*4kB 363*8kB 225*16kB 139*32kB 30*64kB 4*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 17496kB 243783 total pagecache pages 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 0kB Total swap = 0kB 1032176 pages RAM 42789 pages reserved 553592 pages shared 943414 pages non-shared [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [ 351] 0 351 74685 1679 154 0 0 systemd-journal [ 544] 0 544 5863 107 16 0 0 bluetoothd [ 545] 0 545 88977 725 56 0 0 NetworkManager [ 546] 0 546 30170 158 15 0 0 crond [ 552] 0 552 1879 28 8 0 0 gpm [ 557] 0 557 1092 37 8 0 0 acpid [ 564] 81 564 6361 373 16 0 -900 dbus-daemon [ 566] 0 566 61331 155 22 0 0 rsyslogd [ 567] 498 567 7026 104 19 0 0 avahi-daemon [ 568] 498 568 6994 59 17 0 0 avahi-daemon [ 573] 0 573 1758 33 9 0 0 mcelog [ 578] 0 578 5925 51 16 0 0 atd [ 586] 105 586 121536 4270 56 0 0 polkitd [ 593] 0 593 21967 205 48 0 -900 modem-manager [ 601] 0 601 1087 26 8 0 0 thinkfan [ 619] 0 619 122722 1085 129 0 0 libvirtd [ 630] 32 630 4812 68 13 0 0 rpcbind [ 633] 0 633 20080 199 43 0 -1000 sshd [ 653] 29 653 5905 116 16 0 0 rpc.statd [ 700] 0 700 13173 190 28 0 0 wpa_supplicant [ 719] 0 719 4810 50 14 0 0 rpc.idmapd [ 730] 0 730 28268 36 10 0 0 rpc.rquotad [ 
766] 0 766 6030 153 15 0 0 rpc.mountd [ 806] 99 806 3306 45 11 0 0 dnsmasq [ 985] 0 985 21219 150 46 0 0 login [ 988] 0 988 260408 355 48 0 0 console-kit-dae [ 1053] 11641 1053 28706 241 14 0 0 bash [ 1097] 11641 1097 27972 58 10 0 0 startx [ 1125] 11641 1125 3487 48 13 0 0 xinit [ 1126] 11641 1126 80028 35289 154 0 0 X [ 1138] 11641 1138 142989 930 122 0 0 gnome-session [ 1151] 11641 1151 4013 64 12 0 0 dbus-launch [ 1152] 11641 1152 6069 82 17 0 0 dbus-daemon [ 1154] 11641 1154 85449 162 36 0 0 at-spi-bus-laun [ 1158] 11641 1158 6103 116 17 0 0 dbus-daemon [ 1161] 11641 1161 32328 174 33 0 0 at-spi2-registr [ 1172] 11641 1172 4013 65 13 0 0 dbus-launch [ 1173] 11641 1173 6350 265 18 0 0 dbus-daemon [ 1177] 11641 1177 37416 416 29 0 0 gconfd-2 [ 1184] 11641 1184 117556 1203 44 0 0 gnome-keyring-d [ 1185] 11641 1185 224829 2236 177 0 0 gnome-settings- [ 1194] 0 1194 57227 786 46 0 0 upowerd [ 1226] 11641 1226 77392 190 36 0 0 gvfsd [ 1246] 11641 1246 118201 772 90 0 0 pulseaudio [ 1247] 496 1247 41161 59 17 0 0 rtkit-daemon [ 1252] 11641 1252 29494 205 58 0 0 gconf-helper [ 1253] 106 1253 81296 355 46 0 0 colord [ 1257] 11641 1257 59080 1574 60 0 0 openbox [ 1258] 11641 1258 185569 3216 146 0 0 gnome-panel [ 1264] 11641 1264 64102 229 27 0 0 dconf-service [ 1268] 11641 1268 139203 858 116 0 0 gnome-user-shar [ 1269] 11641 1269 268645 27442 334 0 0 pidgin [ 1270] 11641 1270 142642 1064 117 0 0 bluetooth-apple [ 1271] 11641 1271 193218 1775 175 0 0 nm-applet [ 1272] 11641 1272 220194 1810 138 0 0 gnome-sound-app [ 1285] 11641 1285 80914 632 45 0 0 gvfs-udisks2-vo [ 1287] 0 1287 88101 599 41 0 0 udisksd [ 1295] 11641 1295 177162 14140 150 0 0 wnck-applet [ 1297] 11641 1297 281043 3161 199 0 0 clock-applet [ 1299] 11641 1299 142537 1053 120 0 0 cpufreq-applet [ 1302] 11641 1302 141960 986 113 0 0 notification-ar [ 1340] 11641 1340 190026 6265 144 0 0 gnome-terminal [ 1346] 11641 1346 2123 35 10 0 0 gnome-pty-helpe [ 1347] 11641 1347 28719 253 11 0 0 bash [ 1858] 11641 
1858 10895 101 27 0 0 xfconfd [ 2052] 11641 2052 28720 255 11 0 0 bash [ 6239] 11641 6239 73437 711 88 0 0 kdeinit4 [ 6240] 11641 6240 83952 717 101 0 0 klauncher [ 6242] 11641 6242 126497 1479 172 0 0 kded4 [ 6244] 11641 6244 2977 48 11 0 0 gam_server [10804] 11641 10804 101320 307 47 0 0 gvfsd-http [12175] 0 12175 27197 32 10 0 0 agetty [12249] 11641 12249 28719 252 14 0 0 bash [14862] 0 14862 51773 344 55 0 0 cupsd [14868] 4 14868 18105 158 39 0 0 cups-polld [16728] 11641 16728 28691 244 12 0 0 bash [16975] 0 16975 9109 253 23 0 -1000 systemd-udevd [17618] 0 17618 8245 87 22 0 0 systemd-logind [ 3133] 11641 3133 43721 132 40 0 0 su [ 3136] 0 3136 28564 139 12 0 0 bash [ 3983] 11641 3983 43722 134 41 0 0 su [ 3986] 0 3986 28564 144 13 0 0 bash [16350] 11641 16350 28691 245 14 0 0 bash [31228] 11641 31228 28691 245 11 0 0 bash [31922] 11641 31922 28719 250 13 0 0 bash [ 2340] 11641 2340 28691 245 15 0 0 bash [12586] 38 12586 7851 150 19 0 0 ntpd [32658] 11641 32658 41192 424 35 0 0 mc [32660] 11641 32660 28692 245 13 0 0 bash [29193] 11641 29193 713846 414344 1614 0 0 firefox [10971] 11641 10971 43722 133 43 0 0 su [10974] 0 10974 28564 132 12 0 0 bash [11343] 0 11343 28497 66 11 0 0 ksmtuned [11387] 11641 11387 28719 254 11 0 0 bash [11450] 11641 11450 28691 246 13 0 0 bash [11576] 11641 11576 43722 133 40 0 0 su [11579] 0 11579 28564 141 13 0 0 bash [12106] 11641 12106 28691 244 12 0 0 bash [12141] 11641 12141 43722 132 44 0 0 su [12144] 0 12144 28564 140 11 0 0 bash [12264] 11641 12264 28691 245 11 0 0 bash [12299] 11641 12299 43721 133 40 0 0 su [12302] 0 12302 28564 137 12 0 0 bash [26024] 11641 26024 28691 245 13 0 0 bash [26083] 11641 26083 28691 245 13 0 0 bash [28235] 11641 28235 43721 132 42 0 0 su [28238] 0 28238 28564 143 13 0 0 bash [29460] 11641 29460 43721 132 42 0 0 su [29463] 0 29463 28564 137 12 0 0 bash [29758] 11641 29758 28720 256 12 0 0 bash [29864] 11641 29864 41916 1153 36 0 0 mc [29866] 11641 29866 28728 257 11 0 0 bash [32750] 0 32750 
23164 2994 47 0 0 dhclient [ 323] 0 323 24081 471 48 0 0 sendmail [ 347] 51 347 20347 367 38 0 0 sendmail [ 907] 11641 907 379562 159766 707 0 0 thunderbird [ 6340] 11641 6340 28719 251 12 0 0 bash [ 6790] 11641 6790 80307 620 101 0 0 xfce4-notifyd [ 6844] 0 6844 26669 23 9 0 0 sleep Out of memory: Kill process 29193 (firefox) score 420 or sacrifice child Killed process 29193 (firefox) total-vm:2855384kB, anon-rss:1653868kB, file-rss:3508kB [] handle_mm_fault+0x22f/0x2f0 [] __get_user_pages+0x12a/0x530 [] get_dump_page+0x45/0x60 [] elf_core_dump+0x16bd/0x1960 [] ? elf_core_dump+0x9d6/0x1960 [] ? sub_preempt_count+0x79/0xd0 [] ? mutex_unlock+0xe/0x10 [] ? do_truncate+0x73/0xa0 [] do_coredump+0xa21/0xeb0 [] ? debug_check_no_locks_freed+0xe0/0x170 [] ? trace_hardirqs_off+0xd/0x10 [] get_signal_to_deliver+0x2e1/0x960 [] do_signal+0x3f/0x9a0 [] ? pci_fixup_msi_k8t_onboard_sound+0x7d/0x97 [] ? is_prefetch.isra.15+0x1a6/0x1fd [] ? error_sti+0x5/0x6 [] ? retint_signal+0x11/0x90 [] do_notify_resume+0x80/0xb0 [] retint_signal+0x46/0x90 Mem-Info: DMA per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 CPU 1: hi: 0, btch: 1 usd: 0 DMA32 per-cpu: CPU 0: hi: 186, btch: 31 usd: 0 CPU 1: hi: 186, btch: 31 usd: 30 Normal per-cpu: CPU 0: hi: 186, btch: 31 usd: 0 CPU 1: hi: 186, btch: 31 usd: 30 active_anon:900420 inactive_anon:28835 isolated_anon:0 active_file:8 inactive_file:0 isolated_file:0 unevictable:4 dirty:34 writeback:2 unstable:0 free:20724 slab_reclaimable:8641 slab_unreclaimable:10446 mapped:18325 shmem:243662 pagetables:7705 bounce:0 free_cma:0 DMA free:12120kB min:272kB low:340kB high:408kB active_anon:2892kB inactive_anon:872kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:1672kB shmem:3596kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
yes lowmem_reserve[]: 0 2951 3836 3836 DMA32 free:55404kB min:51776kB low:64720kB high:77664kB active_anon:2834992kB inactive_anon:107924kB active_file:0kB inactive_file:28kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3021968kB mlocked:0kB dirty:0kB writeback:0kB mapped:65460kB shmem:943100kB slab_reclaimable:11700kB slab_unreclaimable:8968kB kernel_stack:592kB pagetables:11852kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:129 all_unreclaimable? yes lowmem_reserve[]: 0 0 885 885 Normal free:15364kB min:15532kB low:19412kB high:23296kB active_anon:763796kB inactive_anon:6544kB active_file:0kB inactive_file:24kB unevictable:16kB isolated(anon):0kB isolated(file):0kB present:906664kB mlocked:16kB dirty:48kB writeback:52kB mapped:6168kB shmem:27952kB slab_reclaimable:22864kB slab_unreclaimable:32800kB kernel_stack:2568kB pagetables:18968kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:379 all_unreclaimable? yes lowmem_reserve[]: 0 0 0 0 DMA: 2*4kB 2*8kB 2*16kB 1*32kB 2*64kB 1*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 1*4096kB = 12120kB DMA32: 896*4kB 1512*8kB 513*16kB 635*32kB 109*64kB 8*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 55280kB Normal: 403*4kB 377*8kB 225*16kB 139*32kB 30*64kB 4*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 17412kB 243733 total pagecache pages rsyslogd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0 rsyslogd cpuset=/ mems_allowed=0 Pid: 571, comm: rsyslogd Not tainted 3.7.0-rc5-00007-g95e21c5 #100 Call Trace: [] dump_header.isra.12+0x78/0x224 [] ? sub_preempt_count+0x79/0xd0 [] ? _raw_spin_unlock_irqrestore+0x42/0x80 [] ? ___ratelimit+0x9e/0x130 [] oom_kill_process+0x1d3/0x330 [] out_of_memory+0x439/0x4a0 [] __alloc_pages_nodemask+0x976/0xa40 [] ? find_get_page+0x5/0x230 [] filemap_fault+0x2d0/0x480 [] __do_fault+0x73/0x4d0 [] handle_pte_fault+0x97/0x9a0 [] ? __lock_is_held+0x5f/0x90 [] ? 
get_parent_ip+0x11/0x50 [] handle_mm_fault+0x22f/0x2f0 [] __do_page_fault+0x15d/0x4e0 [] ? _raw_spin_unlock+0x35/0x60 [] ? proc_reg_read+0x8c/0xc0 [] ? error_sti+0x5/0x6 [] ? trace_hardirqs_off_thunk+0x3a/0x3c [] do_page_fault+0xe/0x10 [] page_fault+0x22/0x30 Mem-Info: DMA per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 CPU 1: hi: 0, btch: 1 usd: 0 DMA32 per-cpu: CPU 0: hi: 186, btch: 31 usd: 0 CPU 1: hi: 186, btch: 31 usd: 30 Normal per-cpu: CPU 0: hi: 186, btch: 31 usd: 1 CPU 1: hi: 186, btch: 31 usd: 46 active_anon:900420 inactive_anon:28835 isolated_anon:0 active_file:0 inactive_file:7 isolated_file:0 unevictable:4 dirty:0 writeback:2 unstable:0 free:20691 slab_reclaimable:8641 slab_unreclaimable:10446 mapped:18325 shmem:243662 pagetables:7705 bounce:0 free_cma:0 DMA free:12120kB min:272kB low:340kB high:408kB active_anon:2892kB inactive_anon:872kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:1672kB shmem:3596kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes lowmem_reserve[]: 0 2951 3836 3836 DMA32 free:55280kB min:51776kB low:64720kB high:77664kB active_anon:2834992kB inactive_anon:107924kB active_file:0kB inactive_file:12kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3021968kB mlocked:0kB dirty:0kB writeback:0kB mapped:65460kB shmem:943100kB slab_reclaimable:11700kB slab_unreclaimable:8968kB kernel_stack:592kB pagetables:11852kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:520 all_unreclaimable? 
yes lowmem_reserve[]: 0 0 885 885 Normal free:15364kB min:15532kB low:19412kB high:23296kB active_anon:763796kB inactive_anon:6544kB active_file:0kB inactive_file:16kB unevictable:16kB isolated(anon):0kB isolated(file):0kB present:906664kB mlocked:16kB dirty:48kB writeback:52kB mapped:6168kB shmem:27952kB slab_reclaimable:22864kB slab_unreclaimable:32800kB kernel_stack:2568kB pagetables:18968kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:571 all_unreclaimable? yes lowmem_reserve[]: 0 0 0 0 DMA: 2*4kB 2*8kB 2*16kB 1*32kB 2*64kB 1*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 1*4096kB = 12120kB DMA32: 896*4kB 1512*8kB 513*16kB 635*32kB 109*64kB 8*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 55280kB Normal: 403*4kB 377*8kB 225*16kB 139*32kB 30*64kB 4*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 17412kB 243733 total pagecache pages 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 0kB Total swap = 0kB 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 0kB Total swap = 0kB 1032176 pages RAM 42789 pages reserved 553579 pages shared 943538 pages non-shared 1032176 pages RAM 42789 pages reserved 553576 pages shared 943549 pages non-shared [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [ 351] 0 351 74685 1682 154 0 0 systemd-journal [ 544] 0 544 5863 107 16 0 0 bluetoothd [ 545] 0 545 88977 725 56 0 0 NetworkManager [ 546] 0 546 30170 158 15 0 0 crond [ 552] 0 552 1879 28 8 0 0 gpm [ 557] 0 557 1092 37 8 0 0 acpid [ 564] 81 564 6361 373 16 0 -900 dbus-daemon [ 566] 0 566 61331 155 22 0 0 rsyslogd [ 567] 498 567 7026 104 19 0 0 avahi-daemon [ 568] 498 568 6994 59 17 0 0 avahi-daemon [ 573] 0 573 1758 33 9 0 0 mcelog [ 578] 0 578 5925 51 16 0 0 atd [ 586] 105 586 121536 4270 56 0 0 polkitd [ 593] 0 593 21967 205 48 0 -900 modem-manager [ 601] 0 601 1087 26 8 0 0 thinkfan [ 619] 0 619 122722 1085 129 0 0 libvirtd [ 630] 32 630 4812 68 13 0 0 rpcbind [ 633] 0 633 20080 199 
43 0 -1000 sshd [ 653] 29 653 5905 116 16 0 0 rpc.statd [ 700] 0 700 13173 190 28 0 0 wpa_supplicant [ 719] 0 719 4810 50 14 0 0 rpc.idmapd [ 730] 0 730 28268 36 10 0 0 rpc.rquotad [ 766] 0 766 6030 153 15 0 0 rpc.mountd [ 806] 99 806 3306 45 11 0 0 dnsmasq [ 985] 0 985 21219 150 46 0 0 login [ 988] 0 988 260408 355 48 0 0 console-kit-dae [ 1053] 11641 1053 28706 241 14 0 0 bash [ 1097] 11641 1097 27972 58 10 0 0 startx [ 1125] 11641 1125 3487 48 13 0 0 xinit [ 1126] 11641 1126 80028 35379 154 0 0 X [ 1138] 11641 1138 142989 930 122 0 0 gnome-session [ 1151] 11641 1151 4013 64 12 0 0 dbus-launch [ 1152] 11641 1152 6069 82 17 0 0 dbus-daemon [ 1154] 11641 1154 85449 162 36 0 0 at-spi-bus-laun [ 1158] 11641 1158 6103 116 17 0 0 dbus-daemon [ 1161] 11641 1161 32328 174 33 0 0 at-spi2-registr [ 1172] 11641 1172 4013 65 13 0 0 dbus-launch [ 1173] 11641 1173 6350 265 18 0 0 dbus-daemon [ 1177] 11641 1177 37416 416 29 0 0 gconfd-2 [ 1184] 11641 1184 117556 1203 44 0 0 gnome-keyring-d [ 1185] 11641 1185 224829 2236 177 0 0 gnome-settings- [ 1194] 0 1194 57227 786 46 0 0 upowerd [ 1226] 11641 1226 77392 190 36 0 0 gvfsd [ 1246] 11641 1246 118201 772 90 0 0 pulseaudio [ 1247] 496 1247 41161 59 17 0 0 rtkit-daemon [ 1252] 11641 1252 29494 205 58 0 0 gconf-helper [ 1253] 106 1253 81296 355 46 0 0 colord [ 1257] 11641 1257 59080 1574 60 0 0 openbox [ 1258] 11641 1258 185569 3216 146 0 0 gnome-panel [ 1264] 11641 1264 64102 229 27 0 0 dconf-service [ 1268] 11641 1268 139203 858 116 0 0 gnome-user-shar [ 1269] 11641 1269 268645 27442 334 0 0 pidgin [ 1270] 11641 1270 142642 1064 117 0 0 bluetooth-apple [ 1271] 11641 1271 193218 1775 175 0 0 nm-applet [ 1272] 11641 1272 220194 1810 138 0 0 gnome-sound-app [ 1285] 11641 1285 80914 632 45 0 0 gvfs-udisks2-vo [ 1287] 0 1287 88101 599 41 0 0 udisksd [ 1295] 11641 1295 177162 14140 150 0 0 wnck-applet [ 1297] 11641 1297 281043 3161 199 0 0 clock-applet [ 1299] 11641 1299 142537 1051 120 0 0 cpufreq-applet [ 1302] 11641 1302 141960 986 
113 0 0 notification-ar [ 1340] 11641 1340 190026 6265 144 0 0 gnome-terminal [ 1346] 11641 1346 2123 35 10 0 0 gnome-pty-helpe [ 1347] 11641 1347 28719 253 11 0 0 bash [ 1858] 11641 1858 10895 101 27 0 0 xfconfd X: page allocation failure: order:0, mode:0x200da Pid: 1126, comm: X Not tainted 3.7.0-rc5-00007-g95e21c5 #100 [ 2052] 11641 2052 28720 255 11 0 0 bash [ 6239] 11641 6239 73437 711 88 0 0 kdeinit4 [ 6240] 11641 6240 83952 717 101 0 0 klauncher Call Trace: [] warn_alloc_failed+0xe9/0x140 [] __alloc_pages_nodemask+0x7fa/0xa40 [] shmem_getpage_gfp+0x603/0x9d0 [] ? native_sched_clock+0x26/0x90 [] shmem_fault+0x4f/0xa0 [] shm_fault+0x1e/0x20 [] __do_fault+0x73/0x4d0 [] ? generic_file_aio_write+0xb0/0x100 [] handle_pte_fault+0x97/0x9a0 [] ? __lock_is_held+0x5f/0x90 [] ? get_parent_ip+0x11/0x50 [] handle_mm_fault+0x22f/0x2f0 [] __get_user_pages+0x12a/0x530 [] get_dump_page+0x45/0x60 [] elf_core_dump+0x16bd/0x1960 [] ? elf_core_dump+0x9d6/0x1960 [] ? sub_preempt_count+0x79/0xd0 [] ? mutex_unlock+0xe/0x10 [] ? do_truncate+0x73/0xa0 [] do_coredump+0xa21/0xeb0 [] ? debug_check_no_locks_freed+0xe0/0x170 [] ? trace_hardirqs_off+0xd/0x10 [] get_signal_to_deliver+0x2e1/0x960 [] do_signal+0x3f/0x9a0 [] ? pci_fixup_msi_k8t_onboard_sound+0x7d/0x97 [] ? is_prefetch.isra.15+0x1a6/0x1fd [] ? error_sti+0x5/0x6 [] ? 
retint_signal+0x11/0x90 [] do_notify_resume+0x80/0xb0 [] retint_signal+0x46/0x90 Mem-Info: DMA per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 CPU 1: hi: 0, btch: 1 usd: 0 DMA32 per-cpu: CPU 0: hi: 186, btch: 31 usd: 0 CPU 1: hi: 186, btch: 31 usd: 0 Normal per-cpu: CPU 0: hi: 186, btch: 31 usd: 1 CPU 1: hi: 186, btch: 31 usd: 14 active_anon:900420 inactive_anon:28978 isolated_anon:0 active_file:22 inactive_file:24 isolated_file:0 unevictable:4 dirty:5 writeback:0 unstable:0 free:20346 slab_reclaimable:8656 slab_unreclaimable:10414 mapped:18437 shmem:243751 pagetables:7717 bounce:0 free_cma:0 DMA free:12120kB min:272kB low:340kB high:408kB active_anon:2892kB inactive_anon:872kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:1672kB shmem:3596kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes lowmem_reserve[]: 0 2951 3836 3836 DMA32 free:55316kB min:51776kB low:64720kB high:77664kB active_anon:2834992kB inactive_anon:108408kB active_file:52kB inactive_file:56kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3021968kB mlocked:0kB dirty:20kB writeback:0kB mapped:65916kB shmem:943452kB slab_reclaimable:11716kB slab_unreclaimable:8904kB kernel_stack:488kB pagetables:11880kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:3103 all_unreclaimable? yes lowmem_reserve[]: 0 0 885 885 Normal free:13948kB min:15532kB low:19412kB high:23296kB active_anon:763796kB inactive_anon:6632kB active_file:36kB inactive_file:40kB unevictable:16kB isolated(anon):0kB isolated(file):0kB present:906664kB mlocked:16kB dirty:0kB writeback:0kB mapped:6160kB shmem:27956kB slab_reclaimable:22908kB slab_unreclaimable:32736kB kernel_stack:2352kB pagetables:18988kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:602 all_unreclaimable? 
yes lowmem_reserve[]: 0 0 0 0 DMA: 2*4kB 2*8kB 2*16kB 1*32kB 2*64kB 1*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 1*4096kB = 12120kB DMA32: 883*4kB 1525*8kB 513*16kB 637*32kB 109*64kB 8*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 55396kB Normal: 269*4kB 255*8kB 227*16kB 141*32kB 30*64kB 4*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 15996kB 243797 total pagecache pages 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 0kB Total swap = 0kB 1032176 pages RAM 42789 pages reserved 553637 pages shared 943817 pages non-shared X: page allocation failure: order:0, mode:0x200da Pid: 1126, comm: X Not tainted 3.7.0-rc5-00007-g95e21c5 #100 Call Trace: [] warn_alloc_failed+0xe9/0x140 [] __alloc_pages_nodemask+0x7fa/0xa40 [] shmem_getpage_gfp+0x603/0x9d0 [] ? native_sched_clock+0x26/0x90 [] shmem_fault+0x4f/0xa0 [] shm_fault+0x1e/0x20 [] __do_fault+0x73/0x4d0 [] handle_pte_fault+0x97/0x9a0 [] ? __lock_is_held+0x5f/0x90 [] ? get_parent_ip+0x11/0x50 [] handle_mm_fault+0x22f/0x2f0 [] __get_user_pages+0x12a/0x530 [] ? _raw_spin_unlock+0x35/0x60 [] get_dump_page+0x45/0x60 [] elf_core_dump+0x16bd/0x1960 [] ? elf_core_dump+0x9d6/0x1960 [] ? sub_preempt_count+0x79/0xd0 [] ? mutex_unlock+0xe/0x10 [] ? do_truncate+0x73/0xa0 [] do_coredump+0xa21/0xeb0 [] ? debug_check_no_locks_freed+0xe0/0x170 [] ? trace_hardirqs_off+0xd/0x10 [] get_signal_to_deliver+0x2e1/0x960 [] do_signal+0x3f/0x9a0 [] ? pci_fixup_msi_k8t_onboard_sound+0x7d/0x97 [] ? is_prefetch.isra.15+0x1a6/0x1fd [] ? error_sti+0x5/0x6 [] ? 
retint_signal+0x11/0x90 [] do_notify_resume+0x80/0xb0 [] retint_signal+0x46/0x90 Mem-Info: DMA per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 CPU 1: hi: 0, btch: 1 usd: 0 DMA32 per-cpu: CPU 0: hi: 186, btch: 31 usd: 0 CPU 1: hi: 186, btch: 31 usd: 0 Normal per-cpu: CPU 0: hi: 186, btch: 31 usd: 1 CPU 1: hi: 186, btch: 31 usd: 24 active_anon:900420 inactive_anon:28978 isolated_anon:0 active_file:22 inactive_file:24 isolated_file:19 unevictable:4 dirty:5 writeback:0 unstable:0 free:20222 slab_reclaimable:8656 slab_unreclaimable:10414 mapped:18437 shmem:243751 pagetables:7717 bounce:0 free_cma:0 DMA free:12120kB min:272kB low:340kB high:408kB active_anon:2892kB inactive_anon:872kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:1672kB shmem:3596kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes lowmem_reserve[]: 0 2951 3836 3836 DMA32 free:55316kB min:51776kB low:64720kB high:77664kB active_anon:2834992kB inactive_anon:108408kB active_file:52kB inactive_file:56kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3021968kB mlocked:0kB dirty:20kB writeback:0kB mapped:65916kB shmem:943452kB slab_reclaimable:11716kB slab_unreclaimable:8904kB kernel_stack:488kB pagetables:11880kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:3940 all_unreclaimable? 
yes [ 6242] 11641 6242 126497 1479 172 0 0 kded4 [ 6244] 11641 6244 2977 48 11 0 0 gam_server [10804] 11641 10804 101320 307 47 0 0 gvfsd-http [12175] 0 12175 27197 32 10 0 0 agetty [12249] 11641 12249 28719 252 14 0 0 bash [14862] 0 14862 51773 344 55 0 0 cupsd [14868] 4 14868 18105 158 39 0 0 cups-polld [16728] 11641 16728 28691 244 12 0 0 bash [16975] 0 16975 9109 253 23 0 -1000 systemd-udevd [17618] 0 17618 8245 87 22 0 0 systemd-logind [ 3133] 11641 3133 43721 132 40 0 0 su [ 3136] 0 3136 28564 139 12 0 0 bash [ 3983] 11641 3983 43722 134 41 0 0 su [ 3986] 0 3986 28564 144 13 0 0 bash [16350] 11641 16350 28691 245 14 0 0 bash [31228] 11641 31228 28691 245 11 0 0 bash [31922] 11641 31922 28719 250 13 0 0 bash [ 2340] 11641 2340 28691 245 15 0 0 bash [12586] 38 12586 7851 150 19 0 0 ntpd [32658] 11641 32658 41192 424 35 0 0 mc [32660] 11641 32660 28692 245 13 0 0 bash [10971] 11641 10971 43722 133 43 0 0 su [10974] 0 10974 28564 132 12 0 0 bash [11343] 0 11343 28497 66 11 0 0 ksmtuned [11387] 11641 11387 28719 254 11 0 0 bash [11450] 11641 11450 28691 246 13 0 0 bash [11576] 11641 11576 43722 133 40 0 0 su [11579] 0 11579 28564 141 13 0 0 bash [12106] 11641 12106 28691 244 12 0 0 bash [12141] 11641 12141 43722 132 44 0 0 su [12144] 0 12144 28564 140 11 0 0 bash [12264] 11641 12264 28691 245 11 0 0 bash [12299] 11641 12299 43721 133 40 0 0 su [12302] 0 12302 28564 137 12 0 0 bash [26024] 11641 26024 28691 245 13 0 0 bash [26083] 11641 26083 28691 245 13 0 0 bash [28235] 11641 28235 43721 132 42 0 0 su [28238] 0 28238 28564 143 13 0 0 bash [29460] 11641 29460 43721 132 42 0 0 su [29463] 0 29463 28564 137 12 0 0 bash [29758] 11641 29758 28720 256 12 0 0 bash [29864] 11641 29864 41916 1153 36 0 0 mc [29866] 11641 29866 28728 257 11 0 0 bash [32750] 0 32750 23164 2994 47 0 0 dhclient [ 323] 0 323 24081 471 48 0 0 sendmail [ 347] 51 347 20347 367 38 0 0 sendmail [ 907] 11641 907 379562 159766 707 0 0 thunderbird [ 6340] 11641 6340 28719 251 12 0 0 bash [ 6790] 11641 
6790 80307 620 101 0 0 xfce4-notifyd [ 6844] 0 6844 26669 23 9 0 0 sleep Out of memory: Kill process 907 (thunderbird) score 162 or sacrifice child Killed process 907 (thunderbird) total-vm:1518248kB, anon-rss:638476kB, file-rss:588kB lowmem_reserve[]: 0 0 885 885 Normal free:12832kB min:15532kB low:19412kB high:23296kB active_anon:763796kB inactive_anon:6632kB active_file:36kB inactive_file:40kB unevictable:16kB isolated(anon):0kB isolated(file):0kB present:906664kB mlocked:16kB dirty:0kB writeback:0kB mapped:6160kB shmem:27956kB slab_reclaimable:22908kB slab_unreclaimable:32736kB kernel_stack:2352kB pagetables:18988kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1742 all_unreclaimable? yes lowmem_reserve[]: 0 0 0 0 DMA: 2*4kB 2*8kB 2*16kB 1*32kB 2*64kB 1*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 1*4096kB = 12120kB DMA32: 883*4kB 1525*8kB 513*16kB 637*32kB 109*64kB 8*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 55396kB Normal: 270*4kB 173*8kB 198*16kB 141*32kB 30*64kB 4*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 14880kB 243797 total pagecache pages 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 0kB Total swap = 0kB 1032176 pages RAM 42789 pages reserved 553659 pages shared 937056 pages non-shared SysRq : Emergency Sync Emergency Sync complete SysRq : Emergency Remount R/O From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752530Ab2KRTIE (ORCPT ); Sun, 18 Nov 2012 14:08:04 -0500 Received: from mail-ee0-f46.google.com ([74.125.83.46]:53523 "EHLO mail-ee0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752267Ab2KRTIC (ORCPT ); Sun, 18 Nov 2012 14:08:02 -0500 Message-ID: <50A9320D.4060700@suse.cz> Date: Sun, 18 Nov 2012 20:07:57 +0100 From: Jiri Slaby User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Thunderbird/17.0 MIME-Version: 1.0 To: Zdenek Kabelac CC: Mel Gorman , Seth Jennings , 
Jiri Slaby , Valdis.Kletnieks@vt.edu, linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings Subject: Re: kswapd0: excessive CPU usage References: <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112121956.GT8218@suse.de> <50A0F5F0.6090400@redhat.com> <20121112133139.GU8218@suse.de> <50A9304E.3020205@redhat.com> In-Reply-To: <50A9304E.3020205@redhat.com> X-Enigmail-Version: 1.5a1pre Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 11/18/2012 08:00 PM, Zdenek Kabelac wrote: > For some reason my machine went out of memory and OOM killed > firefox and then even whole Xsession. > > Unsure whether it's related to those 2 patches - but I've never had > such OOM failure before. As I wrote, this would be me: https://lkml.org/lkml/2012/11/15/150 There is no -next tree for Friday which would contain the set already. So for now, it should be enough for you to apply: https://lkml.org/lkml/2012/11/15/95 Or, alternatively, if you use a brand new systemd, it likes to fork bomb using udev.
thanks, -- js suse labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753055Ab2KTJSa (ORCPT ); Tue, 20 Nov 2012 04:18:30 -0500 Received: from mx2.parallels.com ([64.131.90.16]:33285 "EHLO mx2.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752103Ab2KTJS2 (ORCPT ); Tue, 20 Nov 2012 04:18:28 -0500 Message-ID: <50AB4ADB.6090506@parallels.com> Date: Tue, 20 Nov 2012 13:18:19 +0400 From: Glauber Costa User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20121029 Thunderbird/16.0.2 MIME-Version: 1.0 To: Mel Gorman CC: Zdenek Kabelac , Seth Jennings , Jiri Slaby , , Jiri Slaby , , LKML , Andrew Morton , Rik van Riel , Robert Jennings Subject: Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" References: <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112113731.GS8218@suse.de> In-Reply-To: <20121112113731.GS8218@suse.de> Content-Type: text/plain; charset="ISO-8859-15" Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 11/12/2012 03:37 PM, Mel Gorman wrote: > diff --git a/include/linux/gfp.h b/include/linux/gfp.h > index 02c1c971..d0a7967 100644 > --- a/include/linux/gfp.h > +++ b/include/linux/gfp.h > @@ -31,6 +31,7 @@ struct vm_area_struct; > #define ___GFP_THISNODE 0x40000u > #define ___GFP_RECLAIMABLE 0x80000u > #define ___GFP_NOTRACK 0x200000u > +#define ___GFP_NO_KSWAPD 0x400000u > #define ___GFP_OTHER_NODE 0x800000u > #define ___GFP_WRITE 0x1000000u Keep in mind that this bit has been reused in -mm. 
If this patch needs to be reverted, we'll need to first change the definition of __GFP_KMEMCG (and __GFP_BITS_SHIFT as a result), or it would break things. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752364Ab2KTPir (ORCPT ); Tue, 20 Nov 2012 10:38:47 -0500 Received: from mail-vb0-f46.google.com ([209.85.212.46]:56220 "EHLO mail-vb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751513Ab2KTPiq (ORCPT ); Tue, 20 Nov 2012 10:38:46 -0500 MIME-Version: 1.0 In-Reply-To: <20121116200616.GK8218@suse.de> References: <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112113731.GS8218@suse.de> <20121116200616.GK8218@suse.de> Date: Tue, 20 Nov 2012 10:38:45 -0500 Message-ID: Subject: Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" From: Josh Boyer To: Mel Gorman Cc: Zdenek Kabelac , Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings , Thorsten Leemhuis , bruno@wolff.to Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Nov 16, 2012 at 3:06 PM, Mel Gorman wrote: > On Fri, Nov 16, 2012 at 02:14:47PM -0500, Josh Boyer wrote: >> On Mon, Nov 12, 2012 at 6:37 AM, Mel Gorman wrote: >> > With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction >> > based on failures" reverted, Zdenek Kabelac reported the following >> > >> > Hmm, so it's just took longer to hit the problem and observe >> > kswapd0 spinning on my CPU again - it's not as endless like before - >> > but still it easily eats minutes - it helps to turn off Firefox >> > or TB (memory hungry apps) so kswapd0 stops 
soon - and restart >> > those apps again. (And I still have like >1GB of cached memory) >> > >> > kswapd0 R running task 0 30 2 0x00000000 >> > ffff8801331efae8 0000000000000082 0000000000000018 0000000000000246 >> > ffff880135b9a340 ffff8801331effd8 ffff8801331effd8 ffff8801331effd8 >> > ffff880055dfa340 ffff880135b9a340 00000000331efad8 ffff8801331ee000 >> > Call Trace: >> > [] preempt_schedule+0x42/0x60 >> > [] _raw_spin_unlock+0x55/0x60 >> > [] put_super+0x31/0x40 >> > [] drop_super+0x22/0x30 >> > [] prune_super+0x149/0x1b0 >> > [] shrink_slab+0xba/0x510 >> > >> > The sysrq+m indicates the system has no swap so it'll never reclaim >> > anonymous pages as part of reclaim/compaction. That is one part of the >> > problem but not the root cause as file-backed pages could also be reclaimed. >> > >> > The likely underlying problem is that kswapd is woken up or kept awake >> > for each THP allocation request in the page allocator slow path. >> > >> > If compaction fails for the requesting process then compaction will be >> > deferred for a time and direct reclaim is avoided. However, if there >> > is a storm of THP requests that are simply rejected, it will still >> > be the case that kswapd is awake for a prolonged period of time >> > as pgdat->kswapd_max_order is updated each time. This is noticed by >> > the main kswapd() loop and it will not call kswapd_try_to_sleep(). >> > Instead it will loop, shrinking a small number of pages and calling >> > shrink_slab() on each iteration. >> > >> > The temptation is to supply a patch that checks if kswapd was woken for >> > THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not >> > backed up by proper testing. As 3.7 is very close to release and this is >> > not a bug we should release with, a safer path is to revert "mm: remove >> > __GFP_NO_KSWAPD" for now and revisit it with the view to ironing out the >> > balance_pgdat() logic in general.
>> > >> > Signed-off-by: Mel Gorman >> >> Does anyone know if this is queued to go into 3.7 somewhere? I looked >> a bit and can't find it in a tree. We have a few reports of Fedora >> rawhide users hitting this. >> > > No, because I was waiting to hear if a) it worked and preferably if the > alternative "less safe" option worked. This close to release it might be > better to just go with the safe option. We've been tracking it in https://bugzilla.redhat.com/show_bug.cgi?id=866988 and people say this revert patch doesn't seem to make the issue go away fully. Thorsten has created another kernel with the other patch applied for testing. At least I think that is the latest status from the bug. Hopefully the commenters will chime in. josh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753348Ab2KTQOQ (ORCPT ); Tue, 20 Nov 2012 11:14:16 -0500 Received: from wolff.to ([65.117.131.163]:57561 "HELO wolff.to" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1752621Ab2KTQOP (ORCPT ); Tue, 20 Nov 2012 11:14:15 -0500 Date: Tue, 20 Nov 2012 10:13:15 -0600 From: Bruno Wolff III To: Josh Boyer Cc: Mel Gorman , Zdenek Kabelac , Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings , Thorsten Leemhuis Subject: Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" Message-ID: <20121120161315.GA8338@wolff.to> References: <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112113731.GS8218@suse.de> <20121116200616.GK8218@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 
Tue, Nov 20, 2012 at 10:38:45 -0500, Josh Boyer wrote: > >We've been tracking it in https://bugzilla.redhat.com/show_bug.cgi?id=866988 >and people say this revert patch doesn't seem to make the issue go away >fully. Thorsten has created another kernel with the other patch applied >for testing. > >At least I think that is the latest status from the bug. Hopefully the >commenters will chime in. I am seeing kswapd0 hogging a CPU right now. I have two rsyncs and an md sync running and a couple of large memory processes (java and firefox) idle. I haven't been seeing this happen as often as previously. Previously, running a yum update alongside an rsync was pretty good at triggering the problem. Now, not so much. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752408Ab2KTRnO (ORCPT ); Tue, 20 Nov 2012 12:43:14 -0500 Received: from basicbox7.server-home.net ([195.137.212.29]:43803 "EHLO basicbox7.server-home.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751238Ab2KTRnN (ORCPT ); Tue, 20 Nov 2012 12:43:13 -0500 Message-ID: <50ABC128.80706@leemhuis.info> Date: Tue, 20 Nov 2012 18:43:04 +0100 From: Thorsten Leemhuis User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20121016 Thunderbird/16.0.1 MIME-Version: 1.0 To: Josh Boyer CC: Mel Gorman , Zdenek Kabelac , Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings , bruno@wolff.to Subject: Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" References: <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112113731.GS8218@suse.de> <20121116200616.GK8218@suse.de> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding:
7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 20.11.2012 16:38, Josh Boyer wrote: > On Fri, Nov 16, 2012 at 3:06 PM, Mel Gorman wrote: >> On Fri, Nov 16, 2012 at 02:14:47PM -0500, Josh Boyer wrote: >>> On Mon, Nov 12, 2012 at 6:37 AM, Mel Gorman wrote: >>>> With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction >>>> based on failures" reverted, Zdenek Kabelac reported the following >>>> >>>> Hmm, so it's just took longer to hit the problem and observe >>>> kswapd0 spinning on my CPU again - it's not as endless like before - >>>> but still it easily eats minutes - it helps to turn off Firefox >>>> or TB (memory hungry apps) so kswapd0 stops soon - and restart >>>> those apps again. (And I still have like >1GB of cached memory) >>>> >>>> kswapd0 R running task 0 30 2 0x00000000 >>>> ffff8801331efae8 0000000000000082 0000000000000018 0000000000000246 >>>> ffff880135b9a340 ffff8801331effd8 ffff8801331effd8 ffff8801331effd8 >>>> ffff880055dfa340 ffff880135b9a340 00000000331efad8 ffff8801331ee000 >>>> Call Trace: >>>> [] preempt_schedule+0x42/0x60 >>>> [] _raw_spin_unlock+0x55/0x60 >>>> [] put_super+0x31/0x40 >>>> [] drop_super+0x22/0x30 >>>> [] prune_super+0x149/0x1b0 >>>> [] shrink_slab+0xba/0x510 >>>> >>>> The sysrq+m indicates the system has no swap so it'll never reclaim >>>> anonymous pages as part of reclaim/compaction. That is one part of the >>>> problem but not the root cause as file-backed pages could also be reclaimed. >>>> >>>> The likely underlying problem is that kswapd is woken up or kept awake >>>> for each THP allocation request in the page allocator slow path. >>>> >>>> If compaction fails for the requesting process then compaction will be >>>> deferred for a time and direct reclaim is avoided. 
However, if there >>>> is a storm of THP requests that are simply rejected, it will still >>>> be the case that kswapd is awake for a prolonged period of time >>>> as pgdat->kswapd_max_order is updated each time. This is noticed by >>>> the main kswapd() loop and it will not call kswapd_try_to_sleep(). >>>> Instead it will loop, shrinking a small number of pages and calling >>>> shrink_slab() on each iteration. >>>> >>>> The temptation is to supply a patch that checks if kswapd was woken for >>>> THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not >>>> backed up by proper testing. As 3.7 is very close to release and this is >>>> not a bug we should release with, a safer path is to revert "mm: remove >>>> __GFP_NO_KSWAPD" for now and revisit it with the view to ironing out the >>>> balance_pgdat() logic in general. >>>> >>>> Signed-off-by: Mel Gorman >>> >>> Does anyone know if this is queued to go into 3.7 somewhere? I looked >>> a bit and can't find it in a tree. We have a few reports of Fedora >>> rawhide users hitting this. >> >> No, because I was waiting to hear if a) it worked and preferably if the >> alternative "less safe" option worked. This close to release it might be >> better to just go with the safe option. > > We've been tracking it in https://bugzilla.redhat.com/show_bug.cgi?id=866988 > and people say this revert patch doesn't seem to make the issue go away > fully. Thorsten has created another kernel with the other patch applied > for testing. > > At least I think that is the latest status from the bug. Hopefully the > commenters will chime in. The short story from my current point of view is: * my main machine at home where I initially saw the issue that started this thread seems to be running fine with rc6 and the "safe" patch Mel posted in https://lkml.org/lkml/2012/11/12/113 Before that I ran an rc5 kernel with the revert that went into rc6 and the "safe" patch -- that worked fine for a few days, too.
* I have a second machine where I started to use 3.7-rc kernels only yesterday (the machine triggered a bug in the radeon driver that seems to be fixed in rc6) which showed symptoms like the ones Zdenek Kabelac mentions in this thread. I wasn't able to look closer at it, but simply tried rc6 with the safe patch, which didn't help. I'm now running rc6 with the "riskier" patch from https://lkml.org/lkml/2012/11/12/151 I can't yet tell if it helps. If the problem shows up again, I'll try to capture more debugging data via sysrq -- there wasn't any time for that when I was running rc6 with the safe patch, sorry. Thorsten From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752803Ab2KTUSU (ORCPT ); Tue, 20 Nov 2012 15:18:20 -0500 Received: from mail.linuxfoundation.org ([140.211.169.12]:38880 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751849Ab2KTUST (ORCPT ); Tue, 20 Nov 2012 15:18:19 -0500 Date: Tue, 20 Nov 2012 12:18:17 -0800 From: Andrew Morton To: Glauber Costa Cc: Mel Gorman , Zdenek Kabelac , Seth Jennings , Jiri Slaby , , Jiri Slaby , , LKML , Rik van Riel , Robert Jennings Subject: Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" Message-Id: <20121120121817.cf80b8ad.akpm@linux-foundation.org> In-Reply-To: <50AB4ADB.6090506@parallels.com> References: <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112113731.GS8218@suse.de> <50AB4ADB.6090506@parallels.com> X-Mailer: Sylpheed 3.0.2 (GTK+ 2.20.1; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List:
linux-kernel@vger.kernel.org On Tue, 20 Nov 2012 13:18:19 +0400 Glauber Costa wrote: > On 11/12/2012 03:37 PM, Mel Gorman wrote: > > diff --git a/include/linux/gfp.h b/include/linux/gfp.h > > index 02c1c971..d0a7967 100644 > > --- a/include/linux/gfp.h > > +++ b/include/linux/gfp.h > > @@ -31,6 +31,7 @@ struct vm_area_struct; > > #define ___GFP_THISNODE 0x40000u > > #define ___GFP_RECLAIMABLE 0x80000u > > #define ___GFP_NOTRACK 0x200000u > > +#define ___GFP_NO_KSWAPD 0x400000u > > #define ___GFP_OTHER_NODE 0x800000u > > #define ___GFP_WRITE 0x1000000u > > Keep in mind that this bit has been reused in -mm. > If this patch needs to be reverted, we'll need to first change > the definition of __GFP_KMEMCG (and __GFP_BITS_SHIFT as a result), or it > would break things. I presently have /* Plain integer GFP bitmasks. Do not use this directly. */ #define ___GFP_DMA 0x01u #define ___GFP_HIGHMEM 0x02u #define ___GFP_DMA32 0x04u #define ___GFP_MOVABLE 0x08u #define ___GFP_WAIT 0x10u #define ___GFP_HIGH 0x20u #define ___GFP_IO 0x40u #define ___GFP_FS 0x80u #define ___GFP_COLD 0x100u #define ___GFP_NOWARN 0x200u #define ___GFP_REPEAT 0x400u #define ___GFP_NOFAIL 0x800u #define ___GFP_NORETRY 0x1000u #define ___GFP_MEMALLOC 0x2000u #define ___GFP_COMP 0x4000u #define ___GFP_ZERO 0x8000u #define ___GFP_NOMEMALLOC 0x10000u #define ___GFP_HARDWALL 0x20000u #define ___GFP_THISNODE 0x40000u #define ___GFP_RECLAIMABLE 0x80000u #define ___GFP_KMEMCG 0x100000u #define ___GFP_NOTRACK 0x200000u #define ___GFP_NO_KSWAPD 0x400000u #define ___GFP_OTHER_NODE 0x800000u #define ___GFP_WRITE 0x1000000u and #define __GFP_BITS_SHIFT 25 /* Room for N __GFP_FOO bits */ #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1)) Which I think is OK? I'd forgotten about __GFP_BITS_SHIFT. Should we do this? 
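The "Which I think is OK?" question can be checked with simple bit arithmetic. ___GFP_WRITE is 0x1000000u, which is 1 << 24. A shift of 25 therefore gives a mask of 0x1ffffff that still covers it, while a shift of 24 would not.

What follows is a hypothetical standalone sketch, not kernel code. It copies the ___GFP_* values from the list above into a local table. The names gfp_bits, gfp_decode and gfp_bits_fit are invented for illustration. The same table is used to decode the two masks reported earlier in this archive: gfp_mask=0x201da from the rsyslogd OOM report, and mode:0x200da from the X page allocation failure. This assumes the low-order bit values in the -mm list match those of the 3.7-rc5 kernel that produced those logs, which holds for the bits involved here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local copy of the ___GFP_* values quoted above (-mm tree, Nov 2012).
 * Names drop the ___GFP_ prefix; this table is illustrative, not kernel code. */
struct gfp_bit {
	unsigned int value;
	const char *name;
};

static const struct gfp_bit gfp_bits[] = {
	{ 0x01u, "DMA" },
	{ 0x02u, "HIGHMEM" },
	{ 0x04u, "DMA32" },
	{ 0x08u, "MOVABLE" },
	{ 0x10u, "WAIT" },
	{ 0x20u, "HIGH" },
	{ 0x40u, "IO" },
	{ 0x80u, "FS" },
	{ 0x100u, "COLD" },
	{ 0x200u, "NOWARN" },
	{ 0x400u, "REPEAT" },
	{ 0x800u, "NOFAIL" },
	{ 0x1000u, "NORETRY" },
	{ 0x2000u, "MEMALLOC" },
	{ 0x4000u, "COMP" },
	{ 0x8000u, "ZERO" },
	{ 0x10000u, "NOMEMALLOC" },
	{ 0x20000u, "HARDWALL" },
	{ 0x40000u, "THISNODE" },
	{ 0x80000u, "RECLAIMABLE" },
	{ 0x100000u, "KMEMCG" },
	{ 0x200000u, "NOTRACK" },
	{ 0x400000u, "NO_KSWAPD" },
	{ 0x800000u, "OTHER_NODE" },
	{ 0x1000000u, "WRITE" },
};

#define N_GFP_BITS (sizeof(gfp_bits) / sizeof(gfp_bits[0]))

/* Write the names of all set flags, lowest bit first and joined by '|',
 * into buf. Returns the bits of mask that match no table entry
 * (0 means the mask was fully decoded). */
static unsigned int gfp_decode(unsigned int mask, char *buf, size_t len)
{
	unsigned int unknown = mask;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < N_GFP_BITS; i++) {
		if (!(mask & gfp_bits[i].value))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, gfp_bits[i].name, len - strlen(buf) - 1);
		unknown &= ~gfp_bits[i].value;
	}
	return unknown;
}

/* Return 1 if every flag in the table lies inside ((1 << shift) - 1),
 * the same form as __GFP_BITS_MASK, and 0 otherwise. */
static int gfp_bits_fit(unsigned int shift)
{
	unsigned int mask = (1u << shift) - 1u;

	for (size_t i = 0; i < N_GFP_BITS; i++)
		if (gfp_bits[i].value & ~mask)
			return 0;
	return 1;
}
```

For example, gfp_bits_fit(25) returns 1 and gfp_bits_fit(24) returns 0, which matches the reasoning above. gfp_decode(0x200da, buf, sizeof buf) leaves "HIGHMEM|MOVABLE|WAIT|IO|FS|HARDWALL" in buf, the flag set of GFP_HIGHUSER_MOVABLE. 0x201da decodes to the same set plus COLD.

A runtime loop is used here only to keep the example self-contained. Inside a real header, a compile-time assertion such as the kernel's BUILD_BUG_ON would be the more natural way to keep __GFP_BITS_SHIFT in step with the flag list.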

--- a/include/linux/gfp.h~a
+++ a/include/linux/gfp.h
@@ -35,6 +35,7 @@ struct vm_area_struct;
 #define ___GFP_NO_KSWAPD	0x400000u
 #define ___GFP_OTHER_NODE	0x800000u
 #define ___GFP_WRITE		0x1000000u
+/* If the above are modified, __GFP_BITS_SHIFT may need updating */
 
 /*
  * GFP bitmasks..
_

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753514Ab2KUIaa (ORCPT ); Wed, 21 Nov 2012 03:30:30 -0500
Received: from mx2.parallels.com ([64.131.90.16]:47925 "EHLO mx2.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751115Ab2KUIa2 (ORCPT ); Wed, 21 Nov 2012 03:30:28 -0500
Message-ID: <50AC911A.3070501@parallels.com>
Date: Wed, 21 Nov 2012 12:30:18 +0400
From: Glauber Costa
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20121029 Thunderbird/16.0.2
MIME-Version: 1.0
To: Andrew Morton
CC: Mel Gorman , Zdenek Kabelac , Seth Jennings , Jiri Slaby , , Jiri Slaby , , LKML , Rik van Riel , Robert Jennings
Subject: Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD"
References: <20121012135726.GY29125@suse.de> <507BDD45.1070705@suse.cz> <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112113731.GS8218@suse.de> <50AB4ADB.6090506@parallels.com> <20121120121817.cf80b8ad.akpm@linux-foundation.org>
In-Reply-To: <20121120121817.cf80b8ad.akpm@linux-foundation.org>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/21/2012 12:18 AM, Andrew Morton wrote:
> On Tue, 20 Nov 2012 13:18:19 +0400
> Glauber Costa wrote:
>
>> On 11/12/2012 03:37 PM, Mel Gorman wrote:
>>> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
>>> index 02c1c971..d0a7967 100644
>>> --- a/include/linux/gfp.h
>>> +++ b/include/linux/gfp.h
>>> @@ -31,6 +31,7 @@ struct vm_area_struct;
>>>  #define ___GFP_THISNODE		0x40000u
>>>  #define ___GFP_RECLAIMABLE	0x80000u
>>>  #define ___GFP_NOTRACK		0x200000u
>>> +#define ___GFP_NO_KSWAPD	0x400000u
>>>  #define ___GFP_OTHER_NODE	0x800000u
>>>  #define ___GFP_WRITE		0x1000000u
>>
>> Keep in mind that this bit has been reused in -mm.
>> If this patch needs to be reverted, we'll need to first change
>> the definition of __GFP_KMEMCG (and __GFP_BITS_SHIFT as a result), or it
>> would break things.
>
> I presently have
>
> /* Plain integer GFP bitmasks. Do not use this directly. */
> #define ___GFP_DMA		0x01u
> #define ___GFP_HIGHMEM		0x02u
> #define ___GFP_DMA32		0x04u
> #define ___GFP_MOVABLE		0x08u
> #define ___GFP_WAIT		0x10u
> #define ___GFP_HIGH		0x20u
> #define ___GFP_IO		0x40u
> #define ___GFP_FS		0x80u
> #define ___GFP_COLD		0x100u
> #define ___GFP_NOWARN		0x200u
> #define ___GFP_REPEAT		0x400u
> #define ___GFP_NOFAIL		0x800u
> #define ___GFP_NORETRY		0x1000u
> #define ___GFP_MEMALLOC		0x2000u
> #define ___GFP_COMP		0x4000u
> #define ___GFP_ZERO		0x8000u
> #define ___GFP_NOMEMALLOC	0x10000u
> #define ___GFP_HARDWALL		0x20000u
> #define ___GFP_THISNODE		0x40000u
> #define ___GFP_RECLAIMABLE	0x80000u
> #define ___GFP_KMEMCG		0x100000u
> #define ___GFP_NOTRACK		0x200000u
> #define ___GFP_NO_KSWAPD	0x400000u
> #define ___GFP_OTHER_NODE	0x800000u
> #define ___GFP_WRITE		0x1000000u
>
> and
>

Humm, I didn't realize there was also another free bit at 0x100000u.
This seems fine.

> #define __GFP_BITS_SHIFT 25	/* Room for N __GFP_FOO bits */
> #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
>
> Which I think is OK?

Yes, if we haven't increased the size of the flag-space, no need to
change it.

>
> I'd forgotten about __GFP_BITS_SHIFT. Should we do this?
>
> --- a/include/linux/gfp.h~a
> +++ a/include/linux/gfp.h
> @@ -35,6 +35,7 @@ struct vm_area_struct;
>  #define ___GFP_NO_KSWAPD	0x400000u
>  #define ___GFP_OTHER_NODE	0x800000u
>  #define ___GFP_WRITE		0x1000000u
> +/* If the above are modified, __GFP_BITS_SHIFT may need updating */
>

This is a very helpful comment.

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755099Ab2KUPI5 (ORCPT ); Wed, 21 Nov 2012 10:08:57 -0500
Received: from cantor2.suse.de ([195.135.220.15]:59316 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754959Ab2KUPI4 (ORCPT ); Wed, 21 Nov 2012 10:08:56 -0500
Date: Wed, 21 Nov 2012 15:08:50 +0000
From: Mel Gorman
To: Josh Boyer
Cc: Zdenek Kabelac , Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings , Thorsten Leemhuis , bruno@wolff.to
Subject: Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD"
Message-ID: <20121121150850.GF8218@suse.de>
References: <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112113731.GS8218@suse.de> <20121116200616.GK8218@suse.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Nov 20, 2012 at 10:38:45AM -0500, Josh Boyer wrote:
> On Fri, Nov 16, 2012 at 3:06 PM, Mel Gorman wrote:
> > On Fri, Nov 16, 2012 at 02:14:47PM -0500, Josh Boyer wrote:
> >> On Mon, Nov 12, 2012 at 6:37 AM, Mel Gorman wrote:
> >> > With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction
> >> > based on failures" reverted, Zdenek Kabelac reported the following
> >> >
> >> > Hmm, so it's just took longer to hit the problem and observe
> >> > kswapd0 spinning on my CPU again - it's not as endless like before -
> >> > but still it easily eats minutes - it helps to turn off Firefox
> >> > or TB (memory hungry apps) so kswapd0 stops soon - and restart
> >> > those apps again. (And I still have like >1GB of cached memory)
> >> >
> >> > kswapd0 R running task 0 30 2 0x00000000
> >> > ffff8801331efae8 0000000000000082 0000000000000018 0000000000000246
> >> > ffff880135b9a340 ffff8801331effd8 ffff8801331effd8 ffff8801331effd8
> >> > ffff880055dfa340 ffff880135b9a340 00000000331efad8 ffff8801331ee000
> >> > Call Trace:
> >> > [] preempt_schedule+0x42/0x60
> >> > [] _raw_spin_unlock+0x55/0x60
> >> > [] put_super+0x31/0x40
> >> > [] drop_super+0x22/0x30
> >> > [] prune_super+0x149/0x1b0
> >> > [] shrink_slab+0xba/0x510
> >> >
> >> > The sysrq+m indicates the system has no swap so it'll never reclaim
> >> > anonymous pages as part of reclaim/compaction. That is one part of the
> >> > problem but not the root cause as file-backed pages could also be reclaimed.
> >> >
> >> > The likely underlying problem is that kswapd is woken up or kept awake
> >> > for each THP allocation request in the page allocator slow path.
> >> >
> >> > If compaction fails for the requesting process then compaction will be
> >> > deferred for a time and direct reclaim is avoided. However, if there
> >> > is a storm of THP requests that are simply rejected, it will still
> >> > be the case that kswapd is awake for a prolonged period of time
> >> > as pgdat->kswapd_max_order is updated each time. This is noticed by
> >> > the main kswapd() loop and it will not call kswapd_try_to_sleep().
> >> > Instead it will loop, shrinking a small number of pages and calling
> >> > shrink_slab() on each iteration.
> >> >
> >> > The temptation is to supply a patch that checks if kswapd was woken for
> >> > THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not
> >> > backed up by proper testing. As 3.7 is very close to release and this is
> >> > not a bug we should release with, a safer path is to revert "mm: remove
> >> > __GFP_NO_KSWAPD" for now and revisit it with the view to ironing out the
> >> > balance_pgdat() logic in general.
> >> >
> >> > Signed-off-by: Mel Gorman
> >>
> >> Does anyone know if this is queued to go into 3.7 somewhere? I looked
> >> a bit and can't find it in a tree. We have a few reports of Fedora
> >> rawhide users hitting this.
> >>
> >
> > No, because I was waiting to hear if a) it worked and preferably if the
> > alternative "less safe" option worked. This close to release it might be
> > better to just go with the safe option.
>
> We've been tracking it in https://bugzilla.redhat.com/show_bug.cgi?id=866988
> and people say this revert patch doesn't seem to make the issue go away
> fully. Thorsten has created another kernel with the other patch applied
> for testing.
>

There is also a potential accounting bug that could be affecting this.
https://lkml.org/lkml/2012/11/20/613 . NR_FREE_PAGES affects watermark
calculations. If it drifts too far then processes would keep entering
direct reclaim and waking kswapd even if there is no need to.

-- 
Mel Gorman
SUSE Labs

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752806Ab2KWPUx (ORCPT ); Fri, 23 Nov 2012 10:20:53 -0500
Received: from basicbox7.server-home.net ([195.137.212.29]:59907 "EHLO basicbox7.server-home.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752023Ab2KWPUw (ORCPT ); Fri, 23 Nov 2012 10:20:52 -0500
Message-ID: <50AF9450.9020803@leemhuis.info>
Date: Fri, 23 Nov 2012 16:20:48 +0100
From: Thorsten Leemhuis
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Thunderbird/17.0
MIME-Version: 1.0
To: Josh Boyer
CC: Mel Gorman , Zdenek Kabelac , Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings , bruno@wolff.to
Subject: Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD"
References: <20121015110937.GE29125@suse.de> <5093A3F4.8090108@redhat.com> <5093A631.5020209@suse.cz> <509422C3.1000803@suse.cz> <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112113731.GS8218@suse.de> <20121116200616.GK8218@suse.de> <50ABC128.80706@leemhuis.info>
In-Reply-To: <50ABC128.80706@leemhuis.info>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Thorsten Leemhuis wrote on 20.11.2012 18:43:
> On 20.11.2012 16:38, Josh Boyer wrote:
>
> The short story from my current point of view is:

Quick update, in case anybody is interested:

> * my main machine at home where I initially saw the issue that started
> this thread seems to be running fine with rc6 and the "safe" patch Mel
> posted in https://lkml.org/lkml/2012/11/12/113 Before that I ran a rc5
> kernel with the revert that went into rc6 and the "safe" patch -- that
> worked fine for a few days, too.

On this machine I'm running a rc6 kernel + the fix for the accounting
bug(¹) that went into mainline ~40 hours ago + the "riskier" patch Mel
posted in https://lkml.org/lkml/2012/11/12/151

Up to now everything works fine.

(¹) https://lkml.org/lkml/2012/11/21/362

> * I have a second machine where I started to use 3.7-rc kernels only
> yesterday (the machine triggered a bug in the radeon driver that seems
> to be fixed in rc6) which showed symptoms like the ones Zdenek Kabelac
> mentions in this thread. I wasn't able to look closer at it, but simply
> tried rc6 with the safe patch, which didn't help. I'm now running rc6
> with the "riskier" patch from https://lkml.org/lkml/2012/11/12/151
> I can't yet tell if it helps. If the problem shows up again I'll try to
> capture more debugging data via sysrq -- there wasn't any time for that
> when I was running rc6 with the safe patch, sorry.

This machine is now also behaving fine with above mentioned rc6 kernel +
the two patches. It seems the accounting bug was the root cause for the
problems this machine showed.

CU
Thorsten

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753802Ab2K0LMc (ORCPT ); Tue, 27 Nov 2012 06:12:32 -0500
Received: from cantor2.suse.de ([195.135.220.15]:40379 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753364Ab2K0LMa (ORCPT ); Tue, 27 Nov 2012 06:12:30 -0500
Date: Tue, 27 Nov 2012 11:12:25 +0000
From: Mel Gorman
To: Thorsten Leemhuis
Cc: Josh Boyer , Zdenek Kabelac , Seth Jennings , Jiri Slaby , Valdis.Kletnieks@vt.edu, Jiri Slaby , linux-mm@kvack.org, LKML , Andrew Morton , Rik van Riel , Robert Jennings , bruno@wolff.to
Subject: Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD"
Message-ID: <20121127111225.GO8218@suse.de>
References: <509C84ED.8090605@linux.vnet.ibm.com> <509CB9D1.6060704@redhat.com> <20121109090635.GG8218@suse.de> <509F6C2A.9060502@redhat.com> <20121112113731.GS8218@suse.de> <20121116200616.GK8218@suse.de> <50ABC128.80706@leemhuis.info> <50AF9450.9020803@leemhuis.info>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <50AF9450.9020803@leemhuis.info>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Nov 23, 2012 at 04:20:48PM +0100, Thorsten Leemhuis wrote:
> Thorsten Leemhuis wrote on 20.11.2012 18:43:
> > On 20.11.2012 16:38, Josh Boyer wrote:
> >
> > The short story from my current point of view is:
>
> Quick update, in case anybody is interested:
>
> > * my main machine at home where I initially saw the issue that started
> > this thread seems to be running fine with rc6 and the "safe" patch Mel
> > posted in https://lkml.org/lkml/2012/11/12/113 Before that I ran a rc5
> > kernel with the revert that went into rc6 and the "safe" patch -- that
> > worked fine for a few days, too.
>
> On this machine I'm running a rc6 kernel + the fix for the accounting
> bug(¹) that went into mainline ~40 hours ago + the "riskier" patch Mel
> posted in https://lkml.org/lkml/2012/11/12/151
>
> Up to now everything works fine.
>
> (¹) https://lkml.org/lkml/2012/11/21/362
>

That's good news, thanks for the follow-up. Maybe 3.7 will not be a
complete disaster with respect to THP after all this. The riskier patch
was not picked up simply because it was riskier and would still be
vulnerable to the effectively infinite loop Johannes found in kswapd.
It'll all need to be revisited.

> > * I have a second machine where I started to use 3.7-rc kernels only
> > yesterday (the machine triggered a bug in the radeon driver that seems
> > to be fixed in rc6) which showed symptoms like the ones Zdenek Kabelac
> > mentions in this thread. I wasn't able to look closer at it, but simply
> > tried rc6 with the safe patch, which didn't help. I'm now running rc6
> > with the "riskier" patch from https://lkml.org/lkml/2012/11/12/151
> > I can't yet tell if it helps. If the problem shows up again I'll try to
> > capture more debugging data via sysrq -- there wasn't any time for that
> > when I was running rc6 with the safe patch, sorry.
>
> This machine is now also behaving fine with above mentioned rc6 kernel +
> the two patches. It seems the accounting bug was the root cause for the
> problems this machine showed.
>

For some yes, for others no. Others are getting stuck within effectively
infinite loops in kswapd and the trigger cases are different although
the symptoms look similar.

Thanks again.

-- 
Mel Gorman
SUSE Labs