From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qg0-f50.google.com (mail-qg0-f50.google.com [209.85.192.50]) by kanga.kvack.org (Postfix) with ESMTP id 77CB76B0038 for ; Thu, 10 Sep 2015 17:04:20 -0400 (EDT) Received: by qgev79 with SMTP id v79so46955843qge.0 for ; Thu, 10 Sep 2015 14:04:20 -0700 (PDT) Received: from mail.linuxfoundation.org (mail.linuxfoundation.org. [140.211.169.12]) by mx.google.com with ESMTPS id b16si11753589qhc.47.2015.09.10.14.04.19 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 10 Sep 2015 14:04:19 -0700 (PDT) Date: Thu, 10 Sep 2015 14:04:18 -0700 From: Andrew Morton Subject: Re: [Bug 99471] System locks with kswapd0 and kworker taking full IO and mem Message-Id: <20150910140418.73b33d3542bab739f8fd1826@linux-foundation.org> In-Reply-To: References: Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Johannes Weiner , Mel Gorman Cc: bugzilla-daemon@bugzilla.kernel.org, linux-mm@kvack.org, gaguilar@aguilardelgado.com, sgh@sgh.dk (switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Tue, 01 Sep 2015 12:32:10 +0000 bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=99471 Guys, could you take a look please? The machine went oom when there's heaps of unused swap and most memory is being used on active_anon and inactive_anon. We should have just swapped that stuff out and kept going. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-io0-f178.google.com (mail-io0-f178.google.com [209.85.223.178]) by kanga.kvack.org (Postfix) with ESMTP id CDED06B0038 for ; Sat, 12 Sep 2015 02:18:23 -0400 (EDT) Received: by iofh134 with SMTP id h134so120250703iof.0 for ; Fri, 11 Sep 2015 23:18:23 -0700 (PDT) Received: from mail-pa0-x22f.google.com (mail-pa0-x22f.google.com. [2607:f8b0:400e:c03::22f]) by mx.google.com with ESMTPS id ok17si5483076pab.94.2015.09.11.23.18.23 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 11 Sep 2015 23:18:23 -0700 (PDT) Received: by padhk3 with SMTP id hk3so93841334pad.3 for ; Fri, 11 Sep 2015 23:18:23 -0700 (PDT) Subject: Re: [Bug 99471] System locks with kswapd0 and kworker taking full IO and mem References: <20150910140418.73b33d3542bab739f8fd1826@linux-foundation.org> From: Raymond Jennings Message-ID: <55F3C3AC.6070800@gmail.com> Date: Fri, 11 Sep 2015 23:18:20 -0700 MIME-Version: 1.0 In-Reply-To: <20150910140418.73b33d3542bab739f8fd1826@linux-foundation.org> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton , Johannes Weiner , Mel Gorman Cc: bugzilla-daemon@bugzilla.kernel.org, linux-mm@kvack.org, gaguilar@aguilardelgado.com, sgh@sgh.dk On 09/10/15 14:04, Andrew Morton wrote: > (switched to email. Please respond via emailed reply-to-all, not via the > bugzilla web interface). > > On Tue, 01 Sep 2015 12:32:10 +0000 bugzilla-daemon@bugzilla.kernel.org wrote: > >> https://bugzilla.kernel.org/show_bug.cgi?id=99471 > Guys, could you take a look please? > > The machine went oom when there's heaps of unused swap and most memory > is being used on active_anon and inactive_anon. We should have just > swapped that stuff out and kept going. Isn't there already logic in the kernel that disables OOM if there's swap space available? 
I saw it once, what happened to it?
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f177.google.com (mail-wi0-f177.google.com [209.85.212.177]) by kanga.kvack.org (Postfix) with ESMTP id D99546B0256 for ; Tue, 15 Sep 2015 04:39:26 -0400 (EDT) Received: by wicge5 with SMTP id ge5so17869781wic.0 for ; Tue, 15 Sep 2015 01:39:26 -0700 (PDT) Received: from gum.cmpxchg.org (gum.cmpxchg.org. [85.214.110.215]) by mx.google.com with ESMTPS id bd6si22752112wib.116.2015.09.15.01.39.24 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 15 Sep 2015 01:39:25 -0700 (PDT) Date: Tue, 15 Sep 2015 10:39:19 +0200 From: Johannes Weiner Subject: Re: [Bug 99471] System locks with kswapd0 and kworker taking full IO and mem Message-ID: <20150915083919.GG2858@cmpxchg.org> References: <20150910140418.73b33d3542bab739f8fd1826@linux-foundation.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150910140418.73b33d3542bab739f8fd1826@linux-foundation.org> Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton Cc: Mel Gorman , bugzilla-daemon@bugzilla.kernel.org, linux-mm@kvack.org, gaguilar@aguilardelgado.com, sgh@sgh.dk, Rik van Riel On Thu, Sep 10, 2015 at 02:04:18PM -0700, Andrew Morton wrote: > (switched to email. Please respond via emailed reply-to-all, not via the > bugzilla web interface). > > On Tue, 01 Sep 2015 12:32:10 +0000 bugzilla-daemon@bugzilla.kernel.org wrote: > > > https://bugzilla.kernel.org/show_bug.cgi?id=99471 > > Guys, could you take a look please? 
> > The machine went oom when there's heaps of unused swap and most memory > is being used on active_anon and inactive_anon. We should have just > swapped that stuff out and kept going. I think we need to re-evaluate the way we balance file and anon scan pressure. It's not just the "not swapping" aspect that bugs me, it's also the fact that the machine has been thrashing page cache at full load for *minutes* before signalling the OOM. SSDs can flush and reload pages quick enough that on memory pressure there are always reclaimable cache pages and the scanner never goes after anonymous memory. If anonymous memory does not leave enough room for page cache to hold the libraries and executables, userspace goes into a state where it's mostly waiting for cache to become uptodate. It's a very frustrating problem because it's hard to even detect. One idea I had to address the LRU balance problem in the past was to always reclaim the pages in the following order: inactive file, active file, anon*. As one set becomes empty, go after the next one. If the workingset code detects cache thrashing, it depends on the refault distances what to do: if they are smaller than the active file size, deactivate; if they are bigger than that, but smaller than active file + anon, we need to start swapping to alleviate the cache thrashing. Now, if the refault distances are bigger than active file + anon, no amount of deactivating and swapping are going to stop the thrashing and we have to think about triggering OOM. But OOM is drastic and the refaults might happen at a very slow pace (or, with sparse files, not require any IO at all) and the system might be completely fine. 
So in addition this would require a measure of overall time spent on thrashing IO, comparable to what Tejun proposed in "[RFD] memory pressure and sizing problem", where we say if thrashing IO takes up X percent of all execution time spent, we trigger the OOM killer--not to free memory, but to reduce the tasks that contribute to the thrashing and let the remaining tasks make progress, similar to the swap token or a BSD style memory scheduler. * we can ignore the difference between inactive and active anon here as anon is not aged the same way as the file LRU is aged
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qg0-f46.google.com (mail-qg0-f46.google.com [209.85.192.46]) by kanga.kvack.org (Postfix) with ESMTP id E77CD6B0038 for ; Tue, 15 Sep 2015 08:58:07 -0400 (EDT) Received: by qgx61 with SMTP id 61so141497383qgx.3 for ; Tue, 15 Sep 2015 05:58:07 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com. 
[209.132.183.28]) by mx.google.com with ESMTPS id k89si16822676qge.7.2015.09.15.05.58.06 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 15 Sep 2015 05:58:07 -0700 (PDT) Message-ID: <55F815D2.9010804@redhat.com> Date: Tue, 15 Sep 2015 08:57:54 -0400 From: Rik van Riel MIME-Version: 1.0 Subject: Re: [Bug 99471] System locks with kswapd0 and kworker taking full IO and mem References: <20150910140418.73b33d3542bab739f8fd1826@linux-foundation.org> <20150915083919.GG2858@cmpxchg.org> In-Reply-To: <20150915083919.GG2858@cmpxchg.org> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Johannes Weiner , Andrew Morton Cc: Mel Gorman , bugzilla-daemon@bugzilla.kernel.org, linux-mm@kvack.org, gaguilar@aguilardelgado.com, sgh@sgh.dk On 09/15/2015 04:39 AM, Johannes Weiner wrote: > On Thu, Sep 10, 2015 at 02:04:18PM -0700, Andrew Morton wrote: >> (switched to email. Please respond via emailed reply-to-all, not via the >> bugzilla web interface). >> >> On Tue, 01 Sep 2015 12:32:10 +0000 bugzilla-daemon@bugzilla.kernel.org wrote: >> >>> https://bugzilla.kernel.org/show_bug.cgi?id=99471 >> >> Guys, could you take a look please? >> >> The machine went oom when there's heaps of unused swap and most memory >> is being used on active_anon and inactive_anon. We should have just >> swapped that stuff out and kept going. > > I think we need to re-evaluate the way we balance file and anon scan > pressure. It's not just the "not swapping" aspect that bugs me, it's > also the fact that the machine has been thrashing page cache at full > load for *minutes* before signalling the OOM. > > SSDs can flush and reload pages quick enough that on memory pressure > there are always reclaimable cache pages and the scanner never goes > after anonymous memory. 
If anonymous memory does not leave enough room > for page cache to hold the libraries and executables, userspace goes > into a state where it's mostly waiting for cache to become uptodate. > > It's a very frustrating problem because it's hard to even detect. > > One idea I had to address the LRU balance problem in the past was to > always reclaim the pages in the following order: inactive file, active > file, anon*. As one set becomes empty, go after the next one. If the > workingset code detects cache thrashing, it depends on the refault > distances what to do: if they are smaller than the active file size, > deactivate; if they are bigger than that, but smaller than active file > + anon, we need to start swapping to alleviate the cache thrashing. > > Now, if the refault distances are bigger than active file + anon, no > amount of deactivating and swapping are going to stop the thrashing > and we have to think about triggering OOM. But OOM is drastic and the > refaults might happen at a very slow pace (or, with sparse files, not > require any IO at all) and the system might be completely fine. We already measure how much the system is slowed down by waiting on IO - iowait time. It's not perfect, but it can give us some indication whether or not we are thrashing on page cache access. > So in > addition this would require a measure of overall time spent on > thrashing IO, comparable to what Tejun proposed in "[RFD] memory > pressure and sizing problem", where we say if thrashing IO takes up X > percent of all execution time spent, we trigger the OOM killer--not to > free memory, but to reduce the tasks that contribute to the thrashing > and let the remaining tasks make progress, similar to the swap token > or a BSD style memory scheduler. The BSD style process swapping only takes mapped memory into account. Page cache thrashing is pretty much ignored. Maybe we should only count page fault stalls (do we track those already?) in addition to refault distances? 
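The reclaim thresholds in the quoted proposal can be written down as a small decision function. This is only a sketch of the idea as described in the thread, not existing kernel code: the function name, the returned labels, the handling of exact equality, and the `thrash_fraction` / `thrash_threshold` parameters (standing in for the unspecified "X percent") are all assumptions made for illustration.

```python
def reclaim_action(refault_distance, active_file, anon,
                   thrash_fraction, thrash_threshold):
    """Pick a response to detected cache thrashing (all sizes in pages).

    Hypothetical helper: names and labels are illustrative only.
    """
    # Refaults would have been avoided with a smaller active file list.
    if refault_distance < active_file:
        return "deactivate"
    # Refaults would have been avoided if anon memory were swapped out.
    if refault_distance < active_file + anon:
        return "swap"
    # Neither deactivating nor swapping can stop the thrashing; only
    # consider OOM if the time lost to thrashing IO is large enough.
    if thrash_fraction >= thrash_threshold:
        return "oom"
    return "wait"


# Example with made-up page counts and a made-up 50% threshold.
print(reclaim_action(250, 200, 300, 0.1, 0.5))
```

Keeping the thrashing-time check as the last step reflects the concern in the quoted text that OOM is drastic and slow refaults may be harmless.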
-- All rights reversed
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f181.google.com (mail-wi0-f181.google.com [209.85.212.181]) by kanga.kvack.org (Postfix) with ESMTP id 9894682F6B for ; Mon, 5 Oct 2015 16:03:50 -0400 (EDT) Received: by wicge5 with SMTP id ge5so136884917wic.0 for ; Mon, 05 Oct 2015 13:03:50 -0700 (PDT) Received: from mail-wi0-f179.google.com (mail-wi0-f179.google.com. [209.85.212.179]) by mx.google.com with ESMTPS id m11si18584711wij.112.2015.10.05.13.03.49 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 05 Oct 2015 13:03:49 -0700 (PDT) Received: by wicfx3 with SMTP id fx3so130256777wic.0 for ; Mon, 05 Oct 2015 13:03:49 -0700 (PDT) Date: Mon, 5 Oct 2015 22:03:46 +0200 From: Michal Hocko Subject: Re: [Bug 99471] System locks with kswapd0 and kworker taking full IO and mem Message-ID: <20151005200345.GA12889@dhcp22.suse.cz> References: <20150910140418.73b33d3542bab739f8fd1826@linux-foundation.org> <20150915083919.GG2858@cmpxchg.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150915083919.GG2858@cmpxchg.org> Sender: owner-linux-mm@kvack.org List-ID: To: Johannes Weiner , Andrew Morton , Mel Gorman , bugzilla-daemon@bugzilla.kernel.org, linux-mm@kvack.org, gaguilar@aguilardelgado.com, sgh@sgh.dk, Rik van Riel , Daniel Vetter [Sorry for replying here but I couldn't find the original Andrew's email in my mailbox] On Tue 15-09-15 10:39:19, Johannes Weiner wrote: > On Thu, Sep 10, 2015 at 02:04:18PM -0700, Andrew Morton wrote: > > (switched to email. Please respond via emailed reply-to-all, not via the > > bugzilla web interface). 
> >
> > On Tue, 01 Sep 2015 12:32:10 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
> >
> > > https://bugzilla.kernel.org/show_bug.cgi?id=99471
> >
> > Guys, could you take a look please?
> >
> > The machine went oom when there's heaps of unused swap and most memory
> > is being used on active_anon and inactive_anon. We should have just
> > swapped that stuff out and kept going.

I would strongly suspect the memory is pinned by somebody, which
completely ruins all the get_scan_count assumptions. The first
referenced OOM report might contain a hint:
[ 2162.123944] Purging GPU memory, 368640 bytes freed, 615292928 bytes still pinned.
[ 2175.996060] Purging GPU memory, 499712 bytes freed, 615251968 bytes still pinned.
[ 2175.998841] bash invoked oom-killer: gfp_mask=0x20858, order=0, oom_score_adj=0
[ 2175.998844] bash cpuset=/ mems_allowed=0
[...]
[ 2175.999016] active_anon:305425 inactive_anon:141206 isolated_anon:0
 active_file:5109 inactive_file:4666 isolated_file:0
 unevictable:4 dirty:2 writeback:0 unstable:0
 free:13218 slab_reclaimable:6552 slab_unreclaimable:11310
 mapped:21203 shmem:155079 pagetables:10921 bounce:0
 free_cma:0
[...]
[ 2175.999074] 169619 total pagecache pages
[ 2175.999076] 4752 pages in swap cache
[ 2175.999078] Swap cache stats: add 468915, delete 464163, find 76521/98873
[ 2175.999080] Free swap = 1615656kB
[ 2175.999082] Total swap = 2097148kB
[ 2175.999083] 521838 pages RAM
[ 2175.999084] 0 pages HighMem/MovableOnly
[ 2175.999086] 11811 pages reserved
[ 2175.999087] 0 pages hwpoisoned

So there is more than 600MB used by the GPU. Later OOM invocations do
not mention the GPU OOM shrinker at all. Anon+File+Unevict+Free+Slab+Pagetbl
gives us 1.9G, so a considerable amount of pinned memory has to be
sitting on the LRU lists. I would bet it is shmem here, but there is
still more than 1G on the anon LRU lists. Is it possible they are
pinned indirectly? I am CCing Daniel for the GPU memory consumption.
Maybe there is some additional diagnostic to look at.

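As a rough cross-check of the arithmetic above, the counters from the quoted OOM report can be summed directly. This is a sketch under one assumption that the log does not state: a 4 KiB page size. The variable and function names are illustrative only; the numbers are copied from the report.

```python
PAGE_KIB = 4  # assumption: 4 KiB pages (not stated in the log)

# Page counts copied from the OOM report at [ 2175.999016].
counters = {
    "active_anon": 305425, "inactive_anon": 141206,
    "active_file": 5109, "inactive_file": 4666,
    "unevictable": 4, "free": 13218,
    "slab_reclaimable": 6552, "slab_unreclaimable": 11310,
    "pagetables": 10921,
}


def pages_to_gib(pages):
    """Convert a page count to GiB under the PAGE_KIB assumption."""
    return pages * PAGE_KIB / (1024 * 1024)


# Anon + File + Unevict + Free + Slab + Pagetbl, as in the message.
accounted_pages = sum(counters.values())
anon_pages = counters["active_anon"] + counters["inactive_anon"]

# GPU memory still pinned and shmem, both expressed in MiB.
gpu_pinned_mib = 615251968 / (1024 * 1024)
shmem_mib = 155079 * PAGE_KIB / 1024

# Swap in use = total swap minus free swap (values in kB from the log).
swap_used_kib = 2097148 - 1615656

print(f"accounted: {accounted_pages} pages, {pages_to_gib(accounted_pages):.2f} GiB")
print(f"anon LRU:  {anon_pages} pages, {pages_to_gib(anon_pages):.2f} GiB")
print(f"GPU pinned: {gpu_pinned_mib:.2f} MiB, shmem: {shmem_mib:.2f} MiB")
print(f"swap used: {swap_used_kib} kB")
```

With 4 KiB pages the sum comes to 498411 pages, roughly 1.9 GiB, which agrees with the figure quoted in the message.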
Another interesting thing to note is that
[ 2175.999473] Out of memory: Kill process 3566 (java) score 170 or sacrifice child
[ 2175.999477] Killed process 3566 (java) total-vm:3417044kB, anon-rss:703656kB, file-rss:0kB
[...]
[ 2176.000641] bash invoked oom-killer: gfp_mask=0x20858, order=0, oom_score_adj=0
[ 2176.000644] bash cpuset=/ mems_allowed=0
[...]
[ 2176.000798] active_anon:305425 inactive_anon:141206 isolated_anon:0
 active_file:5109 inactive_file:4666 isolated_file:0
 unevictable:4 dirty:2 writeback:0 unstable:0
 free:13187 slab_reclaimable:6552 slab_unreclaimable:11310
 mapped:21203 shmem:155079 pagetables:10921 bounce:0
 free_cma:0

So the anon LRU lists are intact even after java has exited, so
something is clearly wrong with the anon LRU list, and it looks like a
leak via elevated page ref. counting.

It sounds like this is reproducible for you, Gonzalo. Could you invoke a
crash dump and save the vmcore so that the LRU can be investigated? We
would see the state after something went wrong but maybe there will be
some pattern to help us.
-- 
Michal Hocko
SUSE Labs
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f42.google.com (mail-wm0-f42.google.com [74.125.82.42]) by kanga.kvack.org (Postfix) with ESMTP id 65E466B0005 for ; Tue, 16 Feb 2016 17:42:03 -0500 (EST) Received: by mail-wm0-f42.google.com with SMTP id b205so131030127wmb.1 for ; Tue, 16 Feb 2016 14:42:03 -0800 (PST) Received: from mail.linuxfoundation.org (mail.linuxfoundation.org. 
[140.211.169.12]) by mx.google.com with ESMTPS id k9si51811540wjr.241.2016.02.16.14.42.01 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 16 Feb 2016 14:42:02 -0800 (PST) Date: Tue, 16 Feb 2016 14:41:59 -0800 From: Andrew Morton Subject: Re: [Bug 99471] System locks with kswapd0 and kworker taking full IO and mem Message-Id: <20160216144159.9335e48d65b7327984d298ac@linux-foundation.org> In-Reply-To: <20151005200345.GA12889@dhcp22.suse.cz> References: <20150910140418.73b33d3542bab739f8fd1826@linux-foundation.org> <20150915083919.GG2858@cmpxchg.org> <20151005200345.GA12889@dhcp22.suse.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko Cc: Johannes Weiner , Mel Gorman , bugzilla-daemon@bugzilla.kernel.org, linux-mm@kvack.org, gaguilar@aguilardelgado.com, sgh@sgh.dk, Rik van Riel , Daniel Vetter , serianox@gmail.com, spam@kernelspace.de, larsnostdal@gmail.com, viktorpal@yahoo.de, shentino@gmail.com On Mon, 5 Oct 2015 22:03:46 +0200 Michal Hocko wrote: > On Tue 15-09-15 10:39:19, Johannes Weiner wrote: > > On Thu, Sep 10, 2015 at 02:04:18PM -0700, Andrew Morton wrote: > > > (switched to email. Please respond via emailed reply-to-all, not via the > > > bugzilla web interface). > > > > > > On Tue, 01 Sep 2015 12:32:10 +0000 bugzilla-daemon@bugzilla.kernel.org wrote: > > > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=99471 > > > > > > Guys, could you take a look please? So this isn't fixed and a number of new reporters (cc'ed) are chiming in (let's please keep this going via email, not via the bugzilla UI!). We have various theories but I don't think we've nailed it down yet. Are any of the reporters able to come up with a set of instructions which will permit the developers to reproduce this bug locally? Can we think up a way of adding some form of debug/instrumentation to the kernel which will permit us to diagnose and fix this? 
It could be something which a tester manually adds or it could be something permanent, perhaps controlled via a procfs knob.
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f49.google.com (mail-wm0-f49.google.com [74.125.82.49]) by kanga.kvack.org (Postfix) with ESMTP id 748706B0005 for ; Sun, 21 Feb 2016 07:36:48 -0500 (EST) Received: by mail-wm0-f49.google.com with SMTP id b205so122585363wmb.1 for ; Sun, 21 Feb 2016 04:36:48 -0800 (PST) Received: from gir.skynet.ie (gir.skynet.ie. [193.1.99.77]) by mx.google.com with ESMTPS id y133si26227545wme.72.2016.02.21.04.36.46 for (version=TLS1 cipher=AES128-SHA bits=128/128); Sun, 21 Feb 2016 04:36:47 -0800 (PST) Date: Sun, 21 Feb 2016 12:36:44 +0000 From: Mel Gorman Subject: Re: [Bug 99471] System locks with kswapd0 and kworker taking full IO and mem Message-ID: <20160221123644.GJ4537@csn.ul.ie> References: <20150910140418.73b33d3542bab739f8fd1826@linux-foundation.org> <20150915083919.GG2858@cmpxchg.org> <20151005200345.GA12889@dhcp22.suse.cz> <20160216144159.9335e48d65b7327984d298ac@linux-foundation.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <20160216144159.9335e48d65b7327984d298ac@linux-foundation.org> Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton Cc: Michal Hocko , Johannes Weiner , bugzilla-daemon@bugzilla.kernel.org, linux-mm@kvack.org, gaguilar@aguilardelgado.com, sgh@sgh.dk, Rik van Riel , Daniel Vetter , serianox@gmail.com, spam@kernelspace.de, larsnostdal@gmail.com, viktorpal@yahoo.de, shentino@gmail.com On Tue, Feb 16, 2016 at 02:41:59PM -0800, Andrew Morton wrote: > On Mon, 5 Oct 2015 22:03:46 +0200 Michal Hocko wrote: > > > On Tue 15-09-15 10:39:19, Johannes Weiner wrote: > > > On Thu, Sep 10, 2015 at 02:04:18PM -0700, 
Andrew Morton wrote:
> > > > (switched to email. Please respond via emailed reply-to-all, not via the
> > > > bugzilla web interface).
> > > >
> > > > On Tue, 01 Sep 2015 12:32:10 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
> > > >
> > > > > https://bugzilla.kernel.org/show_bug.cgi?id=99471
> > > >
> > > > Guys, could you take a look please?
>
> So this isn't fixed and a number of new reporters (cc'ed) are chiming
> in (let's please keep this going via email, not via the bugzilla UI!).
>
> We have various theories but I don't think we've nailed it down yet.
>

So, I'm nowhere close to this at the moment. I was aware of at least one
swapping-related problem that was introduced between 4.0 and 4.1. The
commit that introduced it only affects NUMA so there is no chance they
are related. However, I'll still need to chase that down early next week
before considering this problem. Someone else may figure it out faster.

As the problem I'm aware of is NUMA only, I took a momentary look at
this. The first log shows MCE errors but they may be overheating related
so I'm willing to ignore that. The log clearly states that a lot of
memory is pinned by the GPU just before the OOM triggers.

[ 2175.996060] Purging GPU memory, 499712 bytes freed, 615251968 bytes still pinned.

So that in itself is a major problem. Next, the memory usage at the time
of failure was

[ 2175.999016] active_anon:305425 inactive_anon:141206 isolated_anon:0
 active_file:5109 inactive_file:4666 isolated_file:0
 unevictable:4 dirty:2 writeback:0 unstable:0
 free:13218 slab_reclaimable:6552 slab_unreclaimable:11310
 mapped:21203 shmem:155079 pagetables:10921 bounce:0
 free_cma:0

1.8G of anonymous memory usage with almost 600M of that being
GPU-related. The file usage is negligible so this is looking closer to
being a true OOM situation.

[ 2175.999080] Free swap = 1615656kB
[ 2175.999082] Total swap = 2097148kB

Loads of swap available.

The IO is likely high because files are probably being continually
reclaimed and paged back in, so it's thrashing. Johannes is likely
correct when he says there is a problem with balancing when the storage
is fast. That's one aspect of the problem but it does not explain why
the problem is recent.

The one major candidate I can spot is this

1da58ee2: mm: vmscan: count only dirty pages as congested

That alters how and when processes are put to sleep waiting on
congestion to clear. While I can see the logic behind the patch, the
impact was not quantified and it can mean that kswapd is no longer
throttling when it used to. Try something like this (untested):

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2aec4241b42a..50b24a022db0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -953,8 +953,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 * end of the LRU a second time.
 		 */
 		mapping = page_mapping(page);
-		if (((dirty || writeback) && mapping &&
-		     inode_write_congested(mapping->host)) ||
+		if ((mapping && inode_write_congested(mapping->host)) ||
 		    (writeback && PageReclaim(page)))
 			nr_congested++;

This is not necessarily the right fix; it just may narrow down where the
problem is. The problem is probably compounded by scanning one third of
the LRU before any reclaim candidates are found.

Is it known if all the people reporting problems are using an i915 GPU?
If so, Daniel, are you aware of any commits between 3.18 and 4.1 that
would potentially pin GPU memory permanently or alternatively would have
busted the shrinker?

-- 
Mel Gorman
SUSE Labs
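
To see which pages the test patch in the message above would newly count as congested, the two conditions from the diff can be compared exhaustively. This is a sketch only: each kernel predicate is modelled as a plain boolean (`congested` stands in for inode_write_congested(mapping->host), `reclaim_flag` for PageReclaim(page)), and the Python function names are made up for illustration.

```python
from itertools import product


def counted_with_dirty_check(dirty, writeback, has_mapping, congested, reclaim_flag):
    """Condition on the '-' lines of the diff (current code)."""
    return (((dirty or writeback) and has_mapping and congested)
            or (writeback and reclaim_flag))


def counted_without_dirty_check(dirty, writeback, has_mapping, congested, reclaim_flag):
    """Condition on the '+' line of the diff (proposed test patch)."""
    return (has_mapping and congested) or (writeback and reclaim_flag)


# Enumerate all 32 flag combinations and keep those where the two
# conditions disagree about incrementing nr_congested.
differing = [
    flags for flags in product([False, True], repeat=5)
    if counted_with_dirty_check(*flags) != counted_without_dirty_check(*flags)
]

for dirty, writeback, has_mapping, congested, reclaim_flag in differing:
    print(f"dirty={dirty} writeback={writeback} mapping={has_mapping} "
          f"congested={congested} PageReclaim={reclaim_flag}")
```

Under this model the two conditions disagree only for clean pages (neither dirty nor under writeback) whose mapping is congested: the patched condition counts them, the current one does not.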