From: Mel Gorman <mgorman@techsingularity.net>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>, Vlastimil Babka <vbabka@suse.cz>,
	Ivan Babrou <ivan@cloudflare.com>,
	Rik van Riel <riel@surriel.com>, Linux-MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/3] Limit runaway reclaim due to watermark boosting
Date: Wed, 26 Feb 2020 08:04:26 +0000
Message-ID: <20200226080426.GA3818@techsingularity.net>
In-Reply-To: <20200225185130.6a32a8a6920d11b4c098e90e@linux-foundation.org>

On Tue, Feb 25, 2020 at 06:51:30PM -0800, Andrew Morton wrote:
> On Tue, 25 Feb 2020 14:15:31 +0000 Mel Gorman <mgorman@techsingularity.net> wrote:
> 
> > Ivan Babrou reported the following
> 
> http://lkml.kernel.org/r/CABWYdi1eOUD1DHORJxTsWPMT3BcZhz++xP1pXhT=x4SgxtgQZA@mail.gmail.com
> is helpful.
> 

Noted for future reference.

> > 	Commit 1c30844d2dfe ("mm: reclaim small amounts of memory when
> > 	an external fragmentation event occurs") introduced undesired
> > 	effects in our environment.
> > 
> > 	  * NUMA with 2 x CPU
> > 	  * 128GB of RAM
> > 	  * THP disabled
> > 	  * Upgraded from 4.19 to 5.4
> > 
> > 	Before we saw free memory hover at around 1.4GB with no
> > 	spikes. After the upgrade we saw some machines decide that they
> > 	need a lot more than that, with frequent spikes above 10GB,
> > 	often only on a single numa node.
> > 
> > There have been a few reports recently that might be watermark boost
> > related. Unfortunately, finding someone that can reproduce the problem
> > and test a patch has been problematic.  This series intends to limit
> > potential damage only.
> 
> It's problematic that we don't understand what's happening.  And these
> palliatives can only reduce our ability to do that.
> 

Not for certain, no, but we do know that there are conditions whereby
node 0 can end up reclaiming excessively for extended periods of time.
The available evidence does match a pattern whereby a lower zone on node
0 is getting stuck in a boosted state.

> Rik seems to have the means to reproduce this (or something similar)
> and it seems Ivan can test patches three weeks hence. 

If Rik can reproduce it, great, but I have a strong feeling that Ivan may
never be able to test this if it requires a production machine, which is
why I did not wait the three weeks.

> So how about a
> debug patch which will help figure out what's going on in there?

A debug patch would not help much in this case given that we
have tracepoints. An ftrace capture containing mm_page_alloc_extfrag,
mm_vmscan_kswapd_wake, mm_vmscan_wakeup_kswapd and
mm_vmscan_node_reclaim_begin, taken for 30 seconds while the problem is
occurring, would be a big help. Ideally mm_vmscan_lru_shrink_inactive
would also be included to capture the priority, but the size of the trace
is what's going to be problematic.
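
For anyone willing to capture this, a minimal sketch of the capture
described above follows. It assumes tracefs is mounted at
/sys/kernel/tracing (older setups use /sys/kernel/debug/tracing), that it
runs as root, and that the per-CPU ring buffer is large enough to hold
30 seconds of events. The function names and the output filename are
hypothetical; only the tracepoint names come from this mail.

```python
import time
from pathlib import Path

# Assumed tracefs mount point; adjust if tracefs lives under debugfs.
TRACEFS = Path("/sys/kernel/tracing")

# Tracepoints named in the mail, prefixed with their trace subsystem.
EVENTS = [
    "kmem/mm_page_alloc_extfrag",
    "vmscan/mm_vmscan_kswapd_wake",
    "vmscan/mm_vmscan_wakeup_kswapd",
    "vmscan/mm_vmscan_node_reclaim_begin",
    "vmscan/mm_vmscan_lru_shrink_inactive",  # optional: high volume
]


def set_events(root, events, on):
    """Write 1 or 0 to each event's 'enable' file."""
    for ev in events:
        (Path(root) / "events" / ev / "enable").write_text("1" if on else "0")


def capture(root=TRACEFS, events=EVENTS, seconds=30, out="reclaim-trace.txt"):
    """Enable the events, trace for `seconds`, then save the buffer."""
    root = Path(root)
    set_events(root, events, True)
    (root / "tracing_on").write_text("1")
    try:
        time.sleep(seconds)
    finally:
        # Always stop tracing and disable the events again.
        (root / "tracing_on").write_text("0")
        set_events(root, events, False)
    # Reading 'trace' after stopping gives a static snapshot of the buffer.
    Path(out).write_text((root / "trace").read_text())
    return out


if __name__ == "__main__":
    print("trace written to", capture())
```

If the trace size is a concern, dropping the last entry of EVENTS keeps
the volume down at the cost of losing the priority information.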

mm_page_alloc_extfrag would be correlated with the conditions that boost
the watermarks and the others would track what kswapd is doing to see if
it's persistently reclaiming. If it is, mm_vmscan_lru_shrink_inactive
would tell if it's persistently reclaiming at priority DEF_PRIORITY - 2,
which would prove the patch would at least mitigate the problem.
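
As a rough illustration of that last check, the sketch below counts
mm_vmscan_lru_shrink_inactive events per reclaim priority in a saved
trace. It assumes each event line carries a "priority=<n>" field and
that DEF_PRIORITY is 12, as defined in include/linux/mmzone.h in kernels
of this era; both should be verified against the kernel being traced.
The function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: DEF_PRIORITY as defined in include/linux/mmzone.h.
DEF_PRIORITY = 12
BOOST_PRIORITY = DEF_PRIORITY - 2

# Assumption: the event's text output contains "priority=<n>".
PRIORITY_RE = re.compile(r"\bpriority=(\d+)")


def priority_histogram(lines):
    """Count mm_vmscan_lru_shrink_inactive events per priority."""
    counts = Counter()
    for line in lines:
        if "mm_vmscan_lru_shrink_inactive" not in line:
            continue
        match = PRIORITY_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def boosted_share(counts):
    """Fraction of shrink events seen at DEF_PRIORITY - 2."""
    total = sum(counts.values())
    return counts[BOOST_PRIORITY] / total if total else 0.0


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as trace:
        hist = priority_histogram(trace)
    for prio in sorted(hist, reverse=True):
        print(f"priority {prio}: {hist[prio]} events")
    print(f"share at priority {BOOST_PRIORITY}: {boosted_share(hist):.2f}")
```

A share close to 1.0 over the whole capture would be consistent with
kswapd being stuck reclaiming at that priority, though the correlation
with mm_page_alloc_extfrag still has to be checked by hand.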

It would be preferable to have a description of a testcase that
reproduces the problem and I'll capture/analyse the trace myself.
It would also be something I could slot into a test grid to catch the
problem happening again in the future.

-- 
Mel Gorman
SUSE Labs


Thread overview: 8+ messages
2020-02-25 14:15 [PATCH 0/3] Limit runaway reclaim due to watermark boosting Mel Gorman
2020-02-25 14:15 ` [PATCH 1/3] mm, page_alloc: Disable boosted watermark based reclaim on low-memory systems Mel Gorman
2020-02-25 14:15 ` [PATCH 2/3] mm, page_alloc: Disable watermark boosting if THP is disabled at boot Mel Gorman
2020-02-26  1:32   ` David Rientjes
2020-02-26  8:07     ` Mel Gorman
2020-02-25 14:15 ` [PATCH 3/3] mm, vmscan: Do not reclaim for boosted watermarks at high priority Mel Gorman
2020-02-26  2:51 ` [PATCH 0/3] Limit runaway reclaim due to watermark boosting Andrew Morton
2020-02-26  8:04   ` Mel Gorman [this message]
