From: Mel Gorman <mgorman@techsingularity.net>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>, Vlastimil Babka <vbabka@suse.cz>,
	Ivan Babrou <ivan@cloudflare.com>,
	Rik van Riel <riel@surriel.com>, Linux-MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/3] Limit runaway reclaim due to watermark boosting
Date: Wed, 26 Feb 2020 08:04:26 +0000
Message-ID: <20200226080426.GA3818@techsingularity.net>
In-Reply-To: <20200225185130.6a32a8a6920d11b4c098e90e@linux-foundation.org>

On Tue, Feb 25, 2020 at 06:51:30PM -0800, Andrew Morton wrote:
> On Tue, 25 Feb 2020 14:15:31 +0000 Mel Gorman <mgorman@techsingularity.net> wrote:
> 
> > Ivan Babrou reported the following
> 
> http://lkml.kernel.org/r/CABWYdi1eOUD1DHORJxTsWPMT3BcZhz++xP1pXhT=x4SgxtgQZA@mail.gmail.com
> is helpful.
> 

Noted for future reference.

> > 	Commit 1c30844d2dfe ("mm: reclaim small amounts of memory when
> > 	an external fragmentation event occurs") introduced undesired
> > 	effects in our environment.
> > 
> > 	  * NUMA with 2 x CPU
> > 	  * 128GB of RAM
> > 	  * THP disabled
> > 	  * Upgraded from 4.19 to 5.4
> > 
> > 	Before we saw free memory hover at around 1.4GB with no
> > 	spikes. After the upgrade we saw some machines decide that they
> > 	need a lot more than that, with frequent spikes above 10GB,
> > 	often only on a single numa node.
> > 
> > There have been a few reports recently that might be watermark boost
> > related. Unfortunately, finding someone that can reproduce the problem
> > and test a patch has been problematic.  This series intends to limit
> > potential damage only.
> 
> It's problematic that we don't understand what's happening.  And these
> palliatives can only reduce our ability to do that.
> 

Not for certain, no, but we do know that there are conditions whereby
node 0 can end up reclaiming excessively for extended periods of time.
The available evidence does match a pattern whereby a lower zone on node
0 is getting stuck in a boosted state.

> Rik seems to have the means to reproduce this (or something similar)
> and it seems Ivan can test patches three weeks hence. 

If Rik can reproduce it, great, but I have a strong feeling that Ivan may
never be able to test this if it requires a production machine, which is
why I did not wait the three weeks.

> So how about a
> debug patch which will help figure out what's going on in there?

A debug patch would not help much in this case given that we already
have tracepoints. An ftrace capture containing mm_page_alloc_extfrag,
mm_vmscan_kswapd_wake, mm_vmscan_wakeup_kswapd and
mm_vmscan_node_reclaim_begin, taken for 30 seconds while the problem is
occurring, would be a big help. Ideally mm_vmscan_lru_shrink_inactive
would also be included to capture the priority, but the size of the
trace is what's going to be problematic.
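
A minimal sketch of such a capture via tracefs, assuming root privileges,
tracefs mounted at /sys/kernel/tracing (older setups use
/sys/kernel/debug/tracing) and the 5.x layout where mm_page_alloc_extfrag
lives under the kmem subsystem and the others under vmscan. TRACEFS and
WITH_SHRINK_INACTIVE are hypothetical knobs for this sketch only, not
kernel interfaces. Streaming from trace_pipe during the window, rather
than reading "trace" at the end, is one way to reduce the chance of a
busy 30 second window being overwritten in the ring buffer.

```shell
#!/bin/sh
# Sketch only: capture the kmem/vmscan tracepoints discussed above.
# Assumptions: root privileges, tracefs mounted at $TRACEFS (default
# /sys/kernel/tracing) and 5.x-era event names. TRACEFS and
# WITH_SHRINK_INACTIVE are hypothetical knobs for this sketch.

TRACEFS=${TRACEFS:-/sys/kernel/tracing}

# Print the subsystem/event paths to enable, one per line.
trace_events() {
	echo kmem/mm_page_alloc_extfrag
	echo vmscan/mm_vmscan_kswapd_wake
	echo vmscan/mm_vmscan_wakeup_kswapd
	echo vmscan/mm_vmscan_node_reclaim_begin
	# Optional: records the reclaim priority but greatly enlarges the trace.
	if [ "${WITH_SHRINK_INACTIVE:-0}" = 1 ]; then
		echo vmscan/mm_vmscan_lru_shrink_inactive
	fi
}

# capture_trace OUTFILE SECONDS: stream the selected events into OUTFILE.
capture_trace() {
	out=${1:-reclaim-trace.txt}
	duration=${2:-30}

	echo 0 > "$TRACEFS/tracing_on"
	echo > "$TRACEFS/trace"               # writing to "trace" clears the buffer
	for ev in $(trace_events); do
		echo 1 > "$TRACEFS/events/$ev/enable"
	done

	# Drain trace_pipe while tracing is on.
	cat "$TRACEFS/trace_pipe" > "$out" &
	reader=$!
	echo 1 > "$TRACEFS/tracing_on"
	sleep "$duration"
	echo 0 > "$TRACEFS/tracing_on"
	sleep 1                               # let the reader drain what is left
	kill "$reader" 2>/dev/null
	wait "$reader" 2>/dev/null

	for ev in $(trace_events); do
		echo 0 > "$TRACEFS/events/$ev/enable"
	done
}
```

Run as root while the problem is occurring, e.g.
capture_trace /tmp/reclaim-trace.txt 30, optionally with
WITH_SHRINK_INACTIVE=1 to include the priority at the cost of a much
larger file.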

mm_page_alloc_extfrag would be correlated with the conditions that boost
the watermarks, and the others would track what kswapd is doing to see if
it's persistently reclaiming. If it is, mm_vmscan_lru_shrink_inactive
would tell if it's persistently reclaiming at priority DEF_PRIORITY - 2,
which would prove that the patch would at least mitigate the problem.
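
For the priority question, one rough way to read such a capture is to
count mm_vmscan_lru_shrink_inactive events per node and priority. This
sketch assumes the event text carries "nid=N" and "priority=P" fields as
printed by 5.x kernels. DEF_PRIORITY is 12 in the kernel source, so
DEF_PRIORITY - 2 shows up as priority=10, and lower values mean more
aggressive reclaim. shrink_priority_summary is a hypothetical helper,
not an existing tool.

```shell
#!/bin/sh
# Sketch only: summarise mm_vmscan_lru_shrink_inactive events in a text
# trace (as read from "trace" or "trace_pipe") by NUMA node and priority.
# Assumes "nid=N" and "priority=P" fields in the event text.
# Output columns: nid priority events, sorted numerically by nid, then priority.

shrink_priority_summary() {
	awk '
	/mm_vmscan_lru_shrink_inactive:/ {
		nid = ""; prio = ""
		for (i = 1; i <= NF; i++) {
			if ($i ~ /^nid=/)      nid  = substr($i, 5)   # after "nid="
			if ($i ~ /^priority=/) prio = substr($i, 10)  # after "priority="
		}
		if (nid != "" && prio != "")
			count[nid " " prio]++
	}
	END {
		for (k in count)
			print k, count[k]
	}' "$@" | sort -n -k1,1 -k2,2
}
```

For example, shrink_priority_summary /tmp/reclaim-trace.txt. Most node 0
events sitting at priority 10 or below for the whole capture would be
consistent with the persistent reclaim at DEF_PRIORITY - 2 described
above.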

It would be preferable to have a description of a testcase that
reproduces the problem so that I can capture and analyse the trace myself.
It would also be something I could slot into a test grid to catch the
problem happening again in the future.

-- 
Mel Gorman
SUSE Labs

