From: Mel Gorman <mgorman@techsingularity.net>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>, Vlastimil Babka <vbabka@suse.cz>,
Ivan Babrou <ivan@cloudflare.com>,
Rik van Riel <riel@surriel.com>, Linux-MM <linux-mm@kvack.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/3] Limit runaway reclaim due to watermark boosting
Date: Wed, 26 Feb 2020 08:04:26 +0000
Message-ID: <20200226080426.GA3818@techsingularity.net>
In-Reply-To: <20200225185130.6a32a8a6920d11b4c098e90e@linux-foundation.org>

On Tue, Feb 25, 2020 at 06:51:30PM -0800, Andrew Morton wrote:
> On Tue, 25 Feb 2020 14:15:31 +0000 Mel Gorman <mgorman@techsingularity.net> wrote:
>
> > Ivan Babrou reported the following
>
> http://lkml.kernel.org/r/CABWYdi1eOUD1DHORJxTsWPMT3BcZhz++xP1pXhT=x4SgxtgQZA@mail.gmail.com
> is helpful.
>
Noted for future reference.

> > Commit 1c30844d2dfe ("mm: reclaim small amounts of memory when
> > an external fragmentation event occurs") introduced undesired
> > effects in our environment.
> >
> > * NUMA with 2 x CPU
> > * 128GB of RAM
> > * THP disabled
> > * Upgraded from 4.19 to 5.4
> >
> > Before we saw free memory hover at around 1.4GB with no
> > spikes. After the upgrade we saw some machines decide that they
> > need a lot more than that, with frequent spikes above 10GB,
> > often only on a single numa node.
> >
> > There have been a few reports recently that might be watermark boost
> > related. Unfortunately, finding someone that can reproduce the problem
> > and test a patch has been problematic. This series intends to limit
> > potential damage only.
>
> It's problematic that we don't understand what's happening. And these
> palliatives can only reduce our ability to do that.
>
Not for certain, no, but we do know that there are conditions whereby
node 0 can end up reclaiming excessively for extended periods of time.
The available evidence does match a pattern whereby a lower zone on node
0 is getting stuck in a boosted state.

> Rik seems to have the means to reproduce this (or something similar)
> and it seems Ivan can test patches three weeks hence.

If Rik can reproduce it, great, but I have a strong feeling that Ivan may
never be able to test this if it requires a production machine, which is
why I did not wait the three weeks.

> So how about a
> debug patch which will help figure out what's going on in there?

A debug patch would not help much in this case given that we
have tracepoints. An ftrace capture containing mm_page_alloc_extfrag,
mm_vmscan_kswapd_wake, mm_vmscan_wakeup_kswapd and
mm_vmscan_node_reclaim_begin, taken for 30 seconds while the problem
is occurring, would be a big help. Ideally mm_vmscan_lru_shrink_inactive
would also be included to capture the priority, but the size of the trace
is what's going to be problematic.
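
For anyone collecting such a trace, the following is a minimal sketch of
one way to do it through tracefs, not a tested procedure. It assumes
tracefs is mounted at /sys/kernel/tracing (older setups may use
/sys/kernel/debug/tracing), that the script runs as root, and that the
events live under the kmem and vmscan subsystems. The function names,
the output file name and the with_priority switch are made up for
illustration.

```python
import time
from pathlib import Path

# Tracepoints named in this mail, grouped by tracefs event subsystem
# (the grouping is an assumption to verify on the target kernel).
EVENTS = {
    "kmem": ["mm_page_alloc_extfrag"],
    "vmscan": [
        "mm_vmscan_kswapd_wake",
        "mm_vmscan_wakeup_kswapd",
        "mm_vmscan_node_reclaim_begin",
        "mm_vmscan_lru_shrink_inactive",  # optional, makes the trace much larger
    ],
}


def enable_files(root="/sys/kernel/tracing", with_priority=True):
    """Return the per-event 'enable' files under the tracefs mount at root."""
    paths = []
    for subsystem, names in EVENTS.items():
        for name in names:
            if name == "mm_vmscan_lru_shrink_inactive" and not with_priority:
                continue
            paths.append(Path(root) / "events" / subsystem / name / "enable")
    return paths


def capture(seconds=30, root="/sys/kernel/tracing", out="trace.txt",
            with_priority=True):
    """Record the events for `seconds` and save the buffer to `out` (needs root)."""
    root = Path(root)
    files = enable_files(root, with_priority)
    (root / "trace").write_text("")          # opening for write clears the buffer
    try:
        for f in files:
            f.write_text("1")                # enable each tracepoint
        (root / "tracing_on").write_text("1")
        time.sleep(seconds)                  # run while the problem is occurring
        (root / "tracing_on").write_text("0")
        Path(out).write_text((root / "trace").read_text())
    finally:
        for f in files:
            f.write_text("0")                # always disable the events again
```

Calling capture(30) while kswapd is visibly busy would leave the result
in trace.txt. If the file turns out to be too large to share, passing
with_priority=False drops mm_vmscan_lru_shrink_inactive at the cost of
losing the priority information.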

mm_page_alloc_extfrag would be correlated with the conditions that boost
the watermarks and the others would track what kswapd is doing to see if
it's persistently reclaiming. If it is, mm_vmscan_lru_shrink_inactive
would tell us whether it's persistently reclaiming at priority
DEF_PRIORITY - 2, which would prove the patch would at least mitigate
the problem.
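
The priority check described above could be sketched roughly as follows.
This is an illustration under assumptions, not an existing tool: it
assumes the mm_vmscan_lru_shrink_inactive lines in the text trace carry a
"priority=<n>" field and that DEF_PRIORITY is 12 on the kernel in
question (both should be confirmed against the running kernel). The
function names are hypothetical.

```python
import re
from collections import Counter

DEF_PRIORITY = 12                    # assumed; check include/linux/mmzone.h
TARGET_PRIORITY = DEF_PRIORITY - 2   # the priority of interest in this mail

# Match only the shrink_inactive event and pull out its priority field.
_SHRINK = re.compile(r"mm_vmscan_lru_shrink_inactive:.*\bpriority=(\d+)")


def priority_histogram(lines):
    """Count mm_vmscan_lru_shrink_inactive events per reclaim priority."""
    counts = Counter()
    for line in lines:
        match = _SHRINK.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def target_share(lines):
    """Fraction of shrink_inactive events seen at TARGET_PRIORITY."""
    counts = priority_histogram(lines)
    total = sum(counts.values())
    return counts[TARGET_PRIORITY] / total if total else 0.0
```

Feeding it the saved trace, for example target_share(open("trace.txt")),
gives a rough number: a share close to 1.0 sustained over the capture
would be consistent with kswapd persistently reclaiming at that
priority, although it does not by itself show that the watermark boost
is the cause.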

It would be preferable to have a description of a testcase that
reproduces the problem so that I can capture and analyse the trace myself.
It would also be something I could slot into a test grid to catch the
problem happening again in the future.

--
Mel Gorman
SUSE Labs