From: Mel Gorman <mgorman@techsingularity.net>
To: howaboutsynergy@protonmail.com
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"bugzilla-daemon@bugzilla.kernel.org" <bugzilla-daemon@bugzilla.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [Bug 204165] New: 100% CPU usage in compact_zone_order
Date: Wed, 17 Jul 2019 18:53:32 +0100
Message-ID: <20190717175332.GC24383@techsingularity.net>
In-Reply-To: <xZGQeie9gbbIEm7ZciNh3PrdV8kTu-SE7KtUYV3cloMCUEdzB7taS5BcTzSUSaThu5_ftcRjr3sYcQB1c9dVPX3i1kQ2eP-xjKvFIpT7wZs=@protonmail.com>
On Tue, Jul 16, 2019 at 07:15:08PM +0000, howaboutsynergy@protonmail.com wrote:
> On Tuesday, July 16, 2019 12:03 PM, Mel Gorman <mgorman@techsingularity.net> wrote:
> > I tried reproducing this but after 300 attempts with various parameters
> > and adding other workloads in the background, I was unable to reproduce
> > the problem.
> >
>
>
> The third time I ran this command `$ time stress -m 220 --vm-bytes 10000000000 --timeout 10`, got 10+ hung:
>
> PID %CPU COMMAND PR NI VIRT RES S USER
> 3785 94.5 stress 20 0 9769416 4 R user
> 3777 87.3 stress 20 0 9769416 4 R user
> 3923 85.5 stress 20 0 9769416 4 R user
> 3937 85.5 stress 20 0 9769416 4 R user
> 3943 81.8 stress 20 0 9769416 4 R user
> 3885 80.0 stress 20 0 9769416 4 R user
> 3970 80.0 stress 20 0 9769416 4 R user
>
> <SNIP>
>
> trace.dat is 1.3G
> -rw-r--r-- 1 root root 1326219264 16.07.2019 20:45 trace.dat
>
Ok, great. From the trace, it is obvious that the scanner is making no
progress. I don't think zswap is involved as such but it *may* be making
it easier to trigger due to altering timing. At least, I see no reason
why zswap would materially affect the termination conditions.

From the path and your trace, I think what *might* be happening is that
a fatal signal is pending which does not advance the scanner or look like
a proper abort. I think it ends up looping in compaction, neither aborting
nor advancing the scanner, instead of dying. It might explain why
stress-ng is hitting it as it is probably sending fatal signals on timeout
(I didn't check the source).

Can you try this (compile tested only) patch please? Note that the stress
test might still take time to exit normally if it's stuck in a swap
storm of some sort but I'm hoping the 100% compaction CPU usage goes away
at least.

diff --git a/mm/compaction.c b/mm/compaction.c
index 9e1b9acb116b..952dc2fb24e5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -842,13 +842,15 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 		/*
 		 * Periodically drop the lock (if held) regardless of its
-		 * contention, to give chance to IRQs. Abort async compaction
-		 * if contended.
+		 * contention, to give chance to IRQs. Abort completely if
+		 * a fatal signal is pending.
 		 */
 		if (!(low_pfn % SWAP_CLUSTER_MAX)
 		    && compact_unlock_should_abort(&pgdat->lru_lock,
-								flags, &locked, cc))
-			break;
+					    flags, &locked, cc)) {
+			low_pfn = 0;
+			goto fatal_pending;
+		}
 
 		if (!pfn_valid_within(low_pfn))
 			goto isolate_fail;
@@ -1060,6 +1062,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
 						nr_scanned, nr_isolated);
 
+fatal_pending:
 	cc->total_migrate_scanned += nr_scanned;
 	if (nr_isolated)
 		count_compact_events(COMPACTISOLATED, nr_isolated);

--
Mel Gorman
SUSE Labs