linux-mm.kvack.org archive mirror
From: Mel Gorman <mgorman@techsingularity.net>
To: howaboutsynergy@protonmail.com
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"bugzilla-daemon@bugzilla.kernel.org"
	<bugzilla-daemon@bugzilla.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [Bug 204165] New: 100% CPU usage in compact_zone_order
Date: Wed, 17 Jul 2019 18:53:32 +0100
Message-ID: <20190717175332.GC24383@techsingularity.net>
In-Reply-To: <xZGQeie9gbbIEm7ZciNh3PrdV8kTu-SE7KtUYV3cloMCUEdzB7taS5BcTzSUSaThu5_ftcRjr3sYcQB1c9dVPX3i1kQ2eP-xjKvFIpT7wZs=@protonmail.com>

On Tue, Jul 16, 2019 at 07:15:08PM +0000, howaboutsynergy@protonmail.com wrote:
> On Tuesday, July 16, 2019 12:03 PM, Mel Gorman <mgorman@techsingularity.net> wrote:
> > I tried reproducing this but after 300 attempts with various parameters
> > and adding other workloads in the background, I was unable to reproduce
> > the problem.
> > 
> 
> 
> The third time I ran this command `$ time stress -m 220 --vm-bytes 10000000000 --timeout 10`, I got 10+ hung stress processes:
> 
>   PID  %CPU COMMAND                                                                            PR  NI    VIRT    RES S USER     
>  3785  94.5 stress                                                                             20   0 9769416      4 R user     
>  3777  87.3 stress                                                                             20   0 9769416      4 R user     
>  3923  85.5 stress                                                                             20   0 9769416      4 R user     
>  3937  85.5 stress                                                                             20   0 9769416      4 R user     
>  3943  81.8 stress                                                                             20   0 9769416      4 R user     
>  3885  80.0 stress                                                                             20   0 9769416      4 R user     
>  3970  80.0 stress                                                                             20   0 9769416      4 R user     
>
> <SNIP>
>
> trace.dat is 1.3G
> -rw-r--r--  1 root root 1326219264 16.07.2019 20:45 trace.dat
> 

Ok, great. From the trace, it is obvious that the scanner is making no
progress. I don't think zswap is involved as such, but it *may* be making
the problem easier to trigger by altering timing. At least, I see no
reason why zswap would materially affect the termination conditions.

From the code path and your trace, I think what *might* be happening is
that a fatal signal is pending, but the resulting abort neither advances
the scanner nor looks like a proper abort to the caller. The task then
ends up looping in compaction instead of dying, without either aborting
or progressing the scanner. That might explain why stress is hitting it,
as stress probably sends fatal signals to its workers on timeout (I
didn't check the source).

Can you try the patch below (compile-tested only), please? Note that the
stress test might still take time to exit normally if it's stuck in a swap
storm of some sort, but I'm hoping that at least the 100% compaction CPU
usage goes away.
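If you want to compare behaviour before and after, the same tracing setup
that produced trace.dat should work. For example, assuming trace-cmd
(whose default output file is trace.dat) was what captured it:

  trace-cmd record -e compaction -o trace-patched.dat \
      stress -m 220 --vm-bytes 10000000000 --timeout 10
  trace-cmd report -i trace-patched.dat | \
      grep mm_compaction_isolate_migratepages | tail

With the fix, the mm_compaction_isolate_migratepages events should show
the scanner position advancing instead of repeating the same range.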

diff --git a/mm/compaction.c b/mm/compaction.c
index 9e1b9acb116b..952dc2fb24e5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -842,13 +842,15 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 		/*
 		 * Periodically drop the lock (if held) regardless of its
-		 * contention, to give chance to IRQs. Abort async compaction
-		 * if contended.
+		 * contention, to give chance to IRQs. Abort completely if
+		 * a fatal signal is pending.
 		 */
 		if (!(low_pfn % SWAP_CLUSTER_MAX)
 		    && compact_unlock_should_abort(&pgdat->lru_lock,
-					    flags, &locked, cc))
-			break;
+					    flags, &locked, cc)) {
+			low_pfn = 0;
+			goto fatal_pending;
+		}
 
 		if (!pfn_valid_within(low_pfn))
 			goto isolate_fail;
@@ -1060,6 +1062,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
 						nr_scanned, nr_isolated);
 
+fatal_pending:
 	cc->total_migrate_scanned += nr_scanned;
 	if (nr_isolated)
 		count_compact_events(COMPACTISOLATED, nr_isolated);
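
To test the patch, one possible sequence is the usual kernel rebuild
workflow (the patch filename here is hypothetical; adjust paths and
config to your system):

  cd linux
  patch -p1 < compaction-fatal-signal.patch
  make -j"$(nproc)" && sudo make modules_install install
  sudo reboot
  # After booting the patched kernel, re-run the reproducer:
  time stress -m 220 --vm-bytes 10000000000 --timeout 10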

-- 
Mel Gorman
SUSE Labs


Thread overview: 12+ messages
     [not found] <bug-204165-27@https.bugzilla.kernel.org/>
2019-07-15 21:25 ` [Bug 204165] New: 100% CPU usage in compact_zone_order Andrew Morton
2019-07-15 23:28   ` howaboutsynergy
2019-07-16  1:52     ` howaboutsynergy
2019-07-16  3:25       ` howaboutsynergy
2019-07-16  3:57         ` howaboutsynergy
2019-07-16  7:11           ` Mel Gorman
2019-07-16 19:15             ` howaboutsynergy
2019-07-17 17:53               ` Mel Gorman [this message]
2019-07-17 22:00                 ` howaboutsynergy
2019-07-18  8:37                   ` Mel Gorman
2019-07-16 10:03           ` Mel Gorman
2019-07-16  1:08   ` howaboutsynergy
