From: Mel Gorman <mgorman@suse.de>
To: Zdenek Kabelac <zkabelac@redhat.com>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>,
Jiri Slaby <jslaby@suse.cz>,
Valdis.Kletnieks@vt.edu, Jiri Slaby <jirislaby@gmail.com>,
linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Rik van Riel <riel@redhat.com>,
Robert Jennings <rcj@linux.vnet.ibm.com>
Subject: Re: kswapd0: excessive CPU usage
Date: Mon, 12 Nov 2012 12:19:57 +0000 [thread overview]
Message-ID: <20121112121956.GT8218@suse.de> (raw)
In-Reply-To: <509F6C2A.9060502@redhat.com>
On Sun, Nov 11, 2012 at 10:13:14AM +0100, Zdenek Kabelac wrote:
> Hmm, so it's just took longer to hit the problem and observe kswapd0
> spinning on my CPU again - it's not as endless like before - but
> still it easily eats minutes - it helps to turn off Firefox or TB
> (memory hungry apps) so kswapd0 stops soon - and restart those apps
> again.
> (And I still have like >1GB of cached memory)
>
I posted a "safe" patch that I believe explains why you are seeing what
you are seeing. It does mean that there will still be some stalls due to
THP because kswapd is not helping and it's avoiding the problem rather
than trying to deal with it.
Hence, I'm also going to post this patch even though I have not tested
it myself. If you find it fixes the problem then it would be a
preferable patch to the revert. It still is the case that the
balance_pgdat() logic is in sort need of a rethink as it's pretty
twisted right now.
Thanks
---8<---
mm: Avoid waking kswapd for THP allocations when compaction is deferred or contended
With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction
based on failures" reverted, Zdenek Kabelac reported the following
Hmm, so it's just took longer to hit the problem and observe
kswapd0 spinning on my CPU again - it's not as endless like before -
but still it easily eats minutes - it helps to turn off Firefox
or TB (memory hungry apps) so kswapd0 stops soon - and restart
those apps again. (And I still have like >1GB of cached memory)
kswapd0 R running task 0 30 2 0x00000000
ffff8801331efae8 0000000000000082 0000000000000018 0000000000000246
ffff880135b9a340 ffff8801331effd8 ffff8801331effd8 ffff8801331effd8
ffff880055dfa340 ffff880135b9a340 00000000331efad8 ffff8801331ee000
Call Trace:
[<ffffffff81555bf2>] preempt_schedule+0x42/0x60
[<ffffffff81557a95>] _raw_spin_unlock+0x55/0x60
[<ffffffff81192971>] put_super+0x31/0x40
[<ffffffff81192a42>] drop_super+0x22/0x30
[<ffffffff81193b89>] prune_super+0x149/0x1b0
[<ffffffff81141e2a>] shrink_slab+0xba/0x510
The sysrq+m indicates the system has no swap so it'll never reclaim
anonymous pages as part of reclaim/compaction. That is one part of the
problem but not the root cause as file-backed pages could also be reclaimed.
The likely underlying problem is that kswapd is woken up or kept awake
for each THP allocation request in the page allocator slow path.
If compaction fails for the requesting process then compaction will be
deferred for a time and direct reclaim is avoided. However, if there
are a storm of THP requests that are simply rejected, it will still
be the the case that kswapd is awake for a prolonged period of time
as pgdat->kswapd_max_order is updated each time. This is noticed by
the main kswapd() loop and it will not call kswapd_try_to_sleep().
Instead it will loopp, shrinking a small number of pages and calling
shrink_slab() on each iteration.
This patch defers when kswapd gets woken up for THP allocations. For !THP
allocations, kswapd is always woken up. For THP allocations, kswapd is
woken up iff the process is willing to enter into direct
reclaim/compaction.
Signed-off-by: Mel Gorman <mgorman@suse.de>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bb90971..0b469b4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2378,6 +2378,15 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
return !!(gfp_to_alloc_flags(gfp_mask) & ALLOC_NO_WATERMARKS);
}
+/* Returns true if the allocation is likely for THP */
+static bool is_thp_alloc(gfp_t gfp_mask, unsigned int order)
+{
+ if (order == pageblock_order &&
+ (gfp_mask & (__GFP_MOVABLE|__GFP_REPEAT)) == __GFP_MOVABLE)
+ return true;
+ return false;
+}
+
static inline struct page *
__alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
struct zonelist *zonelist, enum zone_type high_zoneidx,
@@ -2416,7 +2425,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
goto nopage;
restart:
- wake_all_kswapd(order, zonelist, high_zoneidx,
+ /* The decision whether to wake kswapd for THP is made later */
+ if (!is_thp_alloc(gfp_mask, order))
+ wake_all_kswapd(order, zonelist, high_zoneidx,
zone_idx(preferred_zone));
/*
@@ -2487,15 +2498,21 @@ rebalance:
goto got_pg;
sync_migration = true;
- /*
- * If compaction is deferred for high-order allocations, it is because
- * sync compaction recently failed. In this is the case and the caller
- * requested a movable allocation that does not heavily disrupt the
- * system then fail the allocation instead of entering direct reclaim.
- */
- if ((deferred_compaction || contended_compaction) &&
- (gfp_mask & (__GFP_MOVABLE|__GFP_REPEAT)) == __GFP_MOVABLE)
- goto nopage;
+ if (is_thp_alloc(gfp_mask, order)) {
+ /*
+ * If compaction is deferred for high-order allocations, it is
+ * because sync compaction recently failed. In this is the case
+ * and the caller requested a movable allocation that does not
+ * heavily disrupt the system then fail the allocation instead
+ * of entering direct reclaim.
+ */
+ if (deferred_compaction || contended_compaction)
+ goto nopage;
+
+ /* If process is willing to reclaim/compact then wake kswapd */
+ wake_all_kswapd(order, zonelist, high_zoneidx,
+ zone_idx(preferred_zone));
+ }
/* Try direct reclaim and then allocating */
page = __alloc_pages_direct_reclaim(gfp_mask, order,
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2012-11-12 12:20 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-10-11 8:52 kswapd0: wxcessive CPU usage Jiri Slaby
2012-10-11 13:44 ` Valdis.Kletnieks
2012-10-11 15:34 ` Jiri Slaby
2012-10-11 17:56 ` Valdis.Kletnieks
2012-10-11 17:59 ` Jiri Slaby
2012-10-11 18:19 ` Valdis.Kletnieks
2012-10-11 22:08 ` kswapd0: excessive " Jiri Slaby
2012-10-12 12:37 ` Jiri Slaby
2012-10-12 13:57 ` Mel Gorman
2012-10-15 9:54 ` Jiri Slaby
2012-10-15 11:09 ` Mel Gorman
2012-10-29 10:52 ` Thorsten Leemhuis
2012-10-30 19:18 ` Mel Gorman
2012-10-31 11:25 ` Thorsten Leemhuis
2012-10-31 15:04 ` Mel Gorman
2012-11-04 16:36 ` Rik van Riel
2012-11-02 10:44 ` Zdenek Kabelac
2012-11-02 10:53 ` Jiri Slaby
2012-11-02 19:45 ` Jiri Slaby
2012-11-04 11:26 ` Zdenek Kabelac
2012-11-05 14:24 ` [PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" Mel Gorman
2012-11-06 10:15 ` Johannes Hirte
2012-11-09 8:36 ` Mel Gorman
2012-11-14 21:43 ` Johannes Hirte
2012-11-09 9:12 ` Mel Gorman
2012-11-09 4:22 ` kswapd0: excessive CPU usage Seth Jennings
2012-11-09 8:07 ` Zdenek Kabelac
2012-11-09 9:06 ` Mel Gorman
2012-11-11 9:13 ` Zdenek Kabelac
2012-11-12 11:37 ` [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" Mel Gorman
2012-11-16 19:14 ` Josh Boyer
2012-11-16 19:51 ` Andrew Morton
2012-11-20 1:43 ` Valdis.Kletnieks
2012-11-16 20:06 ` Mel Gorman
2012-11-20 15:38 ` Josh Boyer
2012-11-20 16:13 ` Bruno Wolff III
2012-11-20 17:43 ` Thorsten Leemhuis
2012-11-23 15:20 ` Thorsten Leemhuis
2012-11-27 11:12 ` Mel Gorman
2012-11-21 15:08 ` Mel Gorman
2012-11-20 9:18 ` Glauber Costa
2012-11-20 20:18 ` Andrew Morton
2012-11-21 8:30 ` Glauber Costa
2012-11-12 12:19 ` Mel Gorman [this message]
2012-11-12 13:13 ` kswapd0: excessive CPU usage Zdenek Kabelac
2012-11-12 13:31 ` Mel Gorman
2012-11-12 14:50 ` Zdenek Kabelac
2012-11-18 19:00 ` Zdenek Kabelac
2012-11-18 19:07 ` Jiri Slaby
2012-11-09 8:40 ` Mel Gorman
2012-10-11 22:14 ` kswapd0: wxcessive " Andrew Morton
2012-10-11 22:26 ` Jiri Slaby
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20121112121956.GT8218@suse.de \
--to=mgorman@suse.de \
--cc=Valdis.Kletnieks@vt.edu \
--cc=akpm@linux-foundation.org \
--cc=jirislaby@gmail.com \
--cc=jslaby@suse.cz \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=rcj@linux.vnet.ibm.com \
--cc=riel@redhat.com \
--cc=sjenning@linux.vnet.ibm.com \
--cc=zkabelac@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).