From: Mel Gorman <mgorman@suse.de>
To: Pádraig Brady <P@draigBrady.com>
Cc: linux-mm@kvack.org
Subject: Re: sandy bridge kswapd0 livelock with pagecache
Date: Tue, 21 Jun 2011 11:39:20 +0100
Message-ID: <20110621103920.GF9396@suse.de>
In-Reply-To: <4E0069FE.4000708@draigBrady.com>
On Tue, Jun 21, 2011 at 10:53:02AM +0100, Pádraig Brady wrote:
> I tried the 2 patches here to no avail:
> http://marc.info/?l=linux-mm&m=130503811704830&w=2
>
> I originally logged this at:
> https://bugzilla.redhat.com/show_bug.cgi?id=712019
>
> I can compile up and quickly test any suggestions.
>
I recently looked through what kswapd does and found a number of
potential problem areas. Unfortunately, I haven't gotten around to
doing anything about them yet or to running the test cases to see if
they are really problems. In your case though, the following is a
strong possibility. It should be applied on top of the two patches
merged from that thread.
This is based on 3.0-rc3 and is not tested in any way.
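For reference, the give-up test that the patch below adds to
balance_pgdat() can be modelled in userspace. This is only a sketch:
the helper name should_reset_to_order0() is hypothetical and does not
exist in the kernel, and the two constants are restated here with the
values they have in 3.0-era kernels rather than taken from kernel
headers.

```c
#include <stdbool.h>

/* Assumed values, restated from 3.0-era kernel headers. */
#define PAGE_ALLOC_COSTLY_ORDER 3
#define SWAP_CLUSTER_MAX 32UL

/* The identity the "4 times" factor is derived from: 8 * 4 == 32. */
_Static_assert((1UL << PAGE_ALLOC_COSTLY_ORDER) * 4 == SWAP_CLUSTER_MAX,
	       "4x the costly-order request equals SWAP_CLUSTER_MAX");

/*
 * Hypothetical helper mirroring the patched condition: fall back to
 * order-0 if the last pass made little progress, or if a costly
 * high-order request has already reclaimed more than four times its
 * own size (4UL << order pages).
 */
static bool should_reset_to_order0(int order, unsigned long nr_reclaimed,
				   unsigned long total_reclaimed)
{
	if (nr_reclaimed < SWAP_CLUSTER_MAX)
		return true;

	return order > PAGE_ALLOC_COSTLY_ORDER &&
	       total_reclaimed > (4UL << order);
}
```

With these assumptions, an order-9 request (512 pages) would be dropped
to order-0 once more than 2048 pages have been reclaimed, while an
order-3 request is never dropped by the new clause because the test
requires order > PAGE_ALLOC_COSTLY_ORDER.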
==== CUT HERE ====
mm: vmscan: Stop looping in kswapd if high-order reclaim is failing
A number of people have identified a problem whereby kswapd consumes
99% of CPU in a tight loop. It was determined that there are constant
sources of high-order allocations but in the event the allocations are
failing, kswapd continues to consume CPU and reclaim too much memory.
kswapd can and does give up costly high-order reclaim but only if it is
failing to make forward progress. This patch tracks how much memory
kswapd has reclaimed. If it reclaims 4 times the size of the allocation
request, it resets to order-0, balances for that order and goes to
sleep unless there have been further allocation requests. "4 times" is
a tad arbitrary but it comes down to
(1<<PAGE_ALLOC_COSTLY_ORDER)*4 == SWAP_CLUSTER_MAX
which is the "standard" unit of reclaim kswapd works on, so scale it
similarly for the higher orders.
Not signed off as it is barely a prototype
---
mm/vmscan.c | 11 ++++++++++-
1 files changed, 10 insertions(+), 1 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index faa0a08..8fb262f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2376,6 +2376,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
int i;
int end_zone = 0; /* Inclusive. 0 = ZONE_DMA */
unsigned long total_scanned;
+ unsigned long total_reclaimed;
struct reclaim_state *reclaim_state = current->reclaim_state;
unsigned long nr_soft_reclaimed;
unsigned long nr_soft_scanned;
@@ -2397,6 +2398,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
};
loop_again:
total_scanned = 0;
+ total_reclaimed = 0;
sc.nr_reclaimed = 0;
sc.may_writepage = !laptop_mode;
count_vm_event(PAGEOUTRUN);
@@ -2564,6 +2566,7 @@ loop_again:
break;
}
out:
+ total_reclaimed += sc.nr_reclaimed;
/*
* order-0: All zones must meet high watermark for a balanced node
@@ -2584,12 +2587,18 @@ out:
* little point trying all over again as kswapd may
* infinite loop.
*
+ * Similarly, if we have reclaimed far more pages than the
+ * original request size, it's likely that contiguous reclaim
+ * is not finding the pages it needs and it should give
+ * up.
+ *
* Instead, recheck all watermarks at order-0 as they
* are the most important. If watermarks are ok, kswapd will go
* back to sleep. High-order users can still perform direct
* reclaim if they wish.
*/
- if (sc.nr_reclaimed < SWAP_CLUSTER_MAX)
+ if (sc.nr_reclaimed < SWAP_CLUSTER_MAX ||
+ (order > PAGE_ALLOC_COSTLY_ORDER && total_reclaimed > (4UL << order)) )
order = sc.order = 0;
goto loop_again;