From: Mel Gorman <mgorman@suse.de>
To: Pádraig Brady <P@draigBrady.com>
Cc: linux-mm@kvack.org
Subject: Re: sandy bridge kswapd0 livelock with pagecache
Date: Tue, 21 Jun 2011 11:39:20 +0100	[thread overview]
Message-ID: <20110621103920.GF9396@suse.de> (raw)
In-Reply-To: <4E0069FE.4000708@draigBrady.com>

On Tue, Jun 21, 2011 at 10:53:02AM +0100, Pádraig Brady wrote:
> I tried the 2 patches here to no avail:
> http://marc.info/?l=linux-mm&m=130503811704830&w=2
> 
> I originally logged this at:
> https://bugzilla.redhat.com/show_bug.cgi?id=712019
> 
> I can compile up and quickly test any suggestions.
> 

I recently looked through what kswapd does and there are a number
of problem areas. Unfortunately, I haven't yet gotten around to
doing anything about them or to running test cases to see whether
they are real problems. In your case, though, the following is a
strong possibility. This should be applied on top of the two patches
merged from that thread.

This is not tested in any way and is based on 3.0-rc3.

==== CUT HERE ====
mm: vmscan: Stop looping in kswapd if high-order reclaim is failing

A number of people have identified a problem whereby kswapd consumes
99% of CPU in a tight loop. It was determined that there are constant
sources of high-order allocations, but when those allocations fail,
kswapd continues to consume CPU and to reclaim too much memory.

kswapd can and does give up costly high-order reclaim, but only if it
is failing to make forward progress. This patch tracks how much memory
kswapd has reclaimed, accumulated across balancing passes. If it
reclaims 4 times the size of the allocation request, it resets to
order-0, balances for that order and goes back to sleep unless there
have been continued allocation requests. "4 times" is a tad arbitrary
but it comes from
(1<<PAGE_ALLOC_COSTLY_ORDER)*4 == SWAP_CLUSTER_MAX
which is the "standard" unit of reclaim that kswapd works in, so the
threshold is scaled similarly for the higher orders.
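
To put numbers on that cut-off (a user-space illustration only, not
kernel code; SWAP_CLUSTER_MAX and PAGE_ALLOC_COSTLY_ORDER are
hard-coded here to their 3.0-era values of 32 and 3):

#include <stdio.h>

#define SWAP_CLUSTER_MAX	32UL
#define PAGE_ALLOC_COSTLY_ORDER	3

int main(void)
{
	int order;

	/* (1 << PAGE_ALLOC_COSTLY_ORDER) * 4 == 32 == SWAP_CLUSTER_MAX */
	printf("order %d cut-off: %lu pages (== SWAP_CLUSTER_MAX)\n",
	       PAGE_ALLOC_COSTLY_ORDER, 4UL << PAGE_ALLOC_COSTLY_ORDER);

	/* Costly orders scale the same way: give up once kswapd has
	 * reclaimed more than four times the request size. */
	for (order = PAGE_ALLOC_COSTLY_ORDER + 1; order <= 6; order++)
		printf("order %d cut-off: %lu pages\n", order, 4UL << order);

	return 0;
}

For an order-4 request that works out at a 64-page cut-off, order-5
at 128 pages, and so on.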

No Signed-off-by as it is barely a prototype
---
 mm/vmscan.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index faa0a08..8fb262f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2376,6 +2376,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
 	int i;
 	int end_zone = 0;	/* Inclusive.  0 = ZONE_DMA */
 	unsigned long total_scanned;
+	unsigned long total_reclaimed;
 	struct reclaim_state *reclaim_state = current->reclaim_state;
 	unsigned long nr_soft_reclaimed;
 	unsigned long nr_soft_scanned;
@@ -2397,6 +2398,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
 	};
+	total_reclaimed = 0;	/* accumulate across loop_again passes */
loop_again:
 	total_scanned = 0;
 	sc.nr_reclaimed = 0;
 	sc.may_writepage = !laptop_mode;
 	count_vm_event(PAGEOUTRUN);
@@ -2564,6 +2566,7 @@ loop_again:
 			break;
 	}
 out:
+	total_reclaimed += sc.nr_reclaimed;
 
 	/*
 	 * order-0: All zones must meet high watermark for a balanced node
@@ -2584,12 +2587,18 @@ out:
 		 * little point trying all over again as kswapd may
 		 * infinite loop.
 		 *
+		 * Similarly, if we have reclaimed far more pages than the
+		 * original request size, it's likely that contiguous reclaim
+		 * is not finding the pages it needs and it should give
+		 * up.
+		 *
 		 * Instead, recheck all watermarks at order-0 as they
 		 * are the most important. If watermarks are ok, kswapd will go
 		 * back to sleep. High-order users can still perform direct
 		 * reclaim if they wish.
 		 */
-		if (sc.nr_reclaimed < SWAP_CLUSTER_MAX)
+		if (sc.nr_reclaimed < SWAP_CLUSTER_MAX ||
+				(order > PAGE_ALLOC_COSTLY_ORDER && total_reclaimed > (4UL << order)))
 			order = sc.order = 0;
 
 		goto loop_again;

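To make the intended flow easier to see, here is a minimal user-space
sketch of the give-up path (simplified assumptions throughout: a single
order-4 wakeup and each balancing pass reclaiming exactly one cluster;
the names mirror balance_pgdat() but this is not the kernel code):

#include <stdio.h>

#define SWAP_CLUSTER_MAX	32UL
#define PAGE_ALLOC_COSTLY_ORDER	3

int main(void)
{
	unsigned long total_reclaimed = 0;	/* zeroed once, not per pass */
	unsigned long nr_reclaimed;
	int order = 4;				/* a costly high-order wakeup */
	int pass;

	for (pass = 1; ; pass++) {
		/* Pretend every balancing pass reclaims one cluster */
		nr_reclaimed = SWAP_CLUSTER_MAX;
		total_reclaimed += nr_reclaimed;

		printf("pass %d: total_reclaimed=%lu cut-off=%lu\n",
		       pass, total_reclaimed, 4UL << order);

		if (nr_reclaimed < SWAP_CLUSTER_MAX ||
		    (order > PAGE_ALLOC_COSTLY_ORDER &&
		     total_reclaimed > (4UL << order))) {
			/* Give up: rebalance for order-0 and go to sleep */
			order = 0;
			break;
		}
	}

	printf("gave up after %d passes\n", pass);
	return 0;
}

With those numbers the sketch gives up on the third pass, once 96 pages
have been reclaimed against the 64-page cut-off, rather than looping
indefinitely at order 4.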
