linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>
Subject: [PATCH V2 4/6] mm: compaction: detect when scanners meet in isolate_freepages
Date: Wed, 11 Dec 2013 11:24:35 +0100	[thread overview]
Message-ID: <1386757477-10333-5-git-send-email-vbabka@suse.cz> (raw)
In-Reply-To: <1386757477-10333-1-git-send-email-vbabka@suse.cz>

Compaction of a zone is finished when the migrate scanner (which begins at the
zone's lowest pfn) meets the free page scanner (which begins at the zone's
highest pfn). This is detected in compact_zone() and in the case of direct
compaction, the compact_blockskip_flush flag is set so that kswapd later resets
the cached scanner pfn's, and a new compaction may again start at the zone's
borders.

The meeting of the scanners can happen during either scanner's activity.
However, it may currently fail to be detected when it occurs in the free page
scanner, due to two problems. First, isolate_freepages() keeps free_pfn at the
highest block where it isolated pages from, for the purposes of not missing the
pages that are returned back to allocator when migration fails. Second, failing
to isolate enough free pages due to scanners meeting results in -ENOMEM being
returned by migrate_pages(), which makes compact_zone() bail out immediately
without calling compact_finished() that would detect scanners meeting.

This failure to detect scanners meeting might result in repeated attempts at
compaction of a zone that keep starting from the cached pfn's close to the
meeting point, and quickly failing through the -ENOMEM path, without the cached
pfns being reset, over and over. This has been observed (through additional
tracepoints) in the third phase of the mmtests stress-highalloc benchmark, where
the allocator runs on an otherwise idle system. The problem was observed in the
DMA32 zone, which was used as a fallback to the preferred Normal zone, but on
the 4GB system it was actually the largest zone. The problem is even amplified
for such fallback zone - the deferred compaction logic, which could (after
being fixed by a previous patch) reset the cached scanner pfn's, is only
applied to the preferred zone and not for the fallbacks.

The problem in the third phase of the benchmark was further amplified by commit
81c0a2bb ("mm: page_alloc: fair zone allocator policy") which resulted in a
non-deterministic regression of the allocation success rate from ~85% to ~65%.
This occurs in about half of benchmark runs, making bisection problematic.
It is unlikely that the commit itself is buggy, but it should put more pressure
on the DMA32 zone during phases 1 and 2, which may leave it more fragmented in
phase 3 and expose the bugs that this patch fixes.

The fix is to make scanners meeting in isolate_freepage() stay that way, and
to check in compact_zone() for scanners meeting when migrate_pages() returns
-ENOMEM. The result is that compact_finished() also detects scanners meeting
and sets the compact_blockskip_flush flag to make kswapd reset the scanner
pfn's.

The results in stress-highalloc benchmark show that the "regression" by commit
81c0a2bb in phase 3 no longer occurs, and phase 1 and 2 allocation success rates
are also significantly improved.

Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/compaction.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 3313cc8..ae83a1c 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -656,7 +656,7 @@ static void isolate_freepages(struct zone *zone,
 	 * is the end of the pageblock the migration scanner is using.
 	 */
 	pfn = cc->free_pfn;
-	low_pfn = cc->migrate_pfn + pageblock_nr_pages;
+	low_pfn = ALIGN(cc->migrate_pfn + 1, pageblock_nr_pages);
 
 	/*
 	 * Take care that if the migration scanner is at the end of the zone
@@ -672,7 +672,7 @@ static void isolate_freepages(struct zone *zone,
 	 * pages on cc->migratepages. We stop searching if the migrate
 	 * and free page scanners meet or enough free pages are isolated.
 	 */
-	for (; pfn > low_pfn && cc->nr_migratepages > nr_freepages;
+	for (; pfn >= low_pfn && cc->nr_migratepages > nr_freepages;
 					pfn -= pageblock_nr_pages) {
 		unsigned long isolated;
 
@@ -734,7 +734,14 @@ static void isolate_freepages(struct zone *zone,
 	/* split_free_page does not map the pages */
 	map_pages(freelist);
 
-	cc->free_pfn = high_pfn;
+	/*
+	 * If we crossed the migrate scanner, we want to keep it that way
+	 * so that compact_finished() may detect this
+	 */
+	if (pfn < low_pfn)
+		cc->free_pfn = max(pfn, zone->zone_start_pfn);
+	else
+		cc->free_pfn = high_pfn;
 	cc->nr_freepages = nr_freepages;
 }
 
@@ -1001,7 +1008,11 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 		if (err) {
 			putback_movable_pages(&cc->migratepages);
 			cc->nr_migratepages = 0;
-			if (err == -ENOMEM) {
+			/*
+			 * migrate_pages() may return -ENOMEM when scanners meet
+			 * and we want compact_finished() to detect it
+			 */
+			if (err == -ENOMEM && cc->free_pfn > cc->migrate_pfn) {
 				ret = COMPACT_PARTIAL;
 				goto out;
 			}
-- 
1.8.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2013-12-11 10:24 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-11 10:24 [PATCH V2 0/6] Memory compaction efficiency improvements Vlastimil Babka
2013-12-11 10:24 ` [PATCH V2 1/6] mm: compaction: trace compaction begin and end Vlastimil Babka
2013-12-11 10:24 ` [PATCH V2 2/6] mm: compaction: encapsulate defer reset logic Vlastimil Babka
2013-12-11 10:24 ` [PATCH V2 3/6] mm: compaction: reset cached scanner pfn's before reading them Vlastimil Babka
2013-12-11 10:24 ` Vlastimil Babka [this message]
2013-12-11 10:24 ` [PATCH V2 5/6] mm: compaction: do not mark unmovable pageblocks as skipped in async compaction Vlastimil Babka
2013-12-11 10:24 ` [PATCH V2 6/6] mm: compaction: reset scanner positions immediately when they meet Vlastimil Babka
2013-12-12  6:12 ` [PATCH V2 0/6] Memory compaction efficiency improvements Joonsoo Kim
2013-12-12 13:26   ` Vlastimil Babka
2013-12-13  2:03     ` Joonsoo Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1386757477-10333-5-git-send-email-vbabka@suse.cz \
    --to=vbabka@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).