stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Vlastimil Babka <vbabka@suse.cz>,
	Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 3.10 12/18] mm: compaction: detect when scanners meet in isolate_freepages
Date: Thu, 12 Jun 2014 16:22:16 -0700	[thread overview]
Message-ID: <20140612232213.468071267@linuxfoundation.org> (raw)
In-Reply-To: <20140612232212.960235342@linuxfoundation.org>

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Vlastimil Babka <vbabka@suse.cz>

commit 7ed695e069c3cbea5e1fd08f84a04536da91f584 upstream.

Compaction of a zone is finished when the migrate scanner (which begins
at the zone's lowest pfn) meets the free page scanner (which begins at
the zone's highest pfn).  This is detected in compact_zone() and in the
case of direct compaction, the compact_blockskip_flush flag is set so
that kswapd later resets the cached scanner pfn's, and a new compaction
may again start at the zone's borders.

The meeting of the scanners can happen during either scanner's activity.
However, it may currently fail to be detected when it occurs in the free
page scanner, due to two problems.  First, isolate_freepages() keeps
free_pfn at the highest block where it isolated pages from, for the
purposes of not missing the pages that are returned back to allocator
when migration fails.  Second, failing to isolate enough free pages due
to scanners meeting results in -ENOMEM being returned by
migrate_pages(), which makes compact_zone() bail out immediately without
calling compact_finished() that would detect scanners meeting.

This failure to detect scanners meeting might result in repeated
attempts at compaction of a zone that keep starting from the cached
pfn's close to the meeting point, and quickly failing through the
-ENOMEM path, without the cached pfns being reset, over and over.  This
has been observed (through additional tracepoints) in the third phase of
the mmtests stress-highalloc benchmark, where the allocator runs on an
otherwise idle system.  The problem was observed in the DMA32 zone,
which was used as a fallback to the preferred Normal zone, but on the
4GB system it was actually the largest zone.  The problem is even
amplified for such fallback zone - the deferred compaction logic, which
could (after being fixed by a previous patch) reset the cached scanner
pfn's, is only applied to the preferred zone and not for the fallbacks.

The problem in the third phase of the benchmark was further amplified by
commit 81c0a2bb515f ("mm: page_alloc: fair zone allocator policy") which
resulted in a non-deterministic regression of the allocation success
rate from ~85% to ~65%.  This occurs in about half of benchmark runs,
making bisection problematic.  It is unlikely that the commit itself is
buggy, but it should put more pressure on the DMA32 zone during phases 1
and 2, which may leave it more fragmented in phase 3 and expose the bugs
that this patch fixes.

The fix is to make scanners meeting in isolate_freepage() stay that way,
and to check in compact_zone() for scanners meeting when migrate_pages()
returns -ENOMEM.  The result is that compact_finished() also detects
scanners meeting and sets the compact_blockskip_flush flag to make
kswapd reset the scanner pfn's.

The results in stress-highalloc benchmark show that the "regression" by
commit 81c0a2bb515f in phase 3 no longer occurs, and phase 1 and 2
allocation success rates are also significantly improved.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/compaction.c |   19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -667,7 +667,7 @@ static void isolate_freepages(struct zon
 	 * is the end of the pageblock the migration scanner is using.
 	 */
 	pfn = cc->free_pfn;
-	low_pfn = cc->migrate_pfn + pageblock_nr_pages;
+	low_pfn = ALIGN(cc->migrate_pfn + 1, pageblock_nr_pages);
 
 	/*
 	 * Take care that if the migration scanner is at the end of the zone
@@ -683,7 +683,7 @@ static void isolate_freepages(struct zon
 	 * pages on cc->migratepages. We stop searching if the migrate
 	 * and free page scanners meet or enough free pages are isolated.
 	 */
-	for (; pfn > low_pfn && cc->nr_migratepages > nr_freepages;
+	for (; pfn >= low_pfn && cc->nr_migratepages > nr_freepages;
 					pfn -= pageblock_nr_pages) {
 		unsigned long isolated;
 
@@ -738,7 +738,14 @@ static void isolate_freepages(struct zon
 	/* split_free_page does not map the pages */
 	map_pages(freelist);
 
-	cc->free_pfn = high_pfn;
+	/*
+	 * If we crossed the migrate scanner, we want to keep it that way
+	 * so that compact_finished() may detect this
+	 */
+	if (pfn < low_pfn)
+		cc->free_pfn = max(pfn, zone->zone_start_pfn);
+	else
+		cc->free_pfn = high_pfn;
 	cc->nr_freepages = nr_freepages;
 }
 
@@ -1003,7 +1010,11 @@ static int compact_zone(struct zone *zon
 		if (err) {
 			putback_movable_pages(&cc->migratepages);
 			cc->nr_migratepages = 0;
-			if (err == -ENOMEM) {
+			/*
+			 * migrate_pages() may return -ENOMEM when scanners meet
+			 * and we want compact_finished() to detect it
+			 */
+			if (err == -ENOMEM && cc->free_pfn > cc->migrate_pfn) {
 				ret = COMPACT_PARTIAL;
 				goto out;
 			}



  parent reply	other threads:[~2014-06-12 23:22 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-12 23:22 [PATCH 3.10 00/18] 3.10.44-stable review Greg Kroah-Hartman
2014-06-12 23:22 ` [PATCH 3.10 01/18] fs,userns: Change inode_capable to capable_wrt_inode_uidgid Greg Kroah-Hartman
2014-06-12 23:22 ` [PATCH 3.10 02/18] mlx4_en: dont use napi_synchronize inside mlx4_en_netpoll Greg Kroah-Hartman
2014-06-12 23:22 ` [PATCH 3.10 03/18] ARM: mvebu: fix NOR bus-width in Armada XP GP Device Tree Greg Kroah-Hartman
2014-06-12 23:22 ` [PATCH 3.10 04/18] ARM: mvebu: fix NOR bus-width in Armada XP OpenBlocks AX3 " Greg Kroah-Hartman
2014-06-12 23:22 ` [PATCH 3.10 05/18] netfilter: ipv4: defrag: set local_df flag on defragmented skb Greg Kroah-Hartman
2014-06-12 23:22 ` [PATCH 3.10 06/18] Target/iscsi,iser: Avoid accepting transport connections during stop stage Greg Kroah-Hartman
2014-06-12 23:22 ` [PATCH 3.10 07/18] iser-target: Fix multi network portal shutdown regression Greg Kroah-Hartman
2014-06-12 23:22 ` [PATCH 3.10 08/18] iscsi-target: Fix wrong buffer / buffer overrun in iscsi_change_param_value() Greg Kroah-Hartman
2014-06-12 23:22 ` [PATCH 3.10 09/18] target: Allow READ_CAPACITY opcode in ALUA Standby access state Greg Kroah-Hartman
2014-06-12 23:22 ` [PATCH 3.10 10/18] target: Fix alua_access_state attribute OOPs for un-configured devices Greg Kroah-Hartman
2014-06-12 23:22 ` [PATCH 3.10 11/18] mm: compaction: reset cached scanner pfns before reading them Greg Kroah-Hartman
2014-06-12 23:22 ` Greg Kroah-Hartman [this message]
2014-06-12 23:22 ` [PATCH 3.10 13/18] mm/compaction: make isolate_freepages start at pageblock boundary Greg Kroah-Hartman
2014-06-12 23:22 ` [PATCH 3.10 14/18] auditsc: audit_krule mask accesses need bounds checking Greg Kroah-Hartman
2014-06-12 23:22 ` [PATCH 3.10 15/18] SCSI: megaraid: Use resource_size_t for PCI resources, not long Greg Kroah-Hartman
2014-06-12 23:22 ` [PATCH 3.10 16/18] mei: me: drop harmful wait optimization Greg Kroah-Hartman
2014-06-13  5:45 ` [PATCH 3.10 00/18] 3.10.44-stable review Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140612232213.468071267@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=riel@redhat.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).