From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756444Ab3KZKpq (ORCPT ); Tue, 26 Nov 2013 05:45:46 -0500 Received: from cantor2.suse.de ([195.135.220.15]:40610 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755565Ab3KZKpp (ORCPT ); Tue, 26 Nov 2013 05:45:45 -0500 Date: Tue, 26 Nov 2013 10:45:42 +0000 From: Mel Gorman To: Vlastimil Babka Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Rik van Riel Subject: Re: [PATCH 3/5] mm: compaction: detect when scanners meet in isolate_freepages Message-ID: <20131126104542.GH5285@suse.de> References: <1385389570-11393-1-git-send-email-vbabka@suse.cz> <1385389570-11393-4-git-send-email-vbabka@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <1385389570-11393-4-git-send-email-vbabka@suse.cz> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Nov 25, 2013 at 03:26:08PM +0100, Vlastimil Babka wrote: > Compaction of a zone is finished when the migrate scanner (which begins at the > zone's lowest pfn) meets the free page scanner (which begins at the zone's > highest pfn). This is detected in compact_zone() and in the case of direct > compaction, the compact_blockskip_flush flag is set so that kswapd later resets > the cached scanner pfn's, and a new compaction may again start at the zone's > borders. > > The meeting of the scanners can happen during either scanner's activity. > However, it may currently fail to be detected when it occurs in the free page > scanner, due to two problems. First, isolate_freepages() keeps free_pfn at the > highest block where it isolated pages from, for the purposes of not missing the > pages that are returned back to allocator when migration fails. Second, failing > to isolate enough free pages due to scanners meeting results in -ENOMEM being > returned by migrate_pages(), which makes compact_zone() bail out immediately > without calling compact_finished() that would detect scanners meeting. > > This failure to detect scanners meeting might result in repeated attempts at > compaction of a zone that keep starting from the cached pfn's close to the > meeting point, and quickly failing through the -ENOMEM path, without the cached > pfns being reset, over and over. This has been observed (through additional > tracepoints) in the third phase of the mmtests stress-highalloc benchmark, where > the allocator runs on an otherwise idle system. The problem was observed in the > DMA32 zone, which was used as a fallback to the preferred Normal zone, but on > the 4GB system it was actually the largest zone. The problem is even amplified > for such fallback zone - the deferred compaction logic, which could (after > being fixed by a previous patch) reset the cached scanner pfn's, is only > applied to the preferred zone and not for the fallbacks. > > The problem in the third phase of the benchmark was further amplified by commit > 81c0a2bb ("mm: page_alloc: fair zone allocator policy") which resulted in a > non-deterministic regression of the allocation success rate from ~85% to ~65%. > This occurs in about half of benchmark runs, making bisection problematic. > It is unlikely that the commit itself is buggy, but it should put more pressure > on the DMA32 zone during phases 1 and 2, which may leave it more fragmented in > phase 3 and expose the bugs that this patch fixes. > > The fix is to make scanners meeting in isolate_freepage() stay that way, and > to check in compact_zone() for scanners meeting when migrate_pages() returns > -ENOMEM. The result is that compact_finished() also detects scanners meeting > and sets the compact_blockskip_flush flag to make kswapd reset the scanner > pfn's. > > The results in stress-highalloc benchmark show that the "regression" by commit > 81c0a2bb in phase 3 no longer occurs, and phase 1 and 2 allocation success rates > are significantly improved. > > Cc: Mel Gorman > Cc: Rik van Riel > Signed-off-by: Vlastimil Babka > --- > mm/compaction.c | 19 +++++++++++++++---- > 1 file changed, 15 insertions(+), 4 deletions(-) > > diff --git a/mm/compaction.c b/mm/compaction.c > index 6a2f0c2..0702bdf 100644 > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -656,7 +656,7 @@ static void isolate_freepages(struct zone *zone, > * is the end of the pageblock the migration scanner is using. > */ > pfn = cc->free_pfn; > - low_pfn = cc->migrate_pfn + pageblock_nr_pages; > + low_pfn = ALIGN(cc->migrate_pfn + 1, pageblock_nr_pages); > > /* > * Take care that if the migration scanner is at the end of the zone > @@ -672,7 +672,7 @@ static void isolate_freepages(struct zone *zone, > * pages on cc->migratepages. We stop searching if the migrate > * and free page scanners meet or enough free pages are isolated. > */ > - for (; pfn > low_pfn && cc->nr_migratepages > nr_freepages; > + for (; pfn >= low_pfn && cc->nr_migratepages > nr_freepages; > pfn -= pageblock_nr_pages) { > unsigned long isolated; > > @@ -734,7 +734,14 @@ static void isolate_freepages(struct zone *zone, > /* split_free_page does not map the pages */ > map_pages(freelist); > > - cc->free_pfn = high_pfn; > + /* > + * If we crossed the migrate scanner, we want to keep it that way > + * so that compact_finished() may detect this > + */ Whitespace damage. > + if (pfn < low_pfn) > + cc->free_pfn = max(pfn, zone->zone_start_pfn); Is it even possible for this condition to occur? low_pfn bound is cc->migrate_pfn and the free scanner should only start after some pages have already been isolated for migration. > + else > + cc->free_pfn = high_pfn; > cc->nr_freepages = nr_freepages; > } > > @@ -999,7 +1006,11 @@ static int compact_zone(struct zone *zone, struct compact_control *cc) > if (err) { > putback_movable_pages(&cc->migratepages); > cc->nr_migratepages = 0; > - if (err == -ENOMEM) { > + /* > + * migrate_pages() may return -ENOMEM when scanners meet > + * and we want compact_finished() to detect it > + */ > + if (err == -ENOMEM && cc->free_pfn > cc->migrate_pfn) { > ret = COMPACT_PARTIAL; > goto out; > } > -- > 1.8.1.4 > -- Mel Gorman SUSE Labs