All of lore.kernel.org
 help / color / mirror / Atom feed
From: Minchan Kim <minchan@kernel.org>
To: Mel Gorman <mgorman@suse.de>
Cc: Linux-MM <linux-mm@kvack.org>, Rik van Riel <riel@redhat.com>,
	Jim Schutt <jaschut@sandia.gov>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 6/6] mm: have order > 0 compaction start near a pageblock with free pages
Date: Wed, 8 Aug 2012 13:36:00 +0900	[thread overview]
Message-ID: <20120808043600.GD4247@bbox> (raw)
In-Reply-To: <1344342677-5845-7-git-send-email-mgorman@suse.de>

On Tue, Aug 07, 2012 at 01:31:17PM +0100, Mel Gorman wrote:
> commit [7db8889a: mm: have order > 0 compaction start off where it left]
> introduced a caching mechanism to reduce the amount work the free page
> scanner does in compaction. However, it has a problem. Consider two process
> simultaneously scanning free pages
> 
> 				    			C
> Process A		M     S     			F
> 		|---------------------------------------|
> Process B		M 	FS
> 
> C is zone->compact_cached_free_pfn
> S is cc->start_pfree_pfn
> M is cc->migrate_pfn
> F is cc->free_pfn
> 
> In this diagram, Process A has just reached its migrate scanner, wrapped
> around and updated compact_cached_free_pfn accordingly.
> 
> Simultaneously, Process B finishes isolating in a block and updates
> compact_cached_free_pfn again to the location of its free scanner.
> 
> Process A moves to "end_of_zone - one_pageblock" and runs this check
> 
>                 if (cc->order > 0 && (!cc->wrapped ||
>                                       zone->compact_cached_free_pfn >
>                                       cc->start_free_pfn))
>                         pfn = min(pfn, zone->compact_cached_free_pfn);
> 
> compact_cached_free_pfn is above where it started so the free scanner skips
> almost the entire space it should have scanned. When there are multiple
> processes compacting it can end in a situation where the entire zone is
> not being scanned at all.  Further, it is possible for two processes to
> ping-pong update to compact_cached_free_pfn which is just random.
> 
> Overall, the end result wrecks allocation success rates.
> 
> There is not an obvious way around this problem without introducing new
> locking and state so this patch takes a different approach.
> 
> First, it gets rid of the skip logic because it's not clear that it matters
> if two free scanners happen to be in the same block but with racing updates
> it's too easy for it to skip over blocks it should not.

Okay.

> 
> Second, it updates compact_cached_free_pfn in a more limited set of
> circumstances.
> 
> If a scanner has wrapped, it updates compact_cached_free_pfn to the end
> 	of the zone. Each time a wrapped scanner isoaltes a page, it
> 	updates compact_cached_free_pfn. The intention is that after
> 	wrapping, the compact_cached_free_pfn will be at the highest
> 	pageblock with free pages when compaction completes.

Okay.

> 
> If a scanner has not wrapped when compaction completes and

Compaction complete?
Your code seem to do it in isolate_freepages.
Isn't it compaction complete?

> 	compact_cached_free_pfn is set the end of the the zone, initialise
> 	it once.

I can't understad this part.
Could you elaborate a bit more?

> 
> This is not optimal and it can still race but the compact_cached_free_pfn
> will be pointing to or very near a pageblock with free pages.
> 
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> ---
>  mm/compaction.c |   54 ++++++++++++++++++++++++++++--------------------------
>  1 file changed, 28 insertions(+), 26 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index be310f1..df50f73 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -419,6 +419,20 @@ static bool suitable_migration_target(struct page *page)
>  }
>  
>  /*
> + * Returns the start pfn of the last page block in a zone.  This is the starting
> + * point for full compaction of a zone.  Compaction searches for free pages from
> + * the end of each zone, while isolate_freepages_block scans forward inside each
> + * page block.
> + */
> +static unsigned long start_free_pfn(struct zone *zone)
> +{
> +	unsigned long free_pfn;
> +	free_pfn = zone->zone_start_pfn + zone->spanned_pages;
> +	free_pfn &= ~(pageblock_nr_pages-1);
> +	return free_pfn;
> +}
> +
> +/*
>   * Based on information in the current compact_control, find blocks
>   * suitable for isolating free pages from and then isolate them.
>   */
> @@ -457,17 +471,6 @@ static void isolate_freepages(struct zone *zone,
>  					pfn -= pageblock_nr_pages) {
>  		unsigned long isolated;
>  
> -		/*
> -		 * Skip ahead if another thread is compacting in the area
> -		 * simultaneously. If we wrapped around, we can only skip
> -		 * ahead if zone->compact_cached_free_pfn also wrapped to
> -		 * above our starting point.
> -		 */
> -		if (cc->order > 0 && (!cc->wrapped ||
> -				      zone->compact_cached_free_pfn >
> -				      cc->start_free_pfn))
> -			pfn = min(pfn, zone->compact_cached_free_pfn);
> -
>  		if (!pfn_valid(pfn))
>  			continue;
>  
> @@ -510,7 +513,15 @@ static void isolate_freepages(struct zone *zone,
>  		 */
>  		if (isolated) {
>  			high_pfn = max(high_pfn, pfn);
> -			if (cc->order > 0)
> +
> +			/*
> +			 * If the free scanner has wrapped, update
> +			 * compact_cached_free_pfn to point to the highest
> +			 * pageblock with free pages. This reduces excessive
> +			 * scanning of full pageblocks near the end of the
> +			 * zone
> +			 */
> +			if (cc->order > 0 && cc->wrapped)
>  				zone->compact_cached_free_pfn = high_pfn;
>  		}
>  	}
> @@ -520,6 +531,11 @@ static void isolate_freepages(struct zone *zone,
>  
>  	cc->free_pfn = high_pfn;
>  	cc->nr_freepages = nr_freepages;
> +
> +	/* If compact_cached_free_pfn is reset then set it now */
> +	if (cc->order > 0 && !cc->wrapped &&
> +			zone->compact_cached_free_pfn == start_free_pfn(zone))
> +		zone->compact_cached_free_pfn = high_pfn;
>  }
>  
>  /*
> @@ -607,20 +623,6 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>  	return ISOLATE_SUCCESS;
>  }
>  
> -/*
> - * Returns the start pfn of the last page block in a zone.  This is the starting
> - * point for full compaction of a zone.  Compaction searches for free pages from
> - * the end of each zone, while isolate_freepages_block scans forward inside each
> - * page block.
> - */
> -static unsigned long start_free_pfn(struct zone *zone)
> -{
> -	unsigned long free_pfn;
> -	free_pfn = zone->zone_start_pfn + zone->spanned_pages;
> -	free_pfn &= ~(pageblock_nr_pages-1);
> -	return free_pfn;
> -}
> -
>  static int compact_finished(struct zone *zone,
>  			    struct compact_control *cc)
>  {
> -- 
> 1.7.9.2
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Minchan Kim <minchan@kernel.org>
To: Mel Gorman <mgorman@suse.de>
Cc: Linux-MM <linux-mm@kvack.org>, Rik van Riel <riel@redhat.com>,
	Jim Schutt <jaschut@sandia.gov>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 6/6] mm: have order > 0 compaction start near a pageblock with free pages
Date: Wed, 8 Aug 2012 13:36:00 +0900	[thread overview]
Message-ID: <20120808043600.GD4247@bbox> (raw)
In-Reply-To: <1344342677-5845-7-git-send-email-mgorman@suse.de>

On Tue, Aug 07, 2012 at 01:31:17PM +0100, Mel Gorman wrote:
> commit [7db8889a: mm: have order > 0 compaction start off where it left]
> introduced a caching mechanism to reduce the amount work the free page
> scanner does in compaction. However, it has a problem. Consider two process
> simultaneously scanning free pages
> 
> 				    			C
> Process A		M     S     			F
> 		|---------------------------------------|
> Process B		M 	FS
> 
> C is zone->compact_cached_free_pfn
> S is cc->start_pfree_pfn
> M is cc->migrate_pfn
> F is cc->free_pfn
> 
> In this diagram, Process A has just reached its migrate scanner, wrapped
> around and updated compact_cached_free_pfn accordingly.
> 
> Simultaneously, Process B finishes isolating in a block and updates
> compact_cached_free_pfn again to the location of its free scanner.
> 
> Process A moves to "end_of_zone - one_pageblock" and runs this check
> 
>                 if (cc->order > 0 && (!cc->wrapped ||
>                                       zone->compact_cached_free_pfn >
>                                       cc->start_free_pfn))
>                         pfn = min(pfn, zone->compact_cached_free_pfn);
> 
> compact_cached_free_pfn is above where it started so the free scanner skips
> almost the entire space it should have scanned. When there are multiple
> processes compacting it can end in a situation where the entire zone is
> not being scanned at all.  Further, it is possible for two processes to
> ping-pong update to compact_cached_free_pfn which is just random.
> 
> Overall, the end result wrecks allocation success rates.
> 
> There is not an obvious way around this problem without introducing new
> locking and state so this patch takes a different approach.
> 
> First, it gets rid of the skip logic because it's not clear that it matters
> if two free scanners happen to be in the same block but with racing updates
> it's too easy for it to skip over blocks it should not.

Okay.

> 
> Second, it updates compact_cached_free_pfn in a more limited set of
> circumstances.
> 
> If a scanner has wrapped, it updates compact_cached_free_pfn to the end
> 	of the zone. Each time a wrapped scanner isoaltes a page, it
> 	updates compact_cached_free_pfn. The intention is that after
> 	wrapping, the compact_cached_free_pfn will be at the highest
> 	pageblock with free pages when compaction completes.

Okay.

> 
> If a scanner has not wrapped when compaction completes and

Compaction complete?
Your code seem to do it in isolate_freepages.
Isn't it compaction complete?

> 	compact_cached_free_pfn is set the end of the the zone, initialise
> 	it once.

I can't understad this part.
Could you elaborate a bit more?

> 
> This is not optimal and it can still race but the compact_cached_free_pfn
> will be pointing to or very near a pageblock with free pages.
> 
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> ---
>  mm/compaction.c |   54 ++++++++++++++++++++++++++++--------------------------
>  1 file changed, 28 insertions(+), 26 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index be310f1..df50f73 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -419,6 +419,20 @@ static bool suitable_migration_target(struct page *page)
>  }
>  
>  /*
> + * Returns the start pfn of the last page block in a zone.  This is the starting
> + * point for full compaction of a zone.  Compaction searches for free pages from
> + * the end of each zone, while isolate_freepages_block scans forward inside each
> + * page block.
> + */
> +static unsigned long start_free_pfn(struct zone *zone)
> +{
> +	unsigned long free_pfn;
> +	free_pfn = zone->zone_start_pfn + zone->spanned_pages;
> +	free_pfn &= ~(pageblock_nr_pages-1);
> +	return free_pfn;
> +}
> +
> +/*
>   * Based on information in the current compact_control, find blocks
>   * suitable for isolating free pages from and then isolate them.
>   */
> @@ -457,17 +471,6 @@ static void isolate_freepages(struct zone *zone,
>  					pfn -= pageblock_nr_pages) {
>  		unsigned long isolated;
>  
> -		/*
> -		 * Skip ahead if another thread is compacting in the area
> -		 * simultaneously. If we wrapped around, we can only skip
> -		 * ahead if zone->compact_cached_free_pfn also wrapped to
> -		 * above our starting point.
> -		 */
> -		if (cc->order > 0 && (!cc->wrapped ||
> -				      zone->compact_cached_free_pfn >
> -				      cc->start_free_pfn))
> -			pfn = min(pfn, zone->compact_cached_free_pfn);
> -
>  		if (!pfn_valid(pfn))
>  			continue;
>  
> @@ -510,7 +513,15 @@ static void isolate_freepages(struct zone *zone,
>  		 */
>  		if (isolated) {
>  			high_pfn = max(high_pfn, pfn);
> -			if (cc->order > 0)
> +
> +			/*
> +			 * If the free scanner has wrapped, update
> +			 * compact_cached_free_pfn to point to the highest
> +			 * pageblock with free pages. This reduces excessive
> +			 * scanning of full pageblocks near the end of the
> +			 * zone
> +			 */
> +			if (cc->order > 0 && cc->wrapped)
>  				zone->compact_cached_free_pfn = high_pfn;
>  		}
>  	}
> @@ -520,6 +531,11 @@ static void isolate_freepages(struct zone *zone,
>  
>  	cc->free_pfn = high_pfn;
>  	cc->nr_freepages = nr_freepages;
> +
> +	/* If compact_cached_free_pfn is reset then set it now */
> +	if (cc->order > 0 && !cc->wrapped &&
> +			zone->compact_cached_free_pfn == start_free_pfn(zone))
> +		zone->compact_cached_free_pfn = high_pfn;
>  }
>  
>  /*
> @@ -607,20 +623,6 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>  	return ISOLATE_SUCCESS;
>  }
>  
> -/*
> - * Returns the start pfn of the last page block in a zone.  This is the starting
> - * point for full compaction of a zone.  Compaction searches for free pages from
> - * the end of each zone, while isolate_freepages_block scans forward inside each
> - * page block.
> - */
> -static unsigned long start_free_pfn(struct zone *zone)
> -{
> -	unsigned long free_pfn;
> -	free_pfn = zone->zone_start_pfn + zone->spanned_pages;
> -	free_pfn &= ~(pageblock_nr_pages-1);
> -	return free_pfn;
> -}
> -
>  static int compact_finished(struct zone *zone,
>  			    struct compact_control *cc)
>  {
> -- 
> 1.7.9.2
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

  parent reply	other threads:[~2012-08-08  4:34 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-08-07 12:31 [RFC PATCH 0/6] Improve hugepage allocation success rates under load Mel Gorman
2012-08-07 12:31 ` Mel Gorman
2012-08-07 12:31 ` [PATCH 1/6] mm: compaction: Update comment in try_to_compact_pages Mel Gorman
2012-08-07 12:31   ` Mel Gorman
2012-08-07 13:19   ` Rik van Riel
2012-08-07 13:19     ` Rik van Riel
2012-08-07 23:25   ` Minchan Kim
2012-08-07 23:25     ` Minchan Kim
2012-08-07 12:31 ` [PATCH 2/6] mm: vmscan: Scale number of pages reclaimed by reclaim/compaction based on failures Mel Gorman
2012-08-07 12:31   ` Mel Gorman
2012-08-07 13:23   ` Rik van Riel
2012-08-07 13:23     ` Rik van Riel
2012-08-08  1:48   ` Minchan Kim
2012-08-08  1:48     ` Minchan Kim
2012-08-08  7:55     ` Mel Gorman
2012-08-08  7:55       ` Mel Gorman
2012-08-08  8:27       ` Minchan Kim
2012-08-08  8:27         ` Minchan Kim
2012-08-08  8:51         ` Mel Gorman
2012-08-08  8:51           ` Mel Gorman
2012-08-08 23:51           ` Minchan Kim
2012-08-08 23:51             ` Minchan Kim
2012-08-09  7:49             ` Mel Gorman
2012-08-09  7:49               ` Mel Gorman
2012-08-09  8:27               ` Minchan Kim
2012-08-09  8:27                 ` Minchan Kim
2012-08-09  9:20                 ` Mel Gorman
2012-08-09  9:20                   ` Mel Gorman
2012-08-09 20:29                   ` Rik van Riel
2012-08-09 20:29                     ` Rik van Riel
2012-08-10  8:14                     ` Mel Gorman
2012-08-10  8:14                       ` Mel Gorman
2012-08-09 23:27                   ` Minchan Kim
2012-08-09 23:27                     ` Minchan Kim
2012-08-10  8:34                     ` Mel Gorman
2012-08-10  8:34                       ` Mel Gorman
2012-08-10  8:48                       ` Minchan Kim
2012-08-10  8:48                         ` Minchan Kim
2012-08-07 12:31 ` [PATCH 3/6] mm: kswapd: Continue reclaiming for reclaim/compaction if the minimum number of pages have not been reclaimed Mel Gorman
2012-08-07 12:31   ` Mel Gorman
2012-08-07 13:26   ` Rik van Riel
2012-08-07 13:26     ` Rik van Riel
2012-08-08  2:07   ` Minchan Kim
2012-08-08  2:07     ` Minchan Kim
2012-08-08  9:07     ` Mel Gorman
2012-08-08  9:07       ` Mel Gorman
2012-08-08  9:58       ` Mel Gorman
2012-08-08  9:58         ` Mel Gorman
2012-08-07 12:31 ` [PATCH 4/6] mm: compaction: Capture a suitable high-order page immediately when it is made available Mel Gorman
2012-08-07 12:31   ` Mel Gorman
2012-08-07 13:30   ` Rik van Riel
2012-08-07 13:30     ` Rik van Riel
2012-08-07 12:31 ` [PATCH 5/6] mm: have order > 0 compaction start off where it left Mel Gorman
2012-08-07 12:31   ` Mel Gorman
2012-08-07 12:31 ` [PATCH 6/6] mm: have order > 0 compaction start near a pageblock with free pages Mel Gorman
2012-08-07 12:31   ` Mel Gorman
2012-08-07 14:45   ` Rik van Riel
2012-08-07 14:45     ` Rik van Riel
2012-08-07 14:52     ` Mel Gorman
2012-08-07 14:52       ` Mel Gorman
2012-08-07 15:20       ` Jim Schutt
2012-08-07 15:20         ` Jim Schutt
2012-08-07 15:45         ` Mel Gorman
2012-08-07 15:45           ` Mel Gorman
2012-08-08  4:36   ` Minchan Kim [this message]
2012-08-08  4:36     ` Minchan Kim
2012-08-08 10:18     ` Mel Gorman
2012-08-08 10:18       ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120808043600.GD4247@bbox \
    --to=minchan@kernel.org \
    --cc=jaschut@sandia.gov \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.