linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [patch 1/2 v2]compaction: abort compaction loop if lock is contended or run too long
@ 2012-09-10  1:18 Shaohua Li
  2012-09-10  8:11 ` Mel Gorman
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Shaohua Li @ 2012-09-10  1:18 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, mgorman, aarcange

isolate_migratepages_range() might isolate none pages, for example, when
zone->lru_lock is contended and compaction is async. In this case, we should
abort compaction, otherwise, compact_zone will run a useless loop and make
zone->lru_lock is even contended.

V2:
only abort the compaction if lock is contended or run too long
Rearranged the code by Andrea Arcangeli.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Shaohua Li <shli@fusionio.com>
---
 mm/compaction.c |   12 +++++++-----
 mm/internal.h   |    2 +-
 2 files changed, 8 insertions(+), 6 deletions(-)

Index: linux/mm/compaction.c
===================================================================
--- linux.orig/mm/compaction.c	2012-09-06 18:37:52.636413761 +0800
+++ linux/mm/compaction.c	2012-09-10 08:49:40.377869710 +0800
@@ -70,8 +70,7 @@ static bool compact_checklock_irqsave(sp
 
 		/* async aborts if taking too long or contended */
 		if (!cc->sync) {
-			if (cc->contended)
-				*cc->contended = true;
+			cc->contended = true;
 			return false;
 		}
 
@@ -634,7 +633,7 @@ static isolate_migrate_t isolate_migrate
 
 	/* Perform the isolation */
 	low_pfn = isolate_migratepages_range(zone, cc, low_pfn, end_pfn);
-	if (!low_pfn)
+	if (!low_pfn || cc->contended)
 		return ISOLATE_ABORT;
 
 	cc->migrate_pfn = low_pfn;
@@ -831,6 +830,7 @@ static unsigned long compact_zone_order(
 				 int order, gfp_t gfp_mask,
 				 bool sync, bool *contended)
 {
+	unsigned long ret;
 	struct compact_control cc = {
 		.nr_freepages = 0,
 		.nr_migratepages = 0,
@@ -838,12 +838,14 @@ static unsigned long compact_zone_order(
 		.migratetype = allocflags_to_migratetype(gfp_mask),
 		.zone = zone,
 		.sync = sync,
-		.contended = contended,
 	};
 	INIT_LIST_HEAD(&cc.freepages);
 	INIT_LIST_HEAD(&cc.migratepages);
 
-	return compact_zone(zone, &cc);
+	ret = compact_zone(zone, &cc);
+	if (contended)
+		*contended = cc.contended;
+	return ret;
 }
 
 int sysctl_extfrag_threshold = 500;
Index: linux/mm/internal.h
===================================================================
--- linux.orig/mm/internal.h	2012-09-03 15:16:30.566299444 +0800
+++ linux/mm/internal.h	2012-09-10 08:45:41.980866645 +0800
@@ -131,7 +131,7 @@ struct compact_control {
 	int order;			/* order a direct compactor needs */
 	int migratetype;		/* MOVABLE, RECLAIMABLE etc */
 	struct zone *zone;
-	bool *contended;		/* True if a lock was contended */
+	bool contended;			/* True if a lock was contended */
 };
 
 unsigned long

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 1/2 v2]compaction: abort compaction loop if lock is contended or run too long
  2012-09-10  1:18 [patch 1/2 v2]compaction: abort compaction loop if lock is contended or run too long Shaohua Li
@ 2012-09-10  8:11 ` Mel Gorman
  2012-09-11  1:45 ` Minchan Kim
  2012-09-11 23:34 ` Andrew Morton
  2 siblings, 0 replies; 18+ messages in thread
From: Mel Gorman @ 2012-09-10  8:11 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-mm, akpm, aarcange

On Mon, Sep 10, 2012 at 09:18:30AM +0800, Shaohua Li wrote:
> isolate_migratepages_range() might isolate none pages, for example, when
> zone->lru_lock is contended and compaction is async. In this case, we should
> abort compaction, otherwise, compact_zone will run a useless loop and make
> zone->lru_lock is even contended.
> 
> V2:
> only abort the compaction if lock is contended or run too long
> Rearranged the code by Andrea Arcangeli.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> Signed-off-by: Shaohua Li <shli@fusionio.com>

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 1/2 v2]compaction: abort compaction loop if lock is contended or run too long
  2012-09-10  1:18 [patch 1/2 v2]compaction: abort compaction loop if lock is contended or run too long Shaohua Li
  2012-09-10  8:11 ` Mel Gorman
@ 2012-09-11  1:45 ` Minchan Kim
  2012-09-11  8:29   ` Shaohua Li
  2012-09-11 23:34 ` Andrew Morton
  2 siblings, 1 reply; 18+ messages in thread
From: Minchan Kim @ 2012-09-11  1:45 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-mm, akpm, mgorman, aarcange

On Mon, Sep 10, 2012 at 09:18:30AM +0800, Shaohua Li wrote:
> isolate_migratepages_range() might isolate none pages, for example, when
> zone->lru_lock is contended and compaction is async. In this case, we should
> abort compaction, otherwise, compact_zone will run a useless loop and make
> zone->lru_lock is even contended.
> 

As I read old thread, you have the scenario and number.
Please include them in this description.

> V2:
> only abort the compaction if lock is contended or run too long
> Rearranged the code by Andrea Arcangeli.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> Signed-off-by: Shaohua Li <shli@fusionio.com>

Acked-by: Minchan Kim <minchan@kernel.org>

Other than some nitpicks, looks good to me.

> ---
>  mm/compaction.c |   12 +++++++-----
>  mm/internal.h   |    2 +-
>  2 files changed, 8 insertions(+), 6 deletions(-)
> 
> Index: linux/mm/compaction.c
> ===================================================================
> --- linux.orig/mm/compaction.c	2012-09-06 18:37:52.636413761 +0800
> +++ linux/mm/compaction.c	2012-09-10 08:49:40.377869710 +0800
> @@ -70,8 +70,7 @@ static bool compact_checklock_irqsave(sp
>  
>  		/* async aborts if taking too long or contended */
>  		if (!cc->sync) {
> -			if (cc->contended)
> -				*cc->contended = true;
> +			cc->contended = true;
>  			return false;
>  		}
>  
> @@ -634,7 +633,7 @@ static isolate_migrate_t isolate_migrate
>  
>  	/* Perform the isolation */
>  	low_pfn = isolate_migratepages_range(zone, cc, low_pfn, end_pfn);
> -	if (!low_pfn)
> +	if (!low_pfn || cc->contended)
>  		return ISOLATE_ABORT;
>  
>  	cc->migrate_pfn = low_pfn;
> @@ -831,6 +830,7 @@ static unsigned long compact_zone_order(
>  				 int order, gfp_t gfp_mask,
>  				 bool sync, bool *contended)
>  {
> +	unsigned long ret;
>  	struct compact_control cc = {
>  		.nr_freepages = 0,
>  		.nr_migratepages = 0,
> @@ -838,12 +838,14 @@ static unsigned long compact_zone_order(
>  		.migratetype = allocflags_to_migratetype(gfp_mask),
>  		.zone = zone,
>  		.sync = sync,
> -		.contended = contended,
>  	};
>  	INIT_LIST_HEAD(&cc.freepages);
>  	INIT_LIST_HEAD(&cc.migratepages);
>  
> -	return compact_zone(zone, &cc);
> +	ret = compact_zone(zone, &cc);
> +	if (contended)
> +		*contended = cc.contended;
> +	return ret;
>  }
>  
>  int sysctl_extfrag_threshold = 500;
> Index: linux/mm/internal.h
> ===================================================================
> --- linux.orig/mm/internal.h	2012-09-03 15:16:30.566299444 +0800
> +++ linux/mm/internal.h	2012-09-10 08:45:41.980866645 +0800
> @@ -131,7 +131,7 @@ struct compact_control {
>  	int order;			/* order a direct compactor needs */
>  	int migratetype;		/* MOVABLE, RECLAIMABLE etc */
>  	struct zone *zone;
> -	bool *contended;		/* True if a lock was contended */
> +	bool contended;			/* True if a lock was contended */

I would like to add more specific condition.
/* True if a lock was contended in case of async compaction */

>  };
>  
>  unsigned long
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 1/2 v2]compaction: abort compaction loop if lock is contended or run too long
  2012-09-11  1:45 ` Minchan Kim
@ 2012-09-11  8:29   ` Shaohua Li
  2012-09-11  8:40     ` Minchan Kim
  0 siblings, 1 reply; 18+ messages in thread
From: Shaohua Li @ 2012-09-11  8:29 UTC (permalink / raw)
  To: Minchan Kim; +Cc: linux-mm, akpm, mgorman, aarcange

On Tue, Sep 11, 2012 at 10:45:55AM +0900, Minchan Kim wrote:
> On Mon, Sep 10, 2012 at 09:18:30AM +0800, Shaohua Li wrote:
> > isolate_migratepages_range() might isolate none pages, for example, when
> > zone->lru_lock is contended and compaction is async. In this case, we should
> > abort compaction, otherwise, compact_zone will run a useless loop and make
> > zone->lru_lock is even contended.
> > 
> 
> As I read old thread, you have the scenario and number.
> Please include them in this description.

that is just to show how the contention is, not performance data. I thought
explaining the contention is enough. How did you think?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 1/2 v2]compaction: abort compaction loop if lock is contended or run too long
  2012-09-11  8:29   ` Shaohua Li
@ 2012-09-11  8:40     ` Minchan Kim
  0 siblings, 0 replies; 18+ messages in thread
From: Minchan Kim @ 2012-09-11  8:40 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-mm, akpm, mgorman, aarcange

On Tue, Sep 11, 2012 at 04:29:46PM +0800, Shaohua Li wrote:
> On Tue, Sep 11, 2012 at 10:45:55AM +0900, Minchan Kim wrote:
> > On Mon, Sep 10, 2012 at 09:18:30AM +0800, Shaohua Li wrote:
> > > isolate_migratepages_range() might isolate none pages, for example, when
> > > zone->lru_lock is contended and compaction is async. In this case, we should
> > > abort compaction, otherwise, compact_zone will run a useless loop and make
> > > zone->lru_lock is even contended.
> > > 
> > 
> > As I read old thread, you have the scenario and number.
> > Please include them in this description.
> 
> that is just to show how the contention is, not performance data. I thought
> explaining the contention is enough. How did you think?

If you measured it with perf, you had data which is
percentage of sampling of the lock. It's enough. I meant it, NOT
performance data.

> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 1/2 v2]compaction: abort compaction loop if lock is contended or run too long
  2012-09-10  1:18 [patch 1/2 v2]compaction: abort compaction loop if lock is contended or run too long Shaohua Li
  2012-09-10  8:11 ` Mel Gorman
  2012-09-11  1:45 ` Minchan Kim
@ 2012-09-11 23:34 ` Andrew Morton
  2012-09-12  0:48   ` Andrea Arcangeli
  2012-09-12 11:08   ` Mel Gorman
  2 siblings, 2 replies; 18+ messages in thread
From: Andrew Morton @ 2012-09-11 23:34 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-mm, mgorman, aarcange

On Mon, 10 Sep 2012 09:18:30 +0800
Shaohua Li <shli@kernel.org> wrote:

> isolate_migratepages_range() might isolate none pages, for example, when
> zone->lru_lock is contended and compaction is async. In this case, we should
> abort compaction, otherwise, compact_zone will run a useless loop and make
> zone->lru_lock is even contended.
> 
> ...
>
> @@ -838,12 +838,14 @@ static unsigned long compact_zone_order(
>  		.migratetype = allocflags_to_migratetype(gfp_mask),
>  		.zone = zone,
>  		.sync = sync,
> -		.contended = contended,
>  	};
>  	INIT_LIST_HEAD(&cc.freepages);
>  	INIT_LIST_HEAD(&cc.migratepages);
>  
> -	return compact_zone(zone, &cc);
> +	ret = compact_zone(zone, &cc);
> +	if (contended)
> +		*contended = cc.contended;
> +	return ret;
>  }
>  

>From a quick read, `contended' is never NULL here.  And defining the
interface so that `contended' must be a valid pointer is a good change,
IMO - it results in simpler and faster code.

Alas, try_to_compact_pages()'s kerneldoc altogether forgets to describe
this argument.  Mel's
mm-compaction-capture-a-suitable-high-order-page-immediately-when-it-is-made-available.patch
adds a `pages' arg and forgets to document that as well.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 1/2 v2]compaction: abort compaction loop if lock is contended or run too long
  2012-09-11 23:34 ` Andrew Morton
@ 2012-09-12  0:48   ` Andrea Arcangeli
  2012-09-12 21:20     ` Andrew Morton
  2012-09-12 11:08   ` Mel Gorman
  1 sibling, 1 reply; 18+ messages in thread
From: Andrea Arcangeli @ 2012-09-12  0:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Shaohua Li, linux-mm, mgorman

On Tue, Sep 11, 2012 at 04:34:55PM -0700, Andrew Morton wrote:
> On Mon, 10 Sep 2012 09:18:30 +0800
> Shaohua Li <shli@kernel.org> wrote:
> 
> > isolate_migratepages_range() might isolate none pages, for example, when
> > zone->lru_lock is contended and compaction is async. In this case, we should
> > abort compaction, otherwise, compact_zone will run a useless loop and make
> > zone->lru_lock is even contended.
> > 
> > ...
> >
> > @@ -838,12 +838,14 @@ static unsigned long compact_zone_order(
> >  		.migratetype = allocflags_to_migratetype(gfp_mask),
> >  		.zone = zone,
> >  		.sync = sync,
> > -		.contended = contended,
> >  	};
> >  	INIT_LIST_HEAD(&cc.freepages);
> >  	INIT_LIST_HEAD(&cc.migratepages);
> >  
> > -	return compact_zone(zone, &cc);
> > +	ret = compact_zone(zone, &cc);
> > +	if (contended)
> > +		*contended = cc.contended;
> > +	return ret;
> >  }
> >  
> 
> From a quick read, `contended' is never NULL here.  And defining the

"contended" pointer can be null with some caller so the if is
needed. The inner code was checking it before. This is also why we
couldn't use *contended for the loop break bugfix, because contended
could have been null at times. Now cc.contended is always available so
we can use that to fix the bug and it's self documenting as well.

> interface so that `contended' must be a valid pointer is a good change,
> IMO - it results in simpler and faster code.

Agreed.

> Alas, try_to_compact_pages()'s kerneldoc altogether forgets to describe
> this argument.  Mel's
> mm-compaction-capture-a-suitable-high-order-page-immediately-when-it-is-made-available.patch
> adds a `pages' arg and forgets to document that as well.
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 1/2 v2]compaction: abort compaction loop if lock is contended or run too long
  2012-09-11 23:34 ` Andrew Morton
  2012-09-12  0:48   ` Andrea Arcangeli
@ 2012-09-12 11:08   ` Mel Gorman
  1 sibling, 0 replies; 18+ messages in thread
From: Mel Gorman @ 2012-09-12 11:08 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Shaohua Li, linux-mm, aarcange

On Tue, Sep 11, 2012 at 04:34:55PM -0700, Andrew Morton wrote:
> Alas, try_to_compact_pages()'s kerneldoc altogether forgets to describe
> this argument.  Mel's
> mm-compaction-capture-a-suitable-high-order-page-immediately-when-it-is-made-available.patch
> adds a `pages' arg and forgets to document that as well.
> 

*slaps*

This covers both of them.

---8<---
mm: compaction: Update try_to_compact_pages kernel doc comment

Parameters were added without documentation, tut tut.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/compaction.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index 364e12f..614f18b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -919,6 +919,8 @@ int sysctl_extfrag_threshold = 500;
  * @gfp_mask: The GFP mask of the current allocation
  * @nodemask: The allowed nodes to allocate from
  * @sync: Whether migration is synchronous or not
+ * @contended: Return value that is true if compaction was aborted due to lock contention
+ * @page: Optionally capture a free page of the requested order during compaction
  *
  * This is the main entry point for direct page compaction.
  */

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [patch 1/2 v2]compaction: abort compaction loop if lock is contended or run too long
  2012-09-12  0:48   ` Andrea Arcangeli
@ 2012-09-12 21:20     ` Andrew Morton
  2012-09-12 23:48       ` Andrea Arcangeli
  2012-09-13 10:03       ` Mel Gorman
  0 siblings, 2 replies; 18+ messages in thread
From: Andrew Morton @ 2012-09-12 21:20 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Shaohua Li, linux-mm, mgorman

On Wed, 12 Sep 2012 02:48:40 +0200
Andrea Arcangeli <aarcange@redhat.com> wrote:

> On Tue, Sep 11, 2012 at 04:34:55PM -0700, Andrew Morton wrote:
> > On Mon, 10 Sep 2012 09:18:30 +0800
> > Shaohua Li <shli@kernel.org> wrote:
> > 
> > > isolate_migratepages_range() might isolate none pages, for example, when
> > > zone->lru_lock is contended and compaction is async. In this case, we should
> > > abort compaction, otherwise, compact_zone will run a useless loop and make
> > > zone->lru_lock is even contended.
> > > 
> > > ...
> > >
> > > @@ -838,12 +838,14 @@ static unsigned long compact_zone_order(
> > >  		.migratetype = allocflags_to_migratetype(gfp_mask),
> > >  		.zone = zone,
> > >  		.sync = sync,
> > > -		.contended = contended,
> > >  	};
> > >  	INIT_LIST_HEAD(&cc.freepages);
> > >  	INIT_LIST_HEAD(&cc.migratepages);
> > >  
> > > -	return compact_zone(zone, &cc);
> > > +	ret = compact_zone(zone, &cc);
> > > +	if (contended)
> > > +		*contended = cc.contended;
> > > +	return ret;
> > >  }
> > >  
> > 
> > From a quick read, `contended' is never NULL here.  And defining the
> 
> "contended" pointer can be null with some caller so the if is
> needed. The inner code was checking it before. This is also why we
> couldn't use *contended for the loop break bugfix, because contended
> could have been null at times.

Confused.  I can see only two call sites:
__alloc_pages_slowpath
->__alloc_pages_direct_compact
  ->try_to_compact_pages
    ->compact_zone_order
and in both cases, `contended' points at valid storage.

> Now cc.contended is always available so
> we can use that to fix the bug and it's self documenting as well.
> 
> > interface so that `contended' must be a valid pointer is a good change,
> > IMO - it results in simpler and faster code.
> 
> Agreed.

OK, I'll slip this in there:

--- a/mm/compaction.c~mm-compaction-abort-compaction-loop-if-lock-is-contended-or-run-too-long-fix
+++ a/mm/compaction.c
@@ -909,8 +909,7 @@ static unsigned long compact_zone_order(
 	INIT_LIST_HEAD(&cc.migratepages);
 
 	ret = compact_zone(zone, &cc);
-	if (contended)
-		*contended = cc.contended;
+	*contended = cc.contended;
 	return ret;
 }
 
> > Alas, try_to_compact_pages()'s kerneldoc altogether forgets to describe
> > this argument.  Mel's
> > mm-compaction-capture-a-suitable-high-order-page-immediately-when-it-is-made-available.patch
> > adds a `pages' arg and forgets to document that as well.

poke poke

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 1/2 v2]compaction: abort compaction loop if lock is contended or run too long
  2012-09-12 21:20     ` Andrew Morton
@ 2012-09-12 23:48       ` Andrea Arcangeli
  2012-09-13  0:47         ` Minchan Kim
  2012-09-13 10:03       ` Mel Gorman
  1 sibling, 1 reply; 18+ messages in thread
From: Andrea Arcangeli @ 2012-09-12 23:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Shaohua Li, linux-mm, mgorman

On Wed, Sep 12, 2012 at 02:20:19PM -0700, Andrew Morton wrote:
> OK, I'll slip this in there:
> 
> --- a/mm/compaction.c~mm-compaction-abort-compaction-loop-if-lock-is-contended-or-run-too-long-fix
> +++ a/mm/compaction.c
> @@ -909,8 +909,7 @@ static unsigned long compact_zone_order(
>  	INIT_LIST_HEAD(&cc.migratepages);
>  
>  	ret = compact_zone(zone, &cc);
> -	if (contended)
> -		*contended = cc.contended;
> +	*contended = cc.contended;
>  	return ret;
>  }

Ack the above, thanks.

One more thing, today a bug tripped while building cyanogenmod10 (it
swaps despite so much ram) after I added the cc->contended loop break
patch. The original version of the fix from Shaohua didn't have this
problem because it would only abort compaction if the low_pfn didn't
advance and in turn the list would be guaranteed empty.

Verifying the list is empty before aborting compaction (which takes a
path that ignores the cc->migratelist) should be enough to fix it and
it makes it really equivalent to the previous fix. Both cachelines
should be cache hot so it should be practically zero cost to check it.

Only lightly tested so far.

===

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 1/2 v2]compaction: abort compaction loop if lock is contended or run too long
  2012-09-12 23:48       ` Andrea Arcangeli
@ 2012-09-13  0:47         ` Minchan Kim
  2012-09-13  2:49           ` Shaohua Li
  2012-09-13  9:38           ` Mel Gorman
  0 siblings, 2 replies; 18+ messages in thread
From: Minchan Kim @ 2012-09-13  0:47 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Andrew Morton, Shaohua Li, linux-mm, mgorman

Hi Andrea,

On Thu, Sep 13, 2012 at 01:48:08AM +0200, Andrea Arcangeli wrote:
> On Wed, Sep 12, 2012 at 02:20:19PM -0700, Andrew Morton wrote:
> > OK, I'll slip this in there:
> > 
> > --- a/mm/compaction.c~mm-compaction-abort-compaction-loop-if-lock-is-contended-or-run-too-long-fix
> > +++ a/mm/compaction.c
> > @@ -909,8 +909,7 @@ static unsigned long compact_zone_order(
> >  	INIT_LIST_HEAD(&cc.migratepages);
> >  
> >  	ret = compact_zone(zone, &cc);
> > -	if (contended)
> > -		*contended = cc.contended;
> > +	*contended = cc.contended;
> >  	return ret;
> >  }
> 
> Ack the above, thanks.
> 
> One more thing, today a bug tripped while building cyanogenmod10 (it
> swaps despite so much ram) after I added the cc->contended loop break
> patch. The original version of the fix from Shaohua didn't have this
> problem because it would only abort compaction if the low_pfn didn't
> advance and in turn the list would be guaranteed empty.

Nice catch!

> 
> Verifying the list is empty before aborting compaction (which takes a
> path that ignores the cc->migratelist) should be enough to fix it and
> it makes it really equivalent to the previous fix. Both cachelines
> should be cache hot so it should be practically zero cost to check it.
> 
> Only lightly tested so far.
> 
> ===
> >From b2a50e49d65596d3920773316ad9b7dd54e4acaf Mon Sep 17 00:00:00 2001
> From: Andrea Arcangeli <aarcange@redhat.com>
> Date: Thu, 13 Sep 2012 01:22:03 +0200
> Subject: [PATCH] mm: compaction: fix leak in cc->contended loop breaking
>  logic
> 
> We cannot return ISOLATE_ABORT when cc->contended is true, if we have
> some pages already successfully isolated in the cc->migratepages
> list, or they will be leaked.
> 
> The bug was highlighted by a nice VM_BUG_ON in the async compaction in
> kswapd. So I also added the symmetric VM_BUG_ON to the other caller of
> the function considering it looks a worthwhile VM_BUG_ON.

Fair enough.

> 
> ------------[ cut here ]------------
> kernel BUG at mm/compaction.c:934!
> invalid opcode: 0000 [#1] SMP
> Modules linked in: tun usbhid kvm_intel xhci_hcd kvm snd_hda_codec_realtek ehci_hcd usbcore snd_hda_intel sn
> er crc32c_intel psmouse ghash_clmulni_intel sr_mod snd sg cdrom snd_page_alloc usb_common pcspkr [last unloa
> 
> CPU 0
> Pid: 513, comm: kswapd0 Not tainted 3.6.0-rc4+ #17                  /DH61BE
> RIP: 0010:[<ffffffff8111302c>]  [<ffffffff8111302c>] __compact_pgdat+0x1ac/0x1b0
> RSP: 0018:ffff880216fa5cb0  EFLAGS: 00010283
> RAX: 0000000000000003 RBX: ffff880216fa5d00 RCX: 0000000000000002
> RDX: 00000000000008d7 RSI: 0000000000000002 RDI: ffffffff8195b058
> RBP: ffffffff8195b000 R08: 0000000000000be4 R09: ffffffff8195a9c0
> R10: ffffffff8195b400 R11: ffffffff8195b570 R12: 0000000000000001
> R13: 0000000000000001 R14: ffff880216fa5d10 R15: 0000000000000003
> FS:  0000000000000000(0000) GS:ffff88021fa00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007f14d4167000 CR3: 00000000018f1000 CR4: 00000000000407f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process kswapd0 (pid: 513, threadinfo ffff880216fa4000, task ffff880216cfef20)
> Stack:
> ffffffff8195a9c0 ffffffff8195b000 0000000000000320 0000000000000003
> ffffffff8195a9c0 ffffffff8195b640 0000000000000002 0000000000000c80
> 0000000000000001 ffffffff811132f3 ffff880216fa5d00 ffff880216fa5d00
> Call Trace:
> [<ffffffff811132f3>] ? compact_pgdat+0x23/0x30
> [<ffffffff8110503f>] ? kswapd+0x89f/0xac0
> [<ffffffff8106f450>] ? wake_up_bit+0x40/0x40
> [<ffffffff811047a0>] ? shrink_lruvec+0x510/0x510
> [<ffffffff811047a0>] ? shrink_lruvec+0x510/0x510
> [<ffffffff8106ef1e>] ? kthread+0x9e/0xb0
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>  mm/compaction.c |    6 +++++-
>  1 files changed, 5 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 6066a35..0292984 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -633,7 +633,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>  
>  	/* Perform the isolation */
>  	low_pfn = isolate_migratepages_range(zone, cc, low_pfn, end_pfn);
> -	if (!low_pfn || cc->contended)
> +	if (!low_pfn || (cc->contended && !cc->nr_migratepages))
>  		return ISOLATE_ABORT;

I'm not sure it's best.
As you mentioned, it's same with first version of Shaohua.
But it could mitigate the goal of the patch if lock contention or
need_resched happens in the middle of loop once we isolate a
migratable page.

What do you think about this?

diff --git a/mm/compaction.c b/mm/compaction.c
index 0fbc6b7..7a009dd 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -848,6 +848,10 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
                switch (isolate_migratepages(zone, cc)) {
                case ISOLATE_ABORT:
                        ret = COMPACT_PARTIAL;
+                       if (!list_empty(&cc->migratepages)) {
+                               putback_lru_pages(&cc->migratepages);
+                               cc->nr_migratepages = 0;
+                       }
                        goto out;
                case ISOLATE_NONE:
                        continue;


>  
>  	cc->migrate_pfn = low_pfn;
> @@ -843,6 +843,10 @@ static unsigned long compact_zone_order(struct zone *zone,
>  	INIT_LIST_HEAD(&cc.migratepages);
>  
>  	ret = compact_zone(zone, &cc);
> +
> +	VM_BUG_ON(!list_empty(&cc.freepages));
> +	VM_BUG_ON(!list_empty(&cc.migratepages));
> +
>  	*contended = cc.contended;
>  	return ret;
>  }
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [patch 1/2 v2]compaction: abort compaction loop if lock is contended or run too long
  2012-09-13  0:47         ` Minchan Kim
@ 2012-09-13  2:49           ` Shaohua Li
  2012-09-13  9:38           ` Mel Gorman
  1 sibling, 0 replies; 18+ messages in thread
From: Shaohua Li @ 2012-09-13  2:49 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Andrea Arcangeli, Andrew Morton, linux-mm, mgorman

On Thu, Sep 13, 2012 at 09:47:22AM +0900, Minchan Kim wrote:
> Hi Andrea,
> 
> On Thu, Sep 13, 2012 at 01:48:08AM +0200, Andrea Arcangeli wrote:
> > On Wed, Sep 12, 2012 at 02:20:19PM -0700, Andrew Morton wrote:
> > > OK, I'll slip this in there:
> > > 
> > > --- a/mm/compaction.c~mm-compaction-abort-compaction-loop-if-lock-is-contended-or-run-too-long-fix
> > > +++ a/mm/compaction.c
> > > @@ -909,8 +909,7 @@ static unsigned long compact_zone_order(
> > >  	INIT_LIST_HEAD(&cc.migratepages);
> > >  
> > >  	ret = compact_zone(zone, &cc);
> > > -	if (contended)
> > > -		*contended = cc.contended;
> > > +	*contended = cc.contended;
> > >  	return ret;
> > >  }
> > 
> > Ack the above, thanks.
> > 
> > One more thing, today a bug tripped while building cyanogenmod10 (it
> > swaps despite so much ram) after I added the cc->contended loop break
> > patch. The original version of the fix from Shaohua didn't have this
> > problem because it would only abort compaction if the low_pfn didn't
> > advance and in turn the list would be guaranteed empty.
> 
> Nice catch!
> 
> > 
> > Verifying the list is empty before aborting compaction (which takes a
> > path that ignores the cc->migratelist) should be enough to fix it and
> > it makes it really equivalent to the previous fix. Both cachelines
> > should be cache hot so it should be practically zero cost to check it.
> > 
> > Only lightly tested so far.
> > 
> > ===
> > >From b2a50e49d65596d3920773316ad9b7dd54e4acaf Mon Sep 17 00:00:00 2001
> > From: Andrea Arcangeli <aarcange@redhat.com>
> > Date: Thu, 13 Sep 2012 01:22:03 +0200
> > Subject: [PATCH] mm: compaction: fix leak in cc->contended loop breaking
> >  logic
> > 
> > We cannot return ISOLATE_ABORT when cc->contended is true, if we have
> > some pages already successfully isolated in the cc->migratepages
> > list, or they will be leaked.
> > 
> > The bug was highlighted by a nice VM_BUG_ON in the async compaction in
> > kswapd. So I also added the symmetric VM_BUG_ON to the other caller of
> > the function considering it looks a worthwhile VM_BUG_ON.
> 
> Fair enough.
> 
> > 
> > ------------[ cut here ]------------
> > kernel BUG at mm/compaction.c:934!
> > invalid opcode: 0000 [#1] SMP
> > Modules linked in: tun usbhid kvm_intel xhci_hcd kvm snd_hda_codec_realtek ehci_hcd usbcore snd_hda_intel sn
> > er crc32c_intel psmouse ghash_clmulni_intel sr_mod snd sg cdrom snd_page_alloc usb_common pcspkr [last unloa
> > 
> > CPU 0
> > Pid: 513, comm: kswapd0 Not tainted 3.6.0-rc4+ #17                  /DH61BE
> > RIP: 0010:[<ffffffff8111302c>]  [<ffffffff8111302c>] __compact_pgdat+0x1ac/0x1b0
> > RSP: 0018:ffff880216fa5cb0  EFLAGS: 00010283
> > RAX: 0000000000000003 RBX: ffff880216fa5d00 RCX: 0000000000000002
> > RDX: 00000000000008d7 RSI: 0000000000000002 RDI: ffffffff8195b058
> > RBP: ffffffff8195b000 R08: 0000000000000be4 R09: ffffffff8195a9c0
> > R10: ffffffff8195b400 R11: ffffffff8195b570 R12: 0000000000000001
> > R13: 0000000000000001 R14: ffff880216fa5d10 R15: 0000000000000003
> > FS:  0000000000000000(0000) GS:ffff88021fa00000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: 00007f14d4167000 CR3: 00000000018f1000 CR4: 00000000000407f0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Process kswapd0 (pid: 513, threadinfo ffff880216fa4000, task ffff880216cfef20)
> > Stack:
> > ffffffff8195a9c0 ffffffff8195b000 0000000000000320 0000000000000003
> > ffffffff8195a9c0 ffffffff8195b640 0000000000000002 0000000000000c80
> > 0000000000000001 ffffffff811132f3 ffff880216fa5d00 ffff880216fa5d00
> > Call Trace:
> > [<ffffffff811132f3>] ? compact_pgdat+0x23/0x30
> > [<ffffffff8110503f>] ? kswapd+0x89f/0xac0
> > [<ffffffff8106f450>] ? wake_up_bit+0x40/0x40
> > [<ffffffff811047a0>] ? shrink_lruvec+0x510/0x510
> > [<ffffffff811047a0>] ? shrink_lruvec+0x510/0x510
> > [<ffffffff8106ef1e>] ? kthread+0x9e/0xb0
> > 
> > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> > ---
> >  mm/compaction.c |    6 +++++-
> >  1 files changed, 5 insertions(+), 1 deletions(-)
> > 
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index 6066a35..0292984 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -633,7 +633,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> >  
> >  	/* Perform the isolation */
> >  	low_pfn = isolate_migratepages_range(zone, cc, low_pfn, end_pfn);
> > -	if (!low_pfn || cc->contended)
> > +	if (!low_pfn || (cc->contended && !cc->nr_migratepages))
> >  		return ISOLATE_ABORT;
> 
> I'm not sure it's best.
> As you mentioned, it's same with first version of Shaohua.
> But it could mitigate the goal of the patch if lock contention or
> need_resched happens in the middle of loop once we isolate a
> migratable page.
> 
> What do you think about this?

Thanks for catching this issue. Both looks sane, but I would vote for Andrea's.
If some pages are isolated, better we use them.

Thanks,
Shaohua

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 1/2 v2]compaction: abort compaction loop if lock is contended or run too long
  2012-09-13  0:47         ` Minchan Kim
  2012-09-13  2:49           ` Shaohua Li
@ 2012-09-13  9:38           ` Mel Gorman
  2012-09-13 10:13             ` Shaohua Li
  2012-09-13 16:04             ` Andrea Arcangeli
  1 sibling, 2 replies; 18+ messages in thread
From: Mel Gorman @ 2012-09-13  9:38 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Andrea Arcangeli, Andrew Morton, Shaohua Li, linux-mm

On Thu, Sep 13, 2012 at 09:47:22AM +0900, Minchan Kim wrote:
> Hi Andrea,
> 
> On Thu, Sep 13, 2012 at 01:48:08AM +0200, Andrea Arcangeli wrote:
> > On Wed, Sep 12, 2012 at 02:20:19PM -0700, Andrew Morton wrote:
> > > OK, I'll slip this in there:
> > > 
> > > --- a/mm/compaction.c~mm-compaction-abort-compaction-loop-if-lock-is-contended-or-run-too-long-fix
> > > +++ a/mm/compaction.c
> > > @@ -909,8 +909,7 @@ static unsigned long compact_zone_order(
> > >  	INIT_LIST_HEAD(&cc.migratepages);
> > >  
> > >  	ret = compact_zone(zone, &cc);
> > > -	if (contended)
> > > -		*contended = cc.contended;
> > > +	*contended = cc.contended;
> > >  	return ret;
> > >  }
> > 
> > Ack the above, thanks.
> > 
> > One more thing, today a bug tripped while building cyanogenmod10 (it
> > swaps despite so much ram) after I added the cc->contended loop break
> > patch. The original version of the fix from Shaohua didn't have this
> > problem because it would only abort compaction if the low_pfn didn't
> > advance and in turn the list would be guaranteed empty.
> 
> Nice catch!
> 
> > 
> > Verifying the list is empty before aborting compaction (which takes a
> > path that ignores the cc->migratelist) should be enough to fix it and
> > it makes it really equivalent to the previous fix. Both cachelines
> > should be cache hot so it should be practically zero cost to check it.
> > 
> > Only lightly tested so far.
> > 
> > ===
> > >From b2a50e49d65596d3920773316ad9b7dd54e4acaf Mon Sep 17 00:00:00 2001
> > From: Andrea Arcangeli <aarcange@redhat.com>
> > Date: Thu, 13 Sep 2012 01:22:03 +0200
> > Subject: [PATCH] mm: compaction: fix leak in cc->contended loop breaking
> >  logic
> > 
> > We cannot return ISOLATE_ABORT when cc->contended is true, if we have
> > some pages already successfully isolated in the cc->migratepages
> > list, or they will be leaked.
> > 
> > The bug was highlighted by a nice VM_BUG_ON in the async compaction in
> > kswapd. So I also added the symmetric VM_BUG_ON to the other caller of
> > the function considering it looks a worthwhile VM_BUG_ON.
> 
> Fair enough.
> 
> > 
> > ------------[ cut here ]------------
> > kernel BUG at mm/compaction.c:934!
> > invalid opcode: 0000 [#1] SMP
> > Modules linked in: tun usbhid kvm_intel xhci_hcd kvm snd_hda_codec_realtek ehci_hcd usbcore snd_hda_intel sn
> > er crc32c_intel psmouse ghash_clmulni_intel sr_mod snd sg cdrom snd_page_alloc usb_common pcspkr [last unloa
> > 
> > CPU 0
> > Pid: 513, comm: kswapd0 Not tainted 3.6.0-rc4+ #17                  /DH61BE
> > RIP: 0010:[<ffffffff8111302c>]  [<ffffffff8111302c>] __compact_pgdat+0x1ac/0x1b0
> > RSP: 0018:ffff880216fa5cb0  EFLAGS: 00010283
> > RAX: 0000000000000003 RBX: ffff880216fa5d00 RCX: 0000000000000002
> > RDX: 00000000000008d7 RSI: 0000000000000002 RDI: ffffffff8195b058
> > RBP: ffffffff8195b000 R08: 0000000000000be4 R09: ffffffff8195a9c0
> > R10: ffffffff8195b400 R11: ffffffff8195b570 R12: 0000000000000001
> > R13: 0000000000000001 R14: ffff880216fa5d10 R15: 0000000000000003
> > FS:  0000000000000000(0000) GS:ffff88021fa00000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: 00007f14d4167000 CR3: 00000000018f1000 CR4: 00000000000407f0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Process kswapd0 (pid: 513, threadinfo ffff880216fa4000, task ffff880216cfef20)
> > Stack:
> > ffffffff8195a9c0 ffffffff8195b000 0000000000000320 0000000000000003
> > ffffffff8195a9c0 ffffffff8195b640 0000000000000002 0000000000000c80
> > 0000000000000001 ffffffff811132f3 ffff880216fa5d00 ffff880216fa5d00
> > Call Trace:
> > [<ffffffff811132f3>] ? compact_pgdat+0x23/0x30
> > [<ffffffff8110503f>] ? kswapd+0x89f/0xac0
> > [<ffffffff8106f450>] ? wake_up_bit+0x40/0x40
> > [<ffffffff811047a0>] ? shrink_lruvec+0x510/0x510
> > [<ffffffff811047a0>] ? shrink_lruvec+0x510/0x510
> > [<ffffffff8106ef1e>] ? kthread+0x9e/0xb0
> > 
> > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> > ---
> >  mm/compaction.c |    6 +++++-
> >  1 files changed, 5 insertions(+), 1 deletions(-)
> > 
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index 6066a35..0292984 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -633,7 +633,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> >  
> >  	/* Perform the isolation */
> >  	low_pfn = isolate_migratepages_range(zone, cc, low_pfn, end_pfn);
> > -	if (!low_pfn || cc->contended)
> > +	if (!low_pfn || (cc->contended && !cc->nr_migratepages))
> >  		return ISOLATE_ABORT;
> 
> I'm not sure it's best.
> As you mentioned, it's same with first version of Shaohua.
> But it could mitigate the goal of the patch if lock contention or
> need_resched happens in the middle of loop once we isolate a
> migratable page.
> 
> What do you think about this?
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 0fbc6b7..7a009dd 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -848,6 +848,10 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
>                 switch (isolate_migratepages(zone, cc)) {
>                 case ISOLATE_ABORT:
>                         ret = COMPACT_PARTIAL;
> +                       if (!list_empty(&cc->migratepages)) {
> +                               putback_lru_pages(&cc->migratepages);
> +                               cc->nr_migratepages = 0;
> +                       }
>                         goto out;
>                 case ISOLATE_NONE:
>                         continue;
> 

I agree with Minchan. Andrea's patch ignores the fact that free page
isolation might have aborted due to lock contention. It's not necessarily
going to be isolating the pages it needs for migration.

> 
> >  
> >  	cc->migrate_pfn = low_pfn;
> > @@ -843,6 +843,10 @@ static unsigned long compact_zone_order(struct zone *zone,
> >  	INIT_LIST_HEAD(&cc.migratepages);
> >  
> >  	ret = compact_zone(zone, &cc);
> > +
> > +	VM_BUG_ON(!list_empty(&cc.freepages));
> > +	VM_BUG_ON(!list_empty(&cc.migratepages));
> > +
> >  	*contended = cc.contended;
> >  	return ret;
> >  }
> > 

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 1/2 v2]compaction: abort compaction loop if lock is contended or run too long
  2012-09-12 21:20     ` Andrew Morton
  2012-09-12 23:48       ` Andrea Arcangeli
@ 2012-09-13 10:03       ` Mel Gorman
  1 sibling, 0 replies; 18+ messages in thread
From: Mel Gorman @ 2012-09-13 10:03 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andrea Arcangeli, Shaohua Li, linux-mm, minchan, Richard Davies

On Wed, Sep 12, 2012 at 02:20:19PM -0700, Andrew Morton wrote:
> On Wed, 12 Sep 2012 02:48:40 +0200
> Andrea Arcangeli <aarcange@redhat.com> wrote:
> 
> > On Tue, Sep 11, 2012 at 04:34:55PM -0700, Andrew Morton wrote:
> > > On Mon, 10 Sep 2012 09:18:30 +0800
> > > Shaohua Li <shli@kernel.org> wrote:
> > > 
> > > > isolate_migratepages_range() might isolate none pages, for example, when
> > > > zone->lru_lock is contended and compaction is async. In this case, we should
> > > > abort compaction, otherwise, compact_zone will run a useless loop and make
> > > > zone->lru_lock is even contended.
> > > > 
> > > > ...
> > > >
> > > > @@ -838,12 +838,14 @@ static unsigned long compact_zone_order(
> > > >  		.migratetype = allocflags_to_migratetype(gfp_mask),
> > > >  		.zone = zone,
> > > >  		.sync = sync,
> > > > -		.contended = contended,
> > > >  	};
> > > >  	INIT_LIST_HEAD(&cc.freepages);
> > > >  	INIT_LIST_HEAD(&cc.migratepages);
> > > >  
> > > > -	return compact_zone(zone, &cc);
> > > > +	ret = compact_zone(zone, &cc);
> > > > +	if (contended)
> > > > +		*contended = cc.contended;
> > > > +	return ret;
> > > >  }
> > > >  
> > > 
> > > From a quick read, `contended' is never NULL here.  And defining the
> > 
> > "contended" pointer can be null with some caller so the if is
> > needed. The inner code was checking it before. This is also why we
> > couldn't use *contended for the loop break bugfix, because contended
> > could have been null at times.
> 
> Confused.  I can see only two call sites:
> __alloc_pages_slowpath
> ->__alloc_pages_direct_compact
>   ->try_to_compact_pages
>     ->compact_zone_order
> and in both cases, `contended' points at valid storage.
> 

Andrea encountered out an additional bug as well but I preferred
a variant of Minchan's fix for it. Can you replace patches
mm-compaction-abort-compaction-loop-if-lock-is-contended-or-run-too-long.patch
mm-compaction-abort-compaction-loop-if-lock-is-contended-or-run-too-long-fix.patch
with this version please? FWIW, I've asked Richard Davies (added to cc)
to test with this version of the patch as he is also reporting contention
problems when booting a windows guest under qemu.

I see you already picked up the documentation patch so I'll ignore the
additional "poke poke" in your mail :)

---8<---
From: Shaohua Li <shli@fusionio.com>
Subject: [PATCH] compaction: abort compaction loop if lock is contended or run too long

Changelog since V2
o Fix BUG_ON triggered due to pages left on cc.migratepages
o Make compact_zone_order() require non-NULL arg `contended'

Changelog since V1
o only abort the compaction if lock is contended or run too long
o Rearranged the code by Andrea Arcangeli.

isolate_migratepages_range() might isolate no pages if for example when
zone->lru_lock is contended and running asynchronous compaction. In this
case, we should abort compaction, otherwise, compact_zone will run a
useless loop and make zone->lru_lock is even contended. An additional
check is added to ensure that cc.migratepages and cc.freepages get
properly drained whan compaction is aborted.

[minchan@kernel.org: Putback pages isolated for migration if aborting]
[akpm@linux-foundation.org: compact_zone_order requires non-NULL arg contended]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/compaction.c |   17 ++++++++++++-----
 mm/internal.h   |    2 +-
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 7fcd3a5..a8de20d 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -70,8 +70,7 @@ static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags,
 
 		/* async aborts if taking too long or contended */
 		if (!cc->sync) {
-			if (cc->contended)
-				*cc->contended = true;
+			cc->contended = true;
 			return false;
 		}
 
@@ -634,7 +633,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 
 	/* Perform the isolation */
 	low_pfn = isolate_migratepages_range(zone, cc, low_pfn, end_pfn);
-	if (!low_pfn)
+	if (!low_pfn || cc->contended)
 		return ISOLATE_ABORT;
 
 	cc->migrate_pfn = low_pfn;
@@ -787,6 +786,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 		switch (isolate_migratepages(zone, cc)) {
 		case ISOLATE_ABORT:
 			ret = COMPACT_PARTIAL;
+			putback_lru_pages(&cc->migratepages);
+			cc->nr_migratepages = 0;
 			goto out;
 		case ISOLATE_NONE:
 			continue;
@@ -831,6 +832,7 @@ static unsigned long compact_zone_order(struct zone *zone,
 				 int order, gfp_t gfp_mask,
 				 bool sync, bool *contended)
 {
+	unsigned long ret;
 	struct compact_control cc = {
 		.nr_freepages = 0,
 		.nr_migratepages = 0,
@@ -838,12 +840,17 @@ static unsigned long compact_zone_order(struct zone *zone,
 		.migratetype = allocflags_to_migratetype(gfp_mask),
 		.zone = zone,
 		.sync = sync,
-		.contended = contended,
 	};
 	INIT_LIST_HEAD(&cc.freepages);
 	INIT_LIST_HEAD(&cc.migratepages);
 
-	return compact_zone(zone, &cc);
+	ret = compact_zone(zone, &cc);
+
+	VM_BUG_ON(!list_empty(&cc.freepages));
+	VM_BUG_ON(!list_empty(&cc.migratepages));
+
+	*contended = cc.contended;
+	return ret;
 }
 
 int sysctl_extfrag_threshold = 500;
diff --git a/mm/internal.h b/mm/internal.h
index b8c91b3..4bd7c0e 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -130,7 +130,7 @@ struct compact_control {
 	int order;			/* order a direct compactor needs */
 	int migratetype;		/* MOVABLE, RECLAIMABLE etc */
 	struct zone *zone;
-	bool *contended;		/* True if a lock was contended */
+	bool contended;			/* True if a lock was contended */
 };
 
 unsigned long

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [patch 1/2 v2]compaction: abort compaction loop if lock is contended or run too long
  2012-09-13  9:38           ` Mel Gorman
@ 2012-09-13 10:13             ` Shaohua Li
  2012-09-13 16:04             ` Andrea Arcangeli
  1 sibling, 0 replies; 18+ messages in thread
From: Shaohua Li @ 2012-09-13 10:13 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Minchan Kim, Andrea Arcangeli, Andrew Morton, linux-mm

On Thu, Sep 13, 2012 at 10:38:26AM +0100, Mel Gorman wrote:
> On Thu, Sep 13, 2012 at 09:47:22AM +0900, Minchan Kim wrote:
> > Hi Andrea,
> > 
> > On Thu, Sep 13, 2012 at 01:48:08AM +0200, Andrea Arcangeli wrote:
> > > On Wed, Sep 12, 2012 at 02:20:19PM -0700, Andrew Morton wrote:
> > > > OK, I'll slip this in there:
> > > > 
> > > > --- a/mm/compaction.c~mm-compaction-abort-compaction-loop-if-lock-is-contended-or-run-too-long-fix
> > > > +++ a/mm/compaction.c
> > > > @@ -909,8 +909,7 @@ static unsigned long compact_zone_order(
> > > >  	INIT_LIST_HEAD(&cc.migratepages);
> > > >  
> > > >  	ret = compact_zone(zone, &cc);
> > > > -	if (contended)
> > > > -		*contended = cc.contended;
> > > > +	*contended = cc.contended;
> > > >  	return ret;
> > > >  }
> > > 
> > > Ack the above, thanks.
> > > 
> > > One more thing, today a bug tripped while building cyanogenmod10 (it
> > > swaps despite so much ram) after I added the cc->contended loop break
> > > patch. The original version of the fix from Shaohua didn't have this
> > > problem because it would only abort compaction if the low_pfn didn't
> > > advance and in turn the list would be guaranteed empty.
> > 
> > Nice catch!
> > 
> > > 
> > > Verifying the list is empty before aborting compaction (which takes a
> > > path that ignores the cc->migratelist) should be enough to fix it and
> > > it makes it really equivalent to the previous fix. Both cachelines
> > > should be cache hot so it should be practically zero cost to check it.
> > > 
> > > Only lightly tested so far.
> > > 
> > > ===
> > > >From b2a50e49d65596d3920773316ad9b7dd54e4acaf Mon Sep 17 00:00:00 2001
> > > From: Andrea Arcangeli <aarcange@redhat.com>
> > > Date: Thu, 13 Sep 2012 01:22:03 +0200
> > > Subject: [PATCH] mm: compaction: fix leak in cc->contended loop breaking
> > >  logic
> > > 
> > > We cannot return ISOLATE_ABORT when cc->contended is true, if we have
> > > some pages already successfully isolated in the cc->migratepages
> > > list, or they will be leaked.
> > > 
> > > The bug was highlighted by a nice VM_BUG_ON in the async compaction in
> > > kswapd. So I also added the symmetric VM_BUG_ON to the other caller of
> > > the function considering it looks a worthwhile VM_BUG_ON.
> > 
> > Fair enough.
> > 
> > > 
> > > ------------[ cut here ]------------
> > > kernel BUG at mm/compaction.c:934!
> > > invalid opcode: 0000 [#1] SMP
> > > Modules linked in: tun usbhid kvm_intel xhci_hcd kvm snd_hda_codec_realtek ehci_hcd usbcore snd_hda_intel sn
> > > er crc32c_intel psmouse ghash_clmulni_intel sr_mod snd sg cdrom snd_page_alloc usb_common pcspkr [last unloa
> > > 
> > > CPU 0
> > > Pid: 513, comm: kswapd0 Not tainted 3.6.0-rc4+ #17                  /DH61BE
> > > RIP: 0010:[<ffffffff8111302c>]  [<ffffffff8111302c>] __compact_pgdat+0x1ac/0x1b0
> > > RSP: 0018:ffff880216fa5cb0  EFLAGS: 00010283
> > > RAX: 0000000000000003 RBX: ffff880216fa5d00 RCX: 0000000000000002
> > > RDX: 00000000000008d7 RSI: 0000000000000002 RDI: ffffffff8195b058
> > > RBP: ffffffff8195b000 R08: 0000000000000be4 R09: ffffffff8195a9c0
> > > R10: ffffffff8195b400 R11: ffffffff8195b570 R12: 0000000000000001
> > > R13: 0000000000000001 R14: ffff880216fa5d10 R15: 0000000000000003
> > > FS:  0000000000000000(0000) GS:ffff88021fa00000(0000) knlGS:0000000000000000
> > > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > CR2: 00007f14d4167000 CR3: 00000000018f1000 CR4: 00000000000407f0
> > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > Process kswapd0 (pid: 513, threadinfo ffff880216fa4000, task ffff880216cfef20)
> > > Stack:
> > > ffffffff8195a9c0 ffffffff8195b000 0000000000000320 0000000000000003
> > > ffffffff8195a9c0 ffffffff8195b640 0000000000000002 0000000000000c80
> > > 0000000000000001 ffffffff811132f3 ffff880216fa5d00 ffff880216fa5d00
> > > Call Trace:
> > > [<ffffffff811132f3>] ? compact_pgdat+0x23/0x30
> > > [<ffffffff8110503f>] ? kswapd+0x89f/0xac0
> > > [<ffffffff8106f450>] ? wake_up_bit+0x40/0x40
> > > [<ffffffff811047a0>] ? shrink_lruvec+0x510/0x510
> > > [<ffffffff811047a0>] ? shrink_lruvec+0x510/0x510
> > > [<ffffffff8106ef1e>] ? kthread+0x9e/0xb0
> > > 
> > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> > > ---
> > >  mm/compaction.c |    6 +++++-
> > >  1 files changed, 5 insertions(+), 1 deletions(-)
> > > 
> > > diff --git a/mm/compaction.c b/mm/compaction.c
> > > index 6066a35..0292984 100644
> > > --- a/mm/compaction.c
> > > +++ b/mm/compaction.c
> > > @@ -633,7 +633,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> > >  
> > >  	/* Perform the isolation */
> > >  	low_pfn = isolate_migratepages_range(zone, cc, low_pfn, end_pfn);
> > > -	if (!low_pfn || cc->contended)
> > > +	if (!low_pfn || (cc->contended && !cc->nr_migratepages))
> > >  		return ISOLATE_ABORT;
> > 
> > I'm not sure it's best.
> > As you mentioned, it's same with first version of Shaohua.
> > But it could mitigate the goal of the patch if lock contention or
> > need_resched happens in the middle of loop once we isolate a
> > migratable page.
> > 
> > What do you think about this?
> > 
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index 0fbc6b7..7a009dd 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -848,6 +848,10 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
> >                 switch (isolate_migratepages(zone, cc)) {
> >                 case ISOLATE_ABORT:
> >                         ret = COMPACT_PARTIAL;
> > +                       if (!list_empty(&cc->migratepages)) {
> > +                               putback_lru_pages(&cc->migratepages);
> > +                               cc->nr_migratepages = 0;
> > +                       }
> >                         goto out;
> >                 case ISOLATE_NONE:
> >                         continue;
> > 
> 
> I agree with Minchan. Andrea's patch ignores the fact that free page
> isolation might have aborted due to lock contention. It's not necessarily
> going to be isolating the pages it needs for migration.

So it's the free page isolation abort, I misunderstood Minchan's point. Thanks
for clarifying. 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 1/2 v2]compaction: abort compaction loop if lock is contended or run too long
  2012-09-13  9:38           ` Mel Gorman
  2012-09-13 10:13             ` Shaohua Li
@ 2012-09-13 16:04             ` Andrea Arcangeli
  2012-09-13 16:31               ` Mel Gorman
  1 sibling, 1 reply; 18+ messages in thread
From: Andrea Arcangeli @ 2012-09-13 16:04 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Minchan Kim, Andrew Morton, Shaohua Li, linux-mm

On Thu, Sep 13, 2012 at 10:38:26AM +0100, Mel Gorman wrote:
> I agree with Minchan. Andrea's patch ignores the fact that free page
> isolation might have aborted due to lock contention. It's not necessarily
> going to be isolating the pages it needs for migration.

Actually I thought of calling putback_lru_pages first, but then I
thought it was better to just complete the current slice.

Note that putback_lru_pages can take the lru_lock immediately too when
the pagevec gets full which won't work any better than if the
cc->contended was set by the freepages isolation and we do
migrate_pages.

There's no way to abort lockless from that point, so I think it's
better to take the last locks to finish the current slice of work and
then abort if it's still contended (which confirms we're really
trashing).

Skipping isolated pages without rewinding low_pfn would also reduce
compaction reliability so that should be evaluated as well. And
rewinding with the putback_lru_pages would risk livelocks.

I agree Minchan's patch would fix the problem too, and this should be
a fairly uncommon path so either ways shouldn't make a noticeable
difference.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 1/2 v2]compaction: abort compaction loop if lock is contended or run too long
  2012-09-13 16:04             ` Andrea Arcangeli
@ 2012-09-13 16:31               ` Mel Gorman
  2012-09-13 17:11                 ` Andrea Arcangeli
  0 siblings, 1 reply; 18+ messages in thread
From: Mel Gorman @ 2012-09-13 16:31 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Minchan Kim, Andrew Morton, Shaohua Li, linux-mm

On Thu, Sep 13, 2012 at 06:04:32PM +0200, Andrea Arcangeli wrote:
> On Thu, Sep 13, 2012 at 10:38:26AM +0100, Mel Gorman wrote:
> > I agree with Minchan. Andrea's patch ignores the fact that free page
> > isolation might have aborted due to lock contention. It's not necessarily
> > going to be isolating the pages it needs for migration.
> 
> Actually I thought of calling putback_lru_pages first, but then I
> thought it was better to just complete the current slice.
> 

Unfortunately that will end up calling compaction_alloc() ->
isolate_freepages and probably end up contending again. 

> Note that putback_lru_pages can take the lru_lock immediately too when

True, but in that case there is no choice in the matter. We can't just
leak the pages.

> the pagevec gets full which won't work any better than if the
> cc->contended was set by the freepages isolation and we do
> migrate_pages.
> 
> There's no way to abort lockless from that point, so I think it's
> better to take the last locks to finish the current slice of work and
> then abort if it's still contended (which confirms we're really
> trashing).
> 

To me, that will just contend more than we have to. We're aborting compaction
and finishing off the current slice will not make any meaningful difference
to whether tha allocation succeeds or not.

> Skipping isolated pages without rewinding low_pfn would also reduce
> compaction reliability so that should be evaluated as well. And
> rewinding with the putback_lru_pages would risk livelocks.
> 
> I agree Minchan's patch would fix the problem too, and this should be
> a fairly uncommon path so either ways shouldn't make a noticeable
> difference.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 1/2 v2]compaction: abort compaction loop if lock is contended or run too long
  2012-09-13 16:31               ` Mel Gorman
@ 2012-09-13 17:11                 ` Andrea Arcangeli
  0 siblings, 0 replies; 18+ messages in thread
From: Andrea Arcangeli @ 2012-09-13 17:11 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Minchan Kim, Andrew Morton, Shaohua Li, linux-mm

On Thu, Sep 13, 2012 at 05:31:51PM +0100, Mel Gorman wrote:
> On Thu, Sep 13, 2012 at 06:04:32PM +0200, Andrea Arcangeli wrote:
> > On Thu, Sep 13, 2012 at 10:38:26AM +0100, Mel Gorman wrote:
> > > I agree with Minchan. Andrea's patch ignores the fact that free page
> > > isolation might have aborted due to lock contention. It's not necessarily
> > > going to be isolating the pages it needs for migration.
> > 
> > Actually I thought of calling putback_lru_pages first, but then I
> > thought it was better to just complete the current slice.
> > 
> 
> Unfortunately that will end up calling compaction_alloc() ->
> isolate_freepages and probably end up contending again. 
> 
> > Note that putback_lru_pages can take the lru_lock immediately too when
> 
> True, but in that case there is no choice in the matter. We can't just
> leak the pages.

This is why in that case (if the contention was generated by the
lru_lock) we would be better off to go ahead and do migrate_pages.

We could track contended_lru_lock and contended_zone_lock separately
to know if it's that case or not, but then I doubt it matters that much.

> To me, that will just contend more than we have to. We're aborting compaction
> and finishing off the current slice will not make any meaningful difference
> to whether tha allocation succeeds or not.

If you prefer the putback_lru_pages I'm fine, I only wanted to clarify
neither of the two solutions is going to do the optimal thing at all
times.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2012-09-13 17:11 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-09-10  1:18 [patch 1/2 v2]compaction: abort compaction loop if lock is contended or run too long Shaohua Li
2012-09-10  8:11 ` Mel Gorman
2012-09-11  1:45 ` Minchan Kim
2012-09-11  8:29   ` Shaohua Li
2012-09-11  8:40     ` Minchan Kim
2012-09-11 23:34 ` Andrew Morton
2012-09-12  0:48   ` Andrea Arcangeli
2012-09-12 21:20     ` Andrew Morton
2012-09-12 23:48       ` Andrea Arcangeli
2012-09-13  0:47         ` Minchan Kim
2012-09-13  2:49           ` Shaohua Li
2012-09-13  9:38           ` Mel Gorman
2012-09-13 10:13             ` Shaohua Li
2012-09-13 16:04             ` Andrea Arcangeli
2012-09-13 16:31               ` Mel Gorman
2012-09-13 17:11                 ` Andrea Arcangeli
2012-09-13 10:03       ` Mel Gorman
2012-09-12 11:08   ` Mel Gorman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).