* [patch 1/2]compaction: check migrated page number @ 2012-09-06 10:44 Shaohua Li 2012-09-06 12:17 ` Mel Gorman 0 siblings, 1 reply; 7+ messages in thread From: Shaohua Li @ 2012-09-06 10:44 UTC (permalink / raw) To: linux-mm; +Cc: akpm, mgorman, aarcange isolate_migratepages_range() might isolate none pages, for example, when zone->lru_lock is contended and compaction is async. In this case, we should abort compaction, otherwise, compact_zone will run a useless loop and make zone->lru_lock is even contended. Signed-off-by: Shaohua Li <shli@fusionio.com> --- mm/compaction.c | 2 ++ 1 file changed, 2 insertions(+) Index: linux/mm/compaction.c =================================================================== --- linux.orig/mm/compaction.c 2012-08-22 09:51:39.295322268 +0800 +++ linux/mm/compaction.c 2012-09-06 14:46:13.923144263 +0800 @@ -402,6 +402,8 @@ isolate_migratepages_range(struct zone * trace_mm_compaction_isolate_migratepages(nr_scanned, nr_isolated); + if (!nr_isolated) + return 0; return low_pfn; } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [patch 1/2]compaction: check migrated page number 2012-09-06 10:44 [patch 1/2]compaction: check migrated page number Shaohua Li @ 2012-09-06 12:17 ` Mel Gorman 2012-09-06 12:55 ` Shaohua Li 0 siblings, 1 reply; 7+ messages in thread From: Mel Gorman @ 2012-09-06 12:17 UTC (permalink / raw) To: Shaohua Li; +Cc: linux-mm, akpm, aarcange On Thu, Sep 06, 2012 at 06:44:04PM +0800, Shaohua Li wrote: > > isolate_migratepages_range() might isolate none pages, for example, when > zone->lru_lock is contended and compaction is async. In this case, we should > abort compaction, otherwise, compact_zone will run a useless loop and make > zone->lru_lock is even contended. > It might also isolate no pages because the range was 100% allocated and there were no free pages to isolate. This is perfectly normal and I suspect this patch effectively disables compaction. What problem did you observe that this patch is aimed at? -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [patch 1/2]compaction: check migrated page number 2012-09-06 12:17 ` Mel Gorman @ 2012-09-06 12:55 ` Shaohua Li 2012-09-06 13:25 ` Mel Gorman 0 siblings, 1 reply; 7+ messages in thread From: Shaohua Li @ 2012-09-06 12:55 UTC (permalink / raw) To: Mel Gorman; +Cc: linux-mm, akpm, aarcange On Thu, Sep 06, 2012 at 01:17:25PM +0100, Mel Gorman wrote: > On Thu, Sep 06, 2012 at 06:44:04PM +0800, Shaohua Li wrote: > > > > isolate_migratepages_range() might isolate none pages, for example, when > > zone->lru_lock is contended and compaction is async. In this case, we should > > abort compaction, otherwise, compact_zone will run a useless loop and make > > zone->lru_lock is even contended. > > > > It might also isolate no pages because the range was 100% allocated and > there were no free pages to isolate. This is perfectly normal and I suspect > this patch effectively disables compaction. What problem did you observe > that this patch is aimed at? I'm running a random swapin/out workload. When memory is fragmented enough, I saw 100% cpu usage. perf shows zone->lru_lock is heavily contended in isolate_migratepages_range. I'm using slub(I didn't see the problem with slab), the allocation is for radix_tree_node slab, which needs 4 pages. Even If I just apply the second patch, the system is still in 100% cpu usage. The spin_is_contended check can't cure the problem completely. Trace shows compact_zone will run a useless loop and each loop contend the lru_lock. With this patch, the cpu usage becomes normal (about 20% utilization). -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [patch 1/2]compaction: check migrated page number 2012-09-06 12:55 ` Shaohua Li @ 2012-09-06 13:25 ` Mel Gorman 2012-09-07 4:12 ` Shaohua Li 0 siblings, 1 reply; 7+ messages in thread From: Mel Gorman @ 2012-09-06 13:25 UTC (permalink / raw) To: Shaohua Li; +Cc: linux-mm, akpm, aarcange On Thu, Sep 06, 2012 at 08:55:26PM +0800, Shaohua Li wrote: > On Thu, Sep 06, 2012 at 01:17:25PM +0100, Mel Gorman wrote: > > On Thu, Sep 06, 2012 at 06:44:04PM +0800, Shaohua Li wrote: > > > > > > isolate_migratepages_range() might isolate none pages, for example, when > > > zone->lru_lock is contended and compaction is async. In this case, we should > > > abort compaction, otherwise, compact_zone will run a useless loop and make > > > zone->lru_lock is even contended. > > > > > > > It might also isolate no pages because the range was 100% allocated and > > there were no free pages to isolate. This is perfectly normal and I suspect > > this patch effectively disables compaction. What problem did you observe > > that this patch is aimed at? > > I'm running a random swapin/out workload. When memory is fragmented enough, I > saw 100% cpu usage. perf shows zone->lru_lock is heavily contended in > isolate_migratepages_range. I'm using slub(I didn't see the problem with slab), > the allocation is for radix_tree_node slab, which needs 4 pages. Ok, the fragmentaiton is due to high-order unmovable kernel allocations from SLUB which will have diminishing returns over time. One option to address this is to check if it's a high-order kernel allocation that can fail and not compact in that case. SLUB will fall back to using order-0 instead. > Even If I just > apply the second patch, the system is still in 100% cpu usage. The > spin_is_contended check can't cure the problem completely. Are you sure it's really contention in that case and not just a lot of time is spent in compaction trying to satisfy the radix_tree_node allocation requests? > Trace shows > compact_zone will run a useless loop and each loop contend the lru_lock. With > this patch, the cpu usage becomes normal (about 20% utilization). I suspect the reason why this patch has an effect is because compaction is no longer running. It finds a 100% full pageblock quickly and then aborts and that is not the right fix. Can you try something like this instead please? diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 8f6eea3..bd5bd6d 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2114,6 +2114,10 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, return NULL; } + /* Do not compact for high-order kernel allocations that can fail */ + if ((gfp_mask & (__GFP_NORETRY | __GFP_MOVABLE)) == __GFP_NORETRY) + return NULL; + current->flags |= PF_MEMALLOC; *did_some_progress = try_to_compact_pages(zonelist, order, gfp_mask, nodemask, sync_migration, -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [patch 1/2]compaction: check migrated page number 2012-09-06 13:25 ` Mel Gorman @ 2012-09-07 4:12 ` Shaohua Li 2012-09-07 15:52 ` Andrea Arcangeli 0 siblings, 1 reply; 7+ messages in thread From: Shaohua Li @ 2012-09-07 4:12 UTC (permalink / raw) To: Mel Gorman; +Cc: linux-mm, akpm, aarcange On Thu, Sep 06, 2012 at 02:25:51PM +0100, Mel Gorman wrote: > On Thu, Sep 06, 2012 at 08:55:26PM +0800, Shaohua Li wrote: > > On Thu, Sep 06, 2012 at 01:17:25PM +0100, Mel Gorman wrote: > > > On Thu, Sep 06, 2012 at 06:44:04PM +0800, Shaohua Li wrote: > > > > > > > > isolate_migratepages_range() might isolate none pages, for example, when > > > > zone->lru_lock is contended and compaction is async. In this case, we should > > > > abort compaction, otherwise, compact_zone will run a useless loop and make > > > > zone->lru_lock is even contended. > > > > > > > > > > It might also isolate no pages because the range was 100% allocated and > > > there were no free pages to isolate. This is perfectly normal and I suspect > > > this patch effectively disables compaction. What problem did you observe > > > that this patch is aimed at? > > > > I'm running a random swapin/out workload. When memory is fragmented enough, I > > saw 100% cpu usage. perf shows zone->lru_lock is heavily contended in > > isolate_migratepages_range. I'm using slub(I didn't see the problem with slab), > > the allocation is for radix_tree_node slab, which needs 4 pages. > > Ok, the fragmentaiton is due to high-order unmovable kernel allocations from > SLUB which will have diminishing returns over time. One option to address > this is to check if it's a high-order kernel allocation that can fail and > not compact in that case. SLUB will fall back to using order-0 instead. I tried actually, and it doesn't help. The problem is compact_zone keeps running isolate_migratepages_range, which does nothing except doing a lock/unlock. > > Even If I just > > apply the second patch, the system is still in 100% cpu usage. The > > spin_is_contended check can't cure the problem completely. > > Are you sure it's really contention in that case and not just a lot of > time is spent in compaction trying to satisfy the radix_tree_node > allocation requests? certainly it's the contention. > > Trace shows > > compact_zone will run a useless loop and each loop contend the lru_lock. With > > this patch, the cpu usage becomes normal (about 20% utilization). > > I suspect the reason why this patch has an effect is because compaction is > no longer running. It finds a 100% full pageblock quickly and then aborts and > that is not the right fix. Can you try something like this instead please? That debug patch doesn't help. My system just hang. I thought your worry is valid, we shouldn't abort if 100% full pageblock is found. How about this one? With it, the cpu usage is normal in my workload. Occassionally I saw cpu usage reaches high (up to 80%), but recovered immediately. Without the patch, the cpu usage keeps in 100%. Thanks, Shaohua Subject: compaction: check migrated page number isolate_migratepages_range() might isolate none pages, for example, when zone->lru_lock is contended and compaction is async. In this case, we should abort compaction, otherwise, compact_zone will run a useless loop and make zone->lru_lock is even contended. Signed-off-by: Shaohua Li <shli@fusionio.com> --- mm/compaction.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) Index: linux/mm/compaction.c =================================================================== --- linux.orig/mm/compaction.c 2012-09-06 18:37:52.636413761 +0800 +++ linux/mm/compaction.c 2012-09-07 10:51:16.734081959 +0800 @@ -618,7 +618,7 @@ typedef enum { static isolate_migrate_t isolate_migratepages(struct zone *zone, struct compact_control *cc) { - unsigned long low_pfn, end_pfn; + unsigned long low_pfn, end_pfn, old_low_pfn; /* Do not scan outside zone boundaries */ low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn); @@ -633,8 +633,9 @@ static isolate_migrate_t isolate_migrate } /* Perform the isolation */ + old_low_pfn = low_pfn; low_pfn = isolate_migratepages_range(zone, cc, low_pfn, end_pfn); - if (!low_pfn) + if (!low_pfn || old_low_pfn == low_pfn) return ISOLATE_ABORT; cc->migrate_pfn = low_pfn; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [patch 1/2]compaction: check migrated page number 2012-09-07 4:12 ` Shaohua Li @ 2012-09-07 15:52 ` Andrea Arcangeli 2012-09-10 1:05 ` Shaohua Li 0 siblings, 1 reply; 7+ messages in thread From: Andrea Arcangeli @ 2012-09-07 15:52 UTC (permalink / raw) To: Shaohua Li; +Cc: Mel Gorman, linux-mm, akpm On Fri, Sep 07, 2012 at 12:12:12PM +0800, Shaohua Li wrote: > Subject: compaction: check migrated page number > > isolate_migratepages_range() might isolate none pages, for example, when > zone->lru_lock is contended and compaction is async. In this case, we should > abort compaction, otherwise, compact_zone will run a useless loop and make > zone->lru_lock is even contended. > > Signed-off-by: Shaohua Li <shli@fusionio.com> > --- > mm/compaction.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > Index: linux/mm/compaction.c > =================================================================== > --- linux.orig/mm/compaction.c 2012-09-06 18:37:52.636413761 +0800 > +++ linux/mm/compaction.c 2012-09-07 10:51:16.734081959 +0800 > @@ -618,7 +618,7 @@ typedef enum { > static isolate_migrate_t isolate_migratepages(struct zone *zone, > struct compact_control *cc) > { > - unsigned long low_pfn, end_pfn; > + unsigned long low_pfn, end_pfn, old_low_pfn; > > /* Do not scan outside zone boundaries */ > low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn); > @@ -633,8 +633,9 @@ static isolate_migrate_t isolate_migrate > } > > /* Perform the isolation */ > + old_low_pfn = low_pfn; > low_pfn = isolate_migratepages_range(zone, cc, low_pfn, end_pfn); > - if (!low_pfn) > + if (!low_pfn || old_low_pfn == low_pfn) > return ISOLATE_ABORT; > > cc->migrate_pfn = low_pfn; Looks good to me. This other below approach should also work: diff --git a/mm/compaction.c b/mm/compaction.c index 7fcd3a5..aefb712 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -70,8 +70,7 @@ static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags, /* async aborts if taking too long or contended */ if (!cc->sync) { - if (cc->contended) - *cc->contended = true; + cc->contended = true; return false; } @@ -634,7 +633,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone, /* Perform the isolation */ low_pfn = isolate_migratepages_range(zone, cc, low_pfn, end_pfn); - if (!low_pfn) + if (!low_pfn || cc->contended) return ISOLATE_ABORT; cc->migrate_pfn = low_pfn; @@ -831,6 +830,7 @@ static unsigned long compact_zone_order(struct zone *zone, int order, gfp_t gfp_mask, bool sync, bool *contended) { + unsigned long ret; struct compact_control cc = { .nr_freepages = 0, .nr_migratepages = 0, @@ -838,12 +838,14 @@ static unsigned long compact_zone_order(struct zone *zone, .migratetype = allocflags_to_migratetype(gfp_mask), .zone = zone, .sync = sync, - .contended = contended, }; INIT_LIST_HEAD(&cc.freepages); INIT_LIST_HEAD(&cc.migratepages); - return compact_zone(zone, &cc); + ret = compact_zone(zone, &cc); + if (contended) + *contended = cc.contended; + return ret; } int sysctl_extfrag_threshold = 500; diff --git a/mm/internal.h b/mm/internal.h index 53418cd..dbb32ff 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -130,7 +130,7 @@ struct compact_control { int order; /* order a direct compactor needs */ int migratetype; /* MOVABLE, RECLAIMABLE etc */ struct zone *zone; - bool *contended; /* True if a lock was contended */ + bool contended; /* True if a lock was contended */ }; unsigned long -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [patch 1/2]compaction: check migrated page number 2012-09-07 15:52 ` Andrea Arcangeli @ 2012-09-10 1:05 ` Shaohua Li 0 siblings, 0 replies; 7+ messages in thread From: Shaohua Li @ 2012-09-10 1:05 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Mel Gorman, linux-mm, akpm On Fri, Sep 07, 2012 at 05:52:43PM +0200, Andrea Arcangeli wrote: > On Fri, Sep 07, 2012 at 12:12:12PM +0800, Shaohua Li wrote: > > Subject: compaction: check migrated page number > > > > isolate_migratepages_range() might isolate none pages, for example, when > > zone->lru_lock is contended and compaction is async. In this case, we should > > abort compaction, otherwise, compact_zone will run a useless loop and make > > zone->lru_lock is even contended. > > > > Signed-off-by: Shaohua Li <shli@fusionio.com> > > --- > > mm/compaction.c | 5 +++-- > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > Index: linux/mm/compaction.c > > =================================================================== > > --- linux.orig/mm/compaction.c 2012-09-06 18:37:52.636413761 +0800 > > +++ linux/mm/compaction.c 2012-09-07 10:51:16.734081959 +0800 > > @@ -618,7 +618,7 @@ typedef enum { > > static isolate_migrate_t isolate_migratepages(struct zone *zone, > > struct compact_control *cc) > > { > > - unsigned long low_pfn, end_pfn; > > + unsigned long low_pfn, end_pfn, old_low_pfn; > > > > /* Do not scan outside zone boundaries */ > > low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn); > > @@ -633,8 +633,9 @@ static isolate_migrate_t isolate_migrate > > } > > > > /* Perform the isolation */ > > + old_low_pfn = low_pfn; > > low_pfn = isolate_migratepages_range(zone, cc, low_pfn, end_pfn); > > - if (!low_pfn) > > + if (!low_pfn || old_low_pfn == low_pfn) > > return ISOLATE_ABORT; > > > > cc->migrate_pfn = low_pfn; > > Looks good to me. > > This other below approach should also work: Yep, the logic is the same. But your code looks prettier, thanks! I'll send a formal path. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2012-09-10 1:05 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-09-06 10:44 [patch 1/2]compaction: check migrated page number Shaohua Li 2012-09-06 12:17 ` Mel Gorman 2012-09-06 12:55 ` Shaohua Li 2012-09-06 13:25 ` Mel Gorman 2012-09-07 4:12 ` Shaohua Li 2012-09-07 15:52 ` Andrea Arcangeli 2012-09-10 1:05 ` Shaohua Li
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).