From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx169.postini.com [74.125.245.169]) by kanga.kvack.org (Postfix) with SMTP id 9D0FC6B0309 for ; Mon, 25 Jun 2012 00:59:30 -0400 (EDT) From: Minchan Kim Subject: [PATCH 0/2] fix livelock because of kswapd stop Date: Mon, 25 Jun 2012 13:59:25 +0900 Message-Id: <1340600367-23620-1-git-send-email-minchan@kernel.org> Sender: owner-linux-mm@kvack.org List-ID: To: akpm@linux-foundation.org Cc: KOSAKI Motohiro , Mel Gorman , KAMEZAWA Hiroyuki , Aaditya Kumar , LKML , linux-mm , Minchan Kim When hotplug offlining happens on zone A, it starts to mark freed page as MIGRATE_ISOLATE type in buddy for preventing further allocation. (MIGRATE_ISOLATE is very irony type because it's apparently on buddy but we can't allocate them). When the memory shortage happens during hotplug offlining, current task starts to reclaim, then wake up kswapd. Kswapd checks watermark, then go sleep because current zone_watermark_ok_safe doesn't consider MIGRATE_ISOLATE freed page count. Current task continue to reclaim in direct reclaim path without kswapd's helping. The problem is that zone->all_unreclaimable is set by only kswapd so that current task would be looping forever like below. __alloc_pages_slowpath restart: wake_all_kswapd rebalance: __alloc_pages_direct_reclaim do_try_to_free_pages if global_reclaim && !all_unreclaimable return 1; /* It means we did did_some_progress */ skip __alloc_pages_may_oom should_alloc_retry goto rebalance; [1/2] factor out memory-isolation functions from page_alloc.c to mm/page_isolation.c This patch can be merged regardless of [2/2]. [2/2] fix this problem. Aaditya, Could you confirm this patch can solve your problem? Minchan Kim (2): mm: Factor out memory isolate functions memory-hotplug: fix kswapd looping forever problem drivers/base/Kconfig | 1 + include/linux/mmzone.h | 8 +++ include/linux/page-isolation.h | 8 +-- mm/Kconfig | 5 ++ mm/Makefile | 4 +- mm/page_alloc.c | 107 +++++++++++++------------------------- mm/page_isolation.c | 110 ++++++++++++++++++++++++++++++++++++++++ 7 files changed, 166 insertions(+), 77 deletions(-) -- 1.7.9.5 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx169.postini.com [74.125.245.169]) by kanga.kvack.org (Postfix) with SMTP id 529C46B030A for ; Mon, 25 Jun 2012 00:59:32 -0400 (EDT) From: Minchan Kim Subject: [PATCH 1/2] mm: Factor out memory isolate functions Date: Mon, 25 Jun 2012 13:59:26 +0900 Message-Id: <1340600367-23620-2-git-send-email-minchan@kernel.org> In-Reply-To: <1340600367-23620-1-git-send-email-minchan@kernel.org> References: <1340600367-23620-1-git-send-email-minchan@kernel.org> Sender: owner-linux-mm@kvack.org List-ID: To: akpm@linux-foundation.org Cc: KOSAKI Motohiro , Mel Gorman , KAMEZAWA Hiroyuki , Aaditya Kumar , LKML , linux-mm , Minchan Kim , Andi Kleen , Marek Szyprowski Now mm/page_alloc.c has some memory isolation functions but they are used oly when we enable CONFIG_{CMA|MEMORY_HOTPLUG|MEMORY_FAILURE}. So let's make it configurable by new CONFIG_MEMORY_ISOLATION so that it can reduce binary size and we can check it simple by CONFIG_MEMORY_ISOLATION, not if defined CONFIG_{CMA|MEMORY_HOTPLUG|MEMORY_FAILURE}. This patch is based on next-20120621. Cc: Andi Kleen Cc: Marek Szyprowski Cc: KAMEZAWA Hiroyuki Signed-off-by: Minchan Kim --- Andrew, I know this patch is conflict with current mmotm totally but I don't know what tree I should use. I hope it's a git tree instead of quit-based version. Anyway, If you feel it's inconvenient, let me know it and I will resend it on tree you mention. Thanks. drivers/base/Kconfig | 1 + include/linux/page-isolation.h | 8 ++--- mm/Kconfig | 5 +++ mm/Makefile | 4 +-- mm/page_alloc.c | 71 ---------------------------------------- mm/page_isolation.c | 71 ++++++++++++++++++++++++++++++++++++++++ 6 files changed, 83 insertions(+), 77 deletions(-) diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index 9b21469..08b4c52 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -196,6 +196,7 @@ config CMA bool "Contiguous Memory Allocator (EXPERIMENTAL)" depends on HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK && EXPERIMENTAL select MIGRATION + select MEMORY_ISOLATION help This enables the Contiguous Memory Allocator which allows drivers to allocate big physically-contiguous blocks of memory for use with diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index 3bdcab3..0569a3e 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -10,7 +10,7 @@ * free all pages in the range. test_page_isolated() can be used for * test it. */ -extern int +int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, unsigned migratetype); @@ -18,7 +18,7 @@ start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE. * target range is [start_pfn, end_pfn) */ -extern int +int undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, unsigned migratetype); @@ -30,8 +30,8 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn); /* * Internal functions. Changes pageblock's migrate type. */ -extern int set_migratetype_isolate(struct page *page); -extern void unset_migratetype_isolate(struct page *page, unsigned migratetype); +int set_migratetype_isolate(struct page *page); +void unset_migratetype_isolate(struct page *page, unsigned migratetype); #endif diff --git a/mm/Kconfig b/mm/Kconfig index 82fed4e..d5c8019 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -140,9 +140,13 @@ config ARCH_DISCARD_MEMBLOCK config NO_BOOTMEM boolean +config MEMORY_ISOLATION + boolean + # eventually, we can have this option just 'select SPARSEMEM' config MEMORY_HOTPLUG bool "Allow for memory hot-add" + select MEMORY_ISOLATION depends on SPARSEMEM || X86_64_ACPI_NUMA depends on HOTPLUG && ARCH_ENABLE_MEMORY_HOTPLUG depends on (IA64 || X86 || PPC_BOOK3S_64 || SUPERH || S390) @@ -272,6 +276,7 @@ config MEMORY_FAILURE depends on MMU depends on ARCH_SUPPORTS_MEMORY_FAILURE bool "Enable recovery from hardware memory errors" + select MEMORY_ISOLATION help Enables code to recover from some memory failures on systems with MCA recovery. This allows a system to continue running diff --git a/mm/Makefile b/mm/Makefile index 25e8002..a0c7725 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -15,8 +15,7 @@ obj-y := filemap.o mempool.o oom_kill.o fadvise.o \ maccess.o page_alloc.o page-writeback.o \ readahead.o swap.o truncate.o vmscan.o shmem.o \ prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \ - page_isolation.o mm_init.o mmu_context.o percpu.o \ - compaction.o $(mmu-y) + mm_init.o mmu_context.o percpu.o compaction.o $(mmu-y) obj-y += init-mm.o ifdef CONFIG_NO_BOOTMEM @@ -55,3 +54,4 @@ obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o obj-$(CONFIG_CLEANCACHE) += cleancache.o +obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 2c29b1c..c175fa9 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -51,7 +51,6 @@ #include #include #include -#include #include #include #include @@ -5555,76 +5554,6 @@ bool is_pageblock_removable_nolock(struct page *page) return __count_immobile_pages(zone, page, 0); } -int set_migratetype_isolate(struct page *page) -{ - struct zone *zone; - unsigned long flags, pfn; - struct memory_isolate_notify arg; - int notifier_ret; - int ret = -EBUSY; - - zone = page_zone(page); - - spin_lock_irqsave(&zone->lock, flags); - - pfn = page_to_pfn(page); - arg.start_pfn = pfn; - arg.nr_pages = pageblock_nr_pages; - arg.pages_found = 0; - - /* - * It may be possible to isolate a pageblock even if the - * migratetype is not MIGRATE_MOVABLE. The memory isolation - * notifier chain is used by balloon drivers to return the - * number of pages in a range that are held by the balloon - * driver to shrink memory. If all the pages are accounted for - * by balloons, are free, or on the LRU, isolation can continue. - * Later, for example, when memory hotplug notifier runs, these - * pages reported as "can be isolated" should be isolated(freed) - * by the balloon driver through the memory notifier chain. - */ - notifier_ret = memory_isolate_notify(MEM_ISOLATE_COUNT, &arg); - notifier_ret = notifier_to_errno(notifier_ret); - if (notifier_ret) - goto out; - /* - * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself. - * We just check MOVABLE pages. - */ - if (__count_immobile_pages(zone, page, arg.pages_found)) - ret = 0; - - /* - * immobile means "not-on-lru" paes. If immobile is larger than - * removable-by-driver pages reported by notifier, we'll fail. - */ - -out: - if (!ret) { - set_pageblock_migratetype(page, MIGRATE_ISOLATE); - move_freepages_block(zone, page, MIGRATE_ISOLATE); - } - - spin_unlock_irqrestore(&zone->lock, flags); - if (!ret) - drain_all_pages(); - return ret; -} - -void unset_migratetype_isolate(struct page *page, unsigned migratetype) -{ - struct zone *zone; - unsigned long flags; - zone = page_zone(page); - spin_lock_irqsave(&zone->lock, flags); - if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE) - goto out; - set_pageblock_migratetype(page, migratetype); - move_freepages_block(zone, page, migratetype); -out: - spin_unlock_irqrestore(&zone->lock, flags); -} - #ifdef CONFIG_CMA static unsigned long pfn_max_align_down(unsigned long pfn) diff --git a/mm/page_isolation.c b/mm/page_isolation.c index c9f0477..1a9cb36 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -5,8 +5,79 @@ #include #include #include +#include #include "internal.h" +int set_migratetype_isolate(struct page *page) +{ + struct zone *zone; + unsigned long flags, pfn; + struct memory_isolate_notify arg; + int notifier_ret; + int ret = -EBUSY; + + zone = page_zone(page); + + spin_lock_irqsave(&zone->lock, flags); + + pfn = page_to_pfn(page); + arg.start_pfn = pfn; + arg.nr_pages = pageblock_nr_pages; + arg.pages_found = 0; + + /* + * It may be possible to isolate a pageblock even if the + * migratetype is not MIGRATE_MOVABLE. The memory isolation + * notifier chain is used by balloon drivers to return the + * number of pages in a range that are held by the balloon + * driver to shrink memory. If all the pages are accounted for + * by balloons, are free, or on the LRU, isolation can continue. + * Later, for example, when memory hotplug notifier runs, these + * pages reported as "can be isolated" should be isolated(freed) + * by the balloon driver through the memory notifier chain. + */ + notifier_ret = memory_isolate_notify(MEM_ISOLATE_COUNT, &arg); + notifier_ret = notifier_to_errno(notifier_ret); + if (notifier_ret) + goto out; + /* + * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself. + * We just check MOVABLE pages. + */ + if (__count_immobile_pages(zone, page, arg.pages_found)) + ret = 0; + + /* + * immobile means "not-on-lru" paes. If immobile is larger than + * removable-by-driver pages reported by notifier, we'll fail. + */ + +out: + if (!ret) { + set_pageblock_migratetype(page, MIGRATE_ISOLATE); + move_freepages_block(zone, page, MIGRATE_ISOLATE); + } + + spin_unlock_irqrestore(&zone->lock, flags); + if (!ret) + drain_all_pages(); + return ret; +} + +void unset_migratetype_isolate(struct page *page, unsigned migratetype) +{ + struct zone *zone; + unsigned long flags; + zone = page_zone(page); + spin_lock_irqsave(&zone->lock, flags); + if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE) + goto out; + set_pageblock_migratetype(page, migratetype); + move_freepages_block(zone, page, migratetype); +out: + spin_unlock_irqrestore(&zone->lock, flags); +} + static inline struct page * __first_valid_page(unsigned long pfn, unsigned long nr_pages) { -- 1.7.9.5 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx169.postini.com [74.125.245.169]) by kanga.kvack.org (Postfix) with SMTP id 870716B030A for ; Mon, 25 Jun 2012 00:59:35 -0400 (EDT) From: Minchan Kim Subject: [PATCH 2/2] memory-hotplug: fix kswapd looping forever problem Date: Mon, 25 Jun 2012 13:59:27 +0900 Message-Id: <1340600367-23620-3-git-send-email-minchan@kernel.org> In-Reply-To: <1340600367-23620-1-git-send-email-minchan@kernel.org> References: <1340600367-23620-1-git-send-email-minchan@kernel.org> Sender: owner-linux-mm@kvack.org List-ID: To: akpm@linux-foundation.org Cc: KOSAKI Motohiro , Mel Gorman , KAMEZAWA Hiroyuki , Aaditya Kumar , LKML , linux-mm , Minchan Kim , Mel Gorman When hotplug offlining happens on zone A, it starts to mark freed page as MIGRATE_ISOLATE type in buddy for preventing further allocation. (MIGRATE_ISOLATE is very irony type because it's apparently on buddy but we can't allocate them). When the memory shortage happens during hotplug offlining, current task starts to reclaim, then wake up kswapd. Kswapd checks watermark, then go sleep because current zone_watermark_ok_safe doesn't consider MIGRATE_ISOLATE freed page count. Current task continue to reclaim in direct reclaim path without kswapd's helping. The problem is that zone->all_unreclaimable is set by only kswapd so that current task would be looping forever like below. __alloc_pages_slowpath restart: wake_all_kswapd rebalance: __alloc_pages_direct_reclaim do_try_to_free_pages if global_reclaim && !all_unreclaimable return 1; /* It means we did did_some_progress */ skip __alloc_pages_may_oom should_alloc_retry goto rebalance; If we apply KOSAKI's patch[1] which doesn't depends on kswapd about setting zone->all_unreclaimable, we can solve this problem by killing some task in direct reclaim path. But it doesn't wake up kswapd, still. It could be a problem still if other subsystem needs GFP_ATOMIC request. So kswapd should consider MIGRATE_ISOLATE when it calculate free pages BEFORE going sleep. This patch counts the number of MIGRATE_ISOLATE page block and zone_watermark_ok_safe will consider it if the system has such blocks (fortunately, it's very rare so no problem in POV overhead and kswapd is never hotpath). [1] http://lkml.org/lkml/2012/6/14/74 Suggested-by: KOSAKI Motohiro Cc: KAMEZAWA Hiroyuki Cc: Aaditya Kumar Cc: Mel Gorman Signed-off-by: Minchan Kim --- Aaditya, coul you confirm this patch solve your problem and make sure nr_migrate_isolate is zero after hotplug end? Thanks! include/linux/mmzone.h | 8 ++++++++ mm/page_alloc.c | 36 ++++++++++++++++++++++++++++++++++++ mm/page_isolation.c | 43 +++++++++++++++++++++++++++++++++++++++++-- 3 files changed, 85 insertions(+), 2 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index bf3404e..290e186 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -474,6 +474,14 @@ struct zone { * rarely used fields: */ const char *name; +#ifdef CONFIG_MEMORY_ISOLATION + /* + * the number of MIGRATE_ISOLATE pageblock + * We need this for accurate free page counting. + * It's protected by zone->lock + */ + atomic_t nr_migrate_isolate; +#endif } ____cacheline_internodealigned_in_smp; typedef enum { diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c175fa9..626f877 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -218,6 +218,11 @@ EXPORT_SYMBOL(nr_online_nodes); int page_group_by_mobility_disabled __read_mostly; +/* + * NOTE: + * Don't use set_pageblock_migratetype(page, MIGRATE_ISOLATE) direclty. + * Instead, use {un}set_pageblock_isolate. + */ void set_pageblock_migratetype(struct page *page, int migratetype) { if (unlikely(page_group_by_mobility_disabled)) @@ -1614,6 +1619,28 @@ static bool __zone_watermark_ok(struct zone *z, int order, unsigned long mark, return true; } +static unsigned long migrate_isolate_pages(struct zone *zone) +{ + unsigned long nr_pages = 0; + + if (unlikely(atomic_read(&zone->nr_migrate_isolate))) { + unsigned long flags; + int order; + spin_lock_irqsave(&zone->lock, flags); + for (order = 0; order < MAX_ORDER; order++) { + struct free_area *area = &zone->free_area[order]; + long count = 0; + struct list_head *curr; + + list_for_each(curr, &area->free_list[MIGRATE_ISOLATE]) + count++; + nr_pages += (count << order); + } + spin_unlock_irqrestore(&zone->lock, flags); + } + return nr_pages; +} + bool zone_watermark_ok(struct zone *z, int order, unsigned long mark, int classzone_idx, int alloc_flags) { @@ -1629,6 +1656,14 @@ bool zone_watermark_ok_safe(struct zone *z, int order, unsigned long mark, if (z->percpu_drift_mark && free_pages < z->percpu_drift_mark) free_pages = zone_page_state_snapshot(z, NR_FREE_PAGES); + /* + * If the zone has MIGRATE_ISOLATE type free page, + * we should consider it, too. Otherwise, kswapd can sleep forever. + */ + free_pages -= migrate_isolate_pages(z); + if (free_pages < 0) + free_pages = 0; + return __zone_watermark_ok(z, order, mark, classzone_idx, alloc_flags, free_pages); } @@ -4407,6 +4442,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat, lruvec_init(&zone->lruvec, zone); zap_zone_vm_stats(zone); zone->flags = 0; + atomic_set(&zone->nr_migrate_isolate, 0); if (!size) continue; diff --git a/mm/page_isolation.c b/mm/page_isolation.c index 1a9cb36..e95a792 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -8,6 +8,45 @@ #include #include "internal.h" +static void set_pageblock_isolate(struct zone *zone, struct page *page) +{ + int old_migratetype; + assert_spin_locked(&zone->lock); + + if (unlikely(page_group_by_mobility_disabled)) { + set_pageblock_flags_group(page, MIGRATE_UNMOVABLE, + PB_migrate, PB_migrate_end); + return; + } + + old_migratetype = get_pageblock_migratetype(page); + set_pageblock_flags_group(page, MIGRATE_ISOLATE, + PB_migrate, PB_migrate_end); + + if (old_migratetype != MIGRATE_ISOLATE) + atomic_inc(&zone->nr_migrate_isolate); +} + +static void unset_pageblock_isolate(struct zone *zone, struct page *page, + unsigned long migratetype) +{ + assert_spin_locked(&zone->lock); + + if (unlikely(page_group_by_mobility_disabled)) { + set_pageblock_flags_group(page, migratetype, + PB_migrate, PB_migrate_end); + return; + } + + BUG_ON(get_pageblock_migratetype(page) != MIGRATE_ISOLATE); + BUG_ON(migratetype == MIGRATE_ISOLATE); + + set_pageblock_flags_group(page, migratetype, + PB_migrate, PB_migrate_end); + atomic_dec(&zone->nr_migrate_isolate); + BUG_ON(atomic_read(&zone->nr_migrate_isolate) < 0); +} + int set_migratetype_isolate(struct page *page) { struct zone *zone; @@ -54,7 +93,7 @@ int set_migratetype_isolate(struct page *page) out: if (!ret) { - set_pageblock_migratetype(page, MIGRATE_ISOLATE); + set_pageblock_isolate(zone, page); move_freepages_block(zone, page, MIGRATE_ISOLATE); } @@ -72,8 +111,8 @@ void unset_migratetype_isolate(struct page *page, unsigned migratetype) spin_lock_irqsave(&zone->lock, flags); if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE) goto out; - set_pageblock_migratetype(page, migratetype); move_freepages_block(zone, page, migratetype); + unset_pageblock_isolate(zone, page, migratetype); out: spin_unlock_irqrestore(&zone->lock, flags); } -- 1.7.9.5 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx162.postini.com [74.125.245.162]) by kanga.kvack.org (Postfix) with SMTP id 39B906B032B for ; Mon, 25 Jun 2012 06:02:51 -0400 (EDT) Date: Mon, 25 Jun 2012 11:02:47 +0100 From: Mel Gorman Subject: Re: [PATCH 2/2] memory-hotplug: fix kswapd looping forever problem Message-ID: <20120625100247.GC8103@csn.ul.ie> References: <1340600367-23620-1-git-send-email-minchan@kernel.org> <1340600367-23620-3-git-send-email-minchan@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <1340600367-23620-3-git-send-email-minchan@kernel.org> Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: akpm@linux-foundation.org, KOSAKI Motohiro , KAMEZAWA Hiroyuki , Aaditya Kumar , LKML , linux-mm , Mel Gorman On Mon, Jun 25, 2012 at 01:59:27PM +0900, Minchan Kim wrote: > When hotplug offlining happens on zone A, it starts to mark freed page > as MIGRATE_ISOLATE type in buddy for preventing further allocation. > (MIGRATE_ISOLATE is very irony type because it's apparently on buddy > but we can't allocate them). > When the memory shortage happens during hotplug offlining, > current task starts to reclaim, then wake up kswapd. > Kswapd checks watermark, then go sleep because current zone_watermark_ok_safe > doesn't consider MIGRATE_ISOLATE freed page count. > Current task continue to reclaim in direct reclaim path without kswapd's helping. > The problem is that zone->all_unreclaimable is set by only kswapd > so that current task would be looping forever like below. > > __alloc_pages_slowpath > restart: > wake_all_kswapd > rebalance: > __alloc_pages_direct_reclaim > do_try_to_free_pages > if global_reclaim && !all_unreclaimable > return 1; /* It means we did did_some_progress */ > skip __alloc_pages_may_oom > should_alloc_retry > goto rebalance; > > If we apply KOSAKI's patch[1] which doesn't depends on kswapd > about setting zone->all_unreclaimable, we can solve this problem > by killing some task in direct reclaim path. But it doesn't wake up kswapd, still. > It could be a problem still if other subsystem needs GFP_ATOMIC request. > So kswapd should consider MIGRATE_ISOLATE when it calculate free pages > BEFORE going sleep. > > This patch counts the number of MIGRATE_ISOLATE page block and > zone_watermark_ok_safe will consider it if the system has such blocks > (fortunately, it's very rare so no problem in POV overhead and kswapd is never > hotpath). > > [1] http://lkml.org/lkml/2012/6/14/74 > I have not been following this discussion at all but reading through the patch I wonder again why memory hotplug is not "allocating" the pageblocks when they are fully isolated like a balloon driver. This would keep the free space accounting as it is currently without introducing something memory hotplug specific to kswapd. I think historically that memory hotplug did not allocate pages because it would be difficult to detect if a pageblock was isolated or if part of some balloon. Allocating just full pageblocks would work around this. However, it would play very badly with CMA. It'd be worth mentioning this in the changelog in case someone tries to "fix" it. > Suggested-by: KOSAKI Motohiro > Cc: KAMEZAWA Hiroyuki > Cc: Aaditya Kumar > Cc: Mel Gorman > Signed-off-by: Minchan Kim > --- > > Aaditya, coul you confirm this patch solve your problem and > make sure nr_migrate_isolate is zero after hotplug end? > > Thanks! > > include/linux/mmzone.h | 8 ++++++++ > mm/page_alloc.c | 36 ++++++++++++++++++++++++++++++++++++ > mm/page_isolation.c | 43 +++++++++++++++++++++++++++++++++++++++++-- > 3 files changed, 85 insertions(+), 2 deletions(-) > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index bf3404e..290e186 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -474,6 +474,14 @@ struct zone { > * rarely used fields: > */ > const char *name; > +#ifdef CONFIG_MEMORY_ISOLATION > + /* > + * the number of MIGRATE_ISOLATE pageblock > + * We need this for accurate free page counting. > + * It's protected by zone->lock > + */ > + atomic_t nr_migrate_isolate; > +#endif If it's protected by the zone->lock then it does not need to be atomic. > } ____cacheline_internodealigned_in_smp; > > typedef enum { > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index c175fa9..626f877 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -218,6 +218,11 @@ EXPORT_SYMBOL(nr_online_nodes); > > int page_group_by_mobility_disabled __read_mostly; > > +/* > + * NOTE: > + * Don't use set_pageblock_migratetype(page, MIGRATE_ISOLATE) direclty. s/direclty/directly/ > + * Instead, use {un}set_pageblock_isolate. > + */ > void set_pageblock_migratetype(struct page *page, int migratetype) > { > if (unlikely(page_group_by_mobility_disabled)) > @@ -1614,6 +1619,28 @@ static bool __zone_watermark_ok(struct zone *z, int order, unsigned long mark, > return true; > } > > +static unsigned long migrate_isolate_pages(struct zone *zone) > +{ This name is very misleading as nothing is being migrated. The other zone-based counters are stored in vmstat[] and accessed with zone_page_state(). I doubt you want to use vmstat but nr_isolated_zone_pages() would have been a better name. > + unsigned long nr_pages = 0; > + > + if (unlikely(atomic_read(&zone->nr_migrate_isolate))) { > + unsigned long flags; > + int order; > + spin_lock_irqsave(&zone->lock, flags); > + for (order = 0; order < MAX_ORDER; order++) { > + struct free_area *area = &zone->free_area[order]; > + long count = 0; > + struct list_head *curr; > + > + list_for_each(curr, &area->free_list[MIGRATE_ISOLATE]) > + count++; > + nr_pages += (count << order); > + } > + spin_unlock_irqrestore(&zone->lock, flags); > + } We have a zone->nr_migrate_isolate counter but have to search the buddy lists to count how many pages are isolated? Don't bother. If the pageblocks really have been isolated just assume that they are fully isolated for the purposes of figuring out if kswapd should wake up or not. The consequences of an inaccurate count is that kswapd wakes up when it potentially could have stayed asleep but for memory hotplug that is desirable. > + return nr_pages; > +} > + > bool zone_watermark_ok(struct zone *z, int order, unsigned long mark, > int classzone_idx, int alloc_flags) > { > @@ -1629,6 +1656,14 @@ bool zone_watermark_ok_safe(struct zone *z, int order, unsigned long mark, > if (z->percpu_drift_mark && free_pages < z->percpu_drift_mark) > free_pages = zone_page_state_snapshot(z, NR_FREE_PAGES); > > + /* > + * If the zone has MIGRATE_ISOLATE type free page, > + * we should consider it, too. Otherwise, kswapd can sleep forever. > + */ > + free_pages -= migrate_isolate_pages(z); > + if (free_pages < 0) > + free_pages = 0; > + You are already taking into account that the numbet of isolated pages may be inaccurate so an exact count in migrate_isolate_pages is unnecessary. > return __zone_watermark_ok(z, order, mark, classzone_idx, alloc_flags, > free_pages); > } > @@ -4407,6 +4442,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat, > lruvec_init(&zone->lruvec, zone); > zap_zone_vm_stats(zone); > zone->flags = 0; > + atomic_set(&zone->nr_migrate_isolate, 0); > if (!size) > continue; > > diff --git a/mm/page_isolation.c b/mm/page_isolation.c > index 1a9cb36..e95a792 100644 > --- a/mm/page_isolation.c > +++ b/mm/page_isolation.c > @@ -8,6 +8,45 @@ > #include > #include "internal.h" > > +static void set_pageblock_isolate(struct zone *zone, struct page *page) > +{ > + int old_migratetype; > + assert_spin_locked(&zone->lock); > + > + if (unlikely(page_group_by_mobility_disabled)) { > + set_pageblock_flags_group(page, MIGRATE_UNMOVABLE, > + PB_migrate, PB_migrate_end); > + return; > + } Not sure why this is necessary. > + > + old_migratetype = get_pageblock_migratetype(page); > + set_pageblock_flags_group(page, MIGRATE_ISOLATE, > + PB_migrate, PB_migrate_end); > + > + if (old_migratetype != MIGRATE_ISOLATE) > + atomic_inc(&zone->nr_migrate_isolate); > +} If the old type was MIGRATE_ISOLATE then it was also unnecessary to call set_pageblock_flags_group() or anything else. It's also unnecessary to pass in zone because you can figure it out from the page but no harm. This could have been a lot easier to read with something like; static void set_pageblock_isolate(struct zone *zone, struct page *page) { BUG_ON(page_zone(page) ! = zone); if (get_pageblock_migratetype(page) == MIGRATE_ISOLATE) return; set_pageblock_migratetype(page, MIGRATE_ISOLATE); zone->nr_pageblock_isolate++; } If set_migratetype_isolate is the only caller then it barely warrents a helper functions because it simply looks like if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE) { set_pageblock_migratetype(page, MIGRATE_ISOLATE); zone->nr_pageblock_isolate++; } > + > +static void unset_pageblock_isolate(struct zone *zone, struct page *page, > + unsigned long migratetype) > +{ The word "unset" in this context would normally refer to a boolean but you're actually restoring an old value. Hence reset_pageblock_isolate or restore_pageblock_migratetype would have been more appropriate. Oh, I see you are matching the naming of unset_migratetype_isolate(). That sucks, hope it was not me that suggested that name originally :/ migratetype is almost always int too, not sure where that unsigned long came out of. > + assert_spin_locked(&zone->lock); > + > + if (unlikely(page_group_by_mobility_disabled)) { > + set_pageblock_flags_group(page, migratetype, > + PB_migrate, PB_migrate_end); > + return; > + } > + > + BUG_ON(get_pageblock_migratetype(page) != MIGRATE_ISOLATE); > + BUG_ON(migratetype == MIGRATE_ISOLATE); > + > + set_pageblock_flags_group(page, migratetype, > + PB_migrate, PB_migrate_end); > + atomic_dec(&zone->nr_migrate_isolate); > + BUG_ON(atomic_read(&zone->nr_migrate_isolate) < 0); > +} > + static void reset_pageblock_isolate(struct zone *zone, struct page *page, int migratetype) { if (WARN_ON(get_pageblock_migratetype(page) != MIGRATE_ISOLATE)) return; BUG_ON(zone->nr_pageblock_isolate == 0); set_pageblock_migratetype(page, migratetype); zone->nr_pageblock_isolate--; } > int set_migratetype_isolate(struct page *page) > { > struct zone *zone; > @@ -54,7 +93,7 @@ int set_migratetype_isolate(struct page *page) > > out: > if (!ret) { > - set_pageblock_migratetype(page, MIGRATE_ISOLATE); > + set_pageblock_isolate(zone, page); > move_freepages_block(zone, page, MIGRATE_ISOLATE); > } > > @@ -72,8 +111,8 @@ void unset_migratetype_isolate(struct page *page, unsigned migratetype) > spin_lock_irqsave(&zone->lock, flags); > if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE) > goto out; > - set_pageblock_migratetype(page, migratetype); > move_freepages_block(zone, page, migratetype); > + unset_pageblock_isolate(zone, page, migratetype); > out: > spin_unlock_irqrestore(&zone->lock, flags); > } While this patch looks like it would work as advertised I also think that it is more complicated than it needs to be. The use of atomics and exact counts of isolated pages look unnecessary. set_migeratetype_isolate and unset_pageblock_isolate could have been a lot easier to read. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx180.postini.com [74.125.245.180]) by kanga.kvack.org (Postfix) with SMTP id 01FCC6B014E for ; Mon, 25 Jun 2012 20:10:23 -0400 (EDT) Message-ID: <4FE8FDEF.7030306@kernel.org> Date: Tue, 26 Jun 2012 09:10:23 +0900 From: Minchan Kim MIME-Version: 1.0 Subject: Re: [PATCH 2/2] memory-hotplug: fix kswapd looping forever problem References: <1340600367-23620-1-git-send-email-minchan@kernel.org> <1340600367-23620-3-git-send-email-minchan@kernel.org> <20120625100247.GC8103@csn.ul.ie> In-Reply-To: <20120625100247.GC8103@csn.ul.ie> Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Mel Gorman Cc: akpm@linux-foundation.org, KOSAKI Motohiro , KAMEZAWA Hiroyuki , Aaditya Kumar , LKML , linux-mm , Mel Gorman Hi Mel, On 06/25/2012 07:02 PM, Mel Gorman wrote: > On Mon, Jun 25, 2012 at 01:59:27PM +0900, Minchan Kim wrote: >> When hotplug offlining happens on zone A, it starts to mark freed page >> as MIGRATE_ISOLATE type in buddy for preventing further allocation. >> (MIGRATE_ISOLATE is very irony type because it's apparently on buddy >> but we can't allocate them). >> When the memory shortage happens during hotplug offlining, >> current task starts to reclaim, then wake up kswapd. >> Kswapd checks watermark, then go sleep because current zone_watermark_ok_safe >> doesn't consider MIGRATE_ISOLATE freed page count. >> Current task continue to reclaim in direct reclaim path without kswapd's helping. >> The problem is that zone->all_unreclaimable is set by only kswapd >> so that current task would be looping forever like below. >> >> __alloc_pages_slowpath >> restart: >> wake_all_kswapd >> rebalance: >> __alloc_pages_direct_reclaim >> do_try_to_free_pages >> if global_reclaim && !all_unreclaimable >> return 1; /* It means we did did_some_progress */ >> skip __alloc_pages_may_oom >> should_alloc_retry >> goto rebalance; >> >> If we apply KOSAKI's patch[1] which doesn't depends on kswapd >> about setting zone->all_unreclaimable, we can solve this problem >> by killing some task in direct reclaim path. But it doesn't wake up kswapd, still. >> It could be a problem still if other subsystem needs GFP_ATOMIC request. >> So kswapd should consider MIGRATE_ISOLATE when it calculate free pages >> BEFORE going sleep. >> >> This patch counts the number of MIGRATE_ISOLATE page block and >> zone_watermark_ok_safe will consider it if the system has such blocks >> (fortunately, it's very rare so no problem in POV overhead and kswapd is never >> hotpath). >> >> [1] http://lkml.org/lkml/2012/6/14/74 >> > > I have not been following this discussion at all but reading through the > patch I wonder again why memory hotplug is not "allocating" the pageblocks > when they are fully isolated like a balloon driver. This would keep the > free space accounting as it is currently without introducing something > memory hotplug specific to kswapd. > > I think historically that memory hotplug did not allocate pages because > it would be difficult to detect if a pageblock was isolated or if part of > some balloon. Allocating just full pageblocks would work around this. > However, it would play very badly with CMA. > > It'd be worth mentioning this in the changelog in case someone tries to > "fix" it. > >> Suggested-by: KOSAKI Motohiro >> Cc: KAMEZAWA Hiroyuki >> Cc: Aaditya Kumar >> Cc: Mel Gorman >> Signed-off-by: Minchan Kim >> --- >> >> Aaditya, coul you confirm this patch solve your problem and >> make sure nr_migrate_isolate is zero after hotplug end? >> >> Thanks! >> >> include/linux/mmzone.h | 8 ++++++++ >> mm/page_alloc.c | 36 ++++++++++++++++++++++++++++++++++++ >> mm/page_isolation.c | 43 +++++++++++++++++++++++++++++++++++++++++-- >> 3 files changed, 85 insertions(+), 2 deletions(-) >> >> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h >> index bf3404e..290e186 100644 >> --- a/include/linux/mmzone.h >> +++ b/include/linux/mmzone.h >> @@ -474,6 +474,14 @@ struct zone { >> * rarely used fields: >> */ >> const char *name; >> +#ifdef CONFIG_MEMORY_ISOLATION >> + /* >> + * the number of MIGRATE_ISOLATE pageblock >> + * We need this for accurate free page counting. >> + * It's protected by zone->lock >> + */ >> + atomic_t nr_migrate_isolate; >> +#endif > > If it's protected by the zone->lock then it does not need to be atomic. Slaps self. :-( > >> } ____cacheline_internodealigned_in_smp; >> >> typedef enum { >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index c175fa9..626f877 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -218,6 +218,11 @@ EXPORT_SYMBOL(nr_online_nodes); >> >> int page_group_by_mobility_disabled __read_mostly; >> >> +/* >> + * NOTE: >> + * Don't use set_pageblock_migratetype(page, MIGRATE_ISOLATE) direclty. > > s/direclty/directly/ Will fix. > >> + * Instead, use {un}set_pageblock_isolate. >> + */ >> void set_pageblock_migratetype(struct page *page, int migratetype) >> { >> if (unlikely(page_group_by_mobility_disabled)) >> @@ -1614,6 +1619,28 @@ static bool __zone_watermark_ok(struct zone *z, int order, unsigned long mark, >> return true; >> } >> >> +static unsigned long migrate_isolate_pages(struct zone *zone) >> +{ > > This name is very misleading as nothing is being migrated. The Agree. > other zone-based counters are stored in vmstat[] and accessed > with zone_page_state(). I doubt you want to use vmstat but > nr_isolated_zone_pages() would have been a better name. I feel it's kinda NR_ISOLATE_{ANON + FILE}. So I think it would be better to use nr_zone_isolate_freepages. > >> + unsigned long nr_pages = 0; >> + >> + if (unlikely(atomic_read(&zone->nr_migrate_isolate))) { >> + unsigned long flags; >> + int order; >> + spin_lock_irqsave(&zone->lock, flags); >> + for (order = 0; order < MAX_ORDER; order++) { >> + struct free_area *area = &zone->free_area[order]; >> + long count = 0; >> + struct list_head *curr; >> + >> + list_for_each(curr, &area->free_list[MIGRATE_ISOLATE]) >> + count++; >> + nr_pages += (count << order); >> + } >> + spin_unlock_irqrestore(&zone->lock, flags); >> + } > > We have a zone->nr_migrate_isolate counter but have to search the buddy > lists to count how many pages are isolated? Don't bother. If the pageblocks > really have been isolated just assume that they are fully isolated for the > purposes of figuring out if kswapd should wake up or not. The consequences > of an inaccurate count is that kswapd wakes up when it potentially could > have stayed asleep but for memory hotplug that is desirable. Fair enough. > >> + return nr_pages; >> +} >> + >> bool zone_watermark_ok(struct zone *z, int order, unsigned long mark, >> int classzone_idx, int alloc_flags) >> { >> @@ -1629,6 +1656,14 @@ bool zone_watermark_ok_safe(struct zone *z, int order, unsigned long mark, >> if (z->percpu_drift_mark && free_pages < z->percpu_drift_mark) >> free_pages = zone_page_state_snapshot(z, NR_FREE_PAGES); >> >> + /* >> + * If the zone has MIGRATE_ISOLATE type free page, >> + * we should consider it, too. Otherwise, kswapd can sleep forever. >> + */ >> + free_pages -= migrate_isolate_pages(z); >> + if (free_pages < 0) >> + free_pages = 0; >> + > > You are already taking into account that the numbet of isolated pages may > be inaccurate so an exact count in migrate_isolate_pages is unnecessary. I will use following code instead of migrate_isolate_pages's accurate counter. int migrate_isolate_pages(struct zone *z) { return z->nr_migrate_isolate * pageblock_nr_pages; } free_pages -= migrate_isolate_pages(z); if (free_pages < 0) free_pages = 0; Otherwise, simply, we can always wake up kswapd during hot-plug because it's temporal action and would end sooner or later. It is likely to make unnecessary aging and CPU consumption but it would be very rare. zone_watermark_ok_safe() { if (z->nr_migrate_isolate > 0) return false; .. .. } > >> return __zone_watermark_ok(z, order, mark, classzone_idx, alloc_flags, >> free_pages); >> } >> @@ -4407,6 +4442,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat, >> lruvec_init(&zone->lruvec, zone); >> zap_zone_vm_stats(zone); >> zone->flags = 0; >> + atomic_set(&zone->nr_migrate_isolate, 0); >> if (!size) >> continue; >> >> diff --git a/mm/page_isolation.c b/mm/page_isolation.c >> index 1a9cb36..e95a792 100644 >> --- a/mm/page_isolation.c >> +++ b/mm/page_isolation.c >> @@ -8,6 +8,45 @@ >> #include >> #include "internal.h" >> >> +static void set_pageblock_isolate(struct zone *zone, struct page *page) >> +{ >> + int old_migratetype; >> + assert_spin_locked(&zone->lock); >> + >> + if (unlikely(page_group_by_mobility_disabled)) { >> + set_pageblock_flags_group(page, MIGRATE_UNMOVABLE, >> + PB_migrate, PB_migrate_end); >> + return; >> + } > > Not sure why this is necessary. Yeb. hotplug should work regardless of small memory system(KOSAKI already pointed out) but current hotplug code have used set_pageblock_migratetype(page, MIGRATE_ISOLATE) which already check page_group_by_mobility_disabled. It's like buggy. But I think it should be fixed as another patch. > >> + >> + old_migratetype = get_pageblock_migratetype(page); >> + set_pageblock_flags_group(page, MIGRATE_ISOLATE, >> + PB_migrate, PB_migrate_end); >> + >> + if (old_migratetype != MIGRATE_ISOLATE) >> + atomic_inc(&zone->nr_migrate_isolate); >> +} > > If the old type was MIGRATE_ISOLATE then it was also unnecessary to call > set_pageblock_flags_group() or anything else. It's also unnecessary to > pass in zone because you can figure it out from the page but no harm. > > This could have been a lot easier to read with something like; > > static void set_pageblock_isolate(struct zone *zone, struct page *page) > { > BUG_ON(page_zone(page) ! = zone); > if (get_pageblock_migratetype(page) == MIGRATE_ISOLATE) > return; > > set_pageblock_migratetype(page, MIGRATE_ISOLATE); > zone->nr_pageblock_isolate++; > } > > If set_migratetype_isolate is the only caller then it barely warrents a > helper functions because it simply looks like > > if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE) { > set_pageblock_migratetype(page, MIGRATE_ISOLATE); > zone->nr_pageblock_isolate++; > } > > >> + >> +static void unset_pageblock_isolate(struct zone *zone, struct page *page, >> + unsigned long migratetype) >> +{ > > The word "unset" in this context would normally refer to a boolean but > you're actually restoring an old value. Hence reset_pageblock_isolate or > restore_pageblock_migratetype would have been more appropriate. > > Oh, I see you are matching the naming of unset_migratetype_isolate(). > That sucks, hope it was not me that suggested that name originally :/ Naming/comment is very hard part rather than coding on non-native speakers. Will fix. > > migratetype is almost always int too, not sure where that unsigned long > came out of. Will fix. > >> + assert_spin_locked(&zone->lock); >> + >> + if (unlikely(page_group_by_mobility_disabled)) { >> + set_pageblock_flags_group(page, migratetype, >> + PB_migrate, PB_migrate_end); >> + return; >> + } >> + >> + BUG_ON(get_pageblock_migratetype(page) != MIGRATE_ISOLATE); >> + BUG_ON(migratetype == MIGRATE_ISOLATE); >> + >> + set_pageblock_flags_group(page, migratetype, >> + PB_migrate, PB_migrate_end); >> + atomic_dec(&zone->nr_migrate_isolate); >> + BUG_ON(atomic_read(&zone->nr_migrate_isolate) < 0); >> +} >> + > > static void reset_pageblock_isolate(struct zone *zone, struct page *page, > int migratetype) > { > if (WARN_ON(get_pageblock_migratetype(page) != MIGRATE_ISOLATE)) > return; > > BUG_ON(zone->nr_pageblock_isolate == 0); > set_pageblock_migratetype(page, migratetype); > zone->nr_pageblock_isolate--; > } > > >> int set_migratetype_isolate(struct page *page) >> { >> struct zone *zone; >> @@ -54,7 +93,7 @@ int set_migratetype_isolate(struct page *page) >> >> out: >> if (!ret) { >> - set_pageblock_migratetype(page, MIGRATE_ISOLATE); >> + set_pageblock_isolate(zone, page); >> move_freepages_block(zone, page, MIGRATE_ISOLATE); >> } >> >> @@ -72,8 +111,8 @@ void unset_migratetype_isolate(struct page *page, unsigned migratetype) >> spin_lock_irqsave(&zone->lock, flags); >> if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE) >> goto out; >> - set_pageblock_migratetype(page, migratetype); >> move_freepages_block(zone, page, migratetype); >> + unset_pageblock_isolate(zone, page, migratetype); >> out: >> spin_unlock_irqrestore(&zone->lock, flags); >> } > > While this patch looks like it would work as advertised I also think that > it is more complicated than it needs to be. The use of atomics and exact > counts of isolated pages look unnecessary. set_migeratetype_isolate and > unset_pageblock_isolate could have been a lot easier to read. > Thanks for the good review, Mel! -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753667Ab2FYE7f (ORCPT ); Mon, 25 Jun 2012 00:59:35 -0400 Received: from LGEMRELSE1Q.lge.com ([156.147.1.111]:50487 "EHLO LGEMRELSE1Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751791Ab2FYE7e (ORCPT ); Mon, 25 Jun 2012 00:59:34 -0400 X-AuditID: 9c93016f-b7cbdae0000024ac-18-4fe7f030b5d1 From: Minchan Kim To: akpm@linux-foundation.org Cc: KOSAKI Motohiro , Mel Gorman , KAMEZAWA Hiroyuki , Aaditya Kumar , LKML , linux-mm , Minchan Kim Subject: [PATCH 0/2] fix livelock because of kswapd stop Date: Mon, 25 Jun 2012 13:59:25 +0900 Message-Id: <1340600367-23620-1-git-send-email-minchan@kernel.org> X-Mailer: git-send-email 1.7.9.5 X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org When hotplug offlining happens on zone A, it starts to mark freed page as MIGRATE_ISOLATE type in buddy for preventing further allocation. (MIGRATE_ISOLATE is very irony type because it's apparently on buddy but we can't allocate them). When the memory shortage happens during hotplug offlining, current task starts to reclaim, then wake up kswapd. Kswapd checks watermark, then go sleep because current zone_watermark_ok_safe doesn't consider MIGRATE_ISOLATE freed page count. Current task continue to reclaim in direct reclaim path without kswapd's helping. The problem is that zone->all_unreclaimable is set by only kswapd so that current task would be looping forever like below. __alloc_pages_slowpath restart: wake_all_kswapd rebalance: __alloc_pages_direct_reclaim do_try_to_free_pages if global_reclaim && !all_unreclaimable return 1; /* It means we did did_some_progress */ skip __alloc_pages_may_oom should_alloc_retry goto rebalance; [1/2] factor out memory-isolation functions from page_alloc.c to mm/page_isolation.c This patch can be merged regardless of [2/2]. [2/2] fix this problem. Aaditya, Could you confirm this patch can solve your problem? Minchan Kim (2): mm: Factor out memory isolate functions memory-hotplug: fix kswapd looping forever problem drivers/base/Kconfig | 1 + include/linux/mmzone.h | 8 +++ include/linux/page-isolation.h | 8 +-- mm/Kconfig | 5 ++ mm/Makefile | 4 +- mm/page_alloc.c | 107 +++++++++++++------------------------- mm/page_isolation.c | 110 ++++++++++++++++++++++++++++++++++++++++ 7 files changed, 166 insertions(+), 77 deletions(-) -- 1.7.9.5 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754180Ab2FYE7i (ORCPT ); Mon, 25 Jun 2012 00:59:38 -0400 Received: from LGEMRELSE1Q.lge.com ([156.147.1.111]:50487 "EHLO LGEMRELSE1Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752411Ab2FYE7f (ORCPT ); Mon, 25 Jun 2012 00:59:35 -0400 X-AuditID: 9c93016f-b7cbdae0000024ac-21-4fe7f031aa05 From: Minchan Kim To: akpm@linux-foundation.org Cc: KOSAKI Motohiro , Mel Gorman , KAMEZAWA Hiroyuki , Aaditya Kumar , LKML , linux-mm , Minchan Kim , Andi Kleen , Marek Szyprowski Subject: [PATCH 1/2] mm: Factor out memory isolate functions Date: Mon, 25 Jun 2012 13:59:26 +0900 Message-Id: <1340600367-23620-2-git-send-email-minchan@kernel.org> X-Mailer: git-send-email 1.7.9.5 In-Reply-To: <1340600367-23620-1-git-send-email-minchan@kernel.org> References: <1340600367-23620-1-git-send-email-minchan@kernel.org> X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Now mm/page_alloc.c has some memory isolation functions but they are used oly when we enable CONFIG_{CMA|MEMORY_HOTPLUG|MEMORY_FAILURE}. So let's make it configurable by new CONFIG_MEMORY_ISOLATION so that it can reduce binary size and we can check it simple by CONFIG_MEMORY_ISOLATION, not if defined CONFIG_{CMA|MEMORY_HOTPLUG|MEMORY_FAILURE}. This patch is based on next-20120621. Cc: Andi Kleen Cc: Marek Szyprowski Cc: KAMEZAWA Hiroyuki Signed-off-by: Minchan Kim --- Andrew, I know this patch is conflict with current mmotm totally but I don't know what tree I should use. I hope it's a git tree instead of quit-based version. Anyway, If you feel it's inconvenient, let me know it and I will resend it on tree you mention. Thanks. drivers/base/Kconfig | 1 + include/linux/page-isolation.h | 8 ++--- mm/Kconfig | 5 +++ mm/Makefile | 4 +-- mm/page_alloc.c | 71 ---------------------------------------- mm/page_isolation.c | 71 ++++++++++++++++++++++++++++++++++++++++ 6 files changed, 83 insertions(+), 77 deletions(-) diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index 9b21469..08b4c52 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -196,6 +196,7 @@ config CMA bool "Contiguous Memory Allocator (EXPERIMENTAL)" depends on HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK && EXPERIMENTAL select MIGRATION + select MEMORY_ISOLATION help This enables the Contiguous Memory Allocator which allows drivers to allocate big physically-contiguous blocks of memory for use with diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index 3bdcab3..0569a3e 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -10,7 +10,7 @@ * free all pages in the range. test_page_isolated() can be used for * test it. */ -extern int +int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, unsigned migratetype); @@ -18,7 +18,7 @@ start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE. * target range is [start_pfn, end_pfn) */ -extern int +int undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, unsigned migratetype); @@ -30,8 +30,8 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn); /* * Internal functions. Changes pageblock's migrate type. */ -extern int set_migratetype_isolate(struct page *page); -extern void unset_migratetype_isolate(struct page *page, unsigned migratetype); +int set_migratetype_isolate(struct page *page); +void unset_migratetype_isolate(struct page *page, unsigned migratetype); #endif diff --git a/mm/Kconfig b/mm/Kconfig index 82fed4e..d5c8019 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -140,9 +140,13 @@ config ARCH_DISCARD_MEMBLOCK config NO_BOOTMEM boolean +config MEMORY_ISOLATION + boolean + # eventually, we can have this option just 'select SPARSEMEM' config MEMORY_HOTPLUG bool "Allow for memory hot-add" + select MEMORY_ISOLATION depends on SPARSEMEM || X86_64_ACPI_NUMA depends on HOTPLUG && ARCH_ENABLE_MEMORY_HOTPLUG depends on (IA64 || X86 || PPC_BOOK3S_64 || SUPERH || S390) @@ -272,6 +276,7 @@ config MEMORY_FAILURE depends on MMU depends on ARCH_SUPPORTS_MEMORY_FAILURE bool "Enable recovery from hardware memory errors" + select MEMORY_ISOLATION help Enables code to recover from some memory failures on systems with MCA recovery. This allows a system to continue running diff --git a/mm/Makefile b/mm/Makefile index 25e8002..a0c7725 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -15,8 +15,7 @@ obj-y := filemap.o mempool.o oom_kill.o fadvise.o \ maccess.o page_alloc.o page-writeback.o \ readahead.o swap.o truncate.o vmscan.o shmem.o \ prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \ - page_isolation.o mm_init.o mmu_context.o percpu.o \ - compaction.o $(mmu-y) + mm_init.o mmu_context.o percpu.o compaction.o $(mmu-y) obj-y += init-mm.o ifdef CONFIG_NO_BOOTMEM @@ -55,3 +54,4 @@ obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o obj-$(CONFIG_CLEANCACHE) += cleancache.o +obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 2c29b1c..c175fa9 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -51,7 +51,6 @@ #include #include #include -#include #include #include #include @@ -5555,76 +5554,6 @@ bool is_pageblock_removable_nolock(struct page *page) return __count_immobile_pages(zone, page, 0); } -int set_migratetype_isolate(struct page *page) -{ - struct zone *zone; - unsigned long flags, pfn; - struct memory_isolate_notify arg; - int notifier_ret; - int ret = -EBUSY; - - zone = page_zone(page); - - spin_lock_irqsave(&zone->lock, flags); - - pfn = page_to_pfn(page); - arg.start_pfn = pfn; - arg.nr_pages = pageblock_nr_pages; - arg.pages_found = 0; - - /* - * It may be possible to isolate a pageblock even if the - * migratetype is not MIGRATE_MOVABLE. The memory isolation - * notifier chain is used by balloon drivers to return the - * number of pages in a range that are held by the balloon - * driver to shrink memory. If all the pages are accounted for - * by balloons, are free, or on the LRU, isolation can continue. - * Later, for example, when memory hotplug notifier runs, these - * pages reported as "can be isolated" should be isolated(freed) - * by the balloon driver through the memory notifier chain. - */ - notifier_ret = memory_isolate_notify(MEM_ISOLATE_COUNT, &arg); - notifier_ret = notifier_to_errno(notifier_ret); - if (notifier_ret) - goto out; - /* - * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself. - * We just check MOVABLE pages. - */ - if (__count_immobile_pages(zone, page, arg.pages_found)) - ret = 0; - - /* - * immobile means "not-on-lru" paes. If immobile is larger than - * removable-by-driver pages reported by notifier, we'll fail. - */ - -out: - if (!ret) { - set_pageblock_migratetype(page, MIGRATE_ISOLATE); - move_freepages_block(zone, page, MIGRATE_ISOLATE); - } - - spin_unlock_irqrestore(&zone->lock, flags); - if (!ret) - drain_all_pages(); - return ret; -} - -void unset_migratetype_isolate(struct page *page, unsigned migratetype) -{ - struct zone *zone; - unsigned long flags; - zone = page_zone(page); - spin_lock_irqsave(&zone->lock, flags); - if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE) - goto out; - set_pageblock_migratetype(page, migratetype); - move_freepages_block(zone, page, migratetype); -out: - spin_unlock_irqrestore(&zone->lock, flags); -} - #ifdef CONFIG_CMA static unsigned long pfn_max_align_down(unsigned long pfn) diff --git a/mm/page_isolation.c b/mm/page_isolation.c index c9f0477..1a9cb36 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -5,8 +5,79 @@ #include #include #include +#include #include "internal.h" +int set_migratetype_isolate(struct page *page) +{ + struct zone *zone; + unsigned long flags, pfn; + struct memory_isolate_notify arg; + int notifier_ret; + int ret = -EBUSY; + + zone = page_zone(page); + + spin_lock_irqsave(&zone->lock, flags); + + pfn = page_to_pfn(page); + arg.start_pfn = pfn; + arg.nr_pages = pageblock_nr_pages; + arg.pages_found = 0; + + /* + * It may be possible to isolate a pageblock even if the + * migratetype is not MIGRATE_MOVABLE. The memory isolation + * notifier chain is used by balloon drivers to return the + * number of pages in a range that are held by the balloon + * driver to shrink memory. If all the pages are accounted for + * by balloons, are free, or on the LRU, isolation can continue. + * Later, for example, when memory hotplug notifier runs, these + * pages reported as "can be isolated" should be isolated(freed) + * by the balloon driver through the memory notifier chain. + */ + notifier_ret = memory_isolate_notify(MEM_ISOLATE_COUNT, &arg); + notifier_ret = notifier_to_errno(notifier_ret); + if (notifier_ret) + goto out; + /* + * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself. + * We just check MOVABLE pages. + */ + if (__count_immobile_pages(zone, page, arg.pages_found)) + ret = 0; + + /* + * immobile means "not-on-lru" paes. If immobile is larger than + * removable-by-driver pages reported by notifier, we'll fail. + */ + +out: + if (!ret) { + set_pageblock_migratetype(page, MIGRATE_ISOLATE); + move_freepages_block(zone, page, MIGRATE_ISOLATE); + } + + spin_unlock_irqrestore(&zone->lock, flags); + if (!ret) + drain_all_pages(); + return ret; +} + +void unset_migratetype_isolate(struct page *page, unsigned migratetype) +{ + struct zone *zone; + unsigned long flags; + zone = page_zone(page); + spin_lock_irqsave(&zone->lock, flags); + if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE) + goto out; + set_pageblock_migratetype(page, migratetype); + move_freepages_block(zone, page, migratetype); +out: + spin_unlock_irqrestore(&zone->lock, flags); +} + static inline struct page * __first_valid_page(unsigned long pfn, unsigned long nr_pages) { -- 1.7.9.5 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754299Ab2FYE7p (ORCPT ); Mon, 25 Jun 2012 00:59:45 -0400 Received: from LGEMRELSE1Q.lge.com ([156.147.1.111]:50487 "EHLO LGEMRELSE1Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753737Ab2FYE7h (ORCPT ); Mon, 25 Jun 2012 00:59:37 -0400 X-AuditID: 9c93016f-b7cbdae0000024ac-26-4fe7f0313c0f From: Minchan Kim To: akpm@linux-foundation.org Cc: KOSAKI Motohiro , Mel Gorman , KAMEZAWA Hiroyuki , Aaditya Kumar , LKML , linux-mm , Minchan Kim , Mel Gorman Subject: [PATCH 2/2] memory-hotplug: fix kswapd looping forever problem Date: Mon, 25 Jun 2012 13:59:27 +0900 Message-Id: <1340600367-23620-3-git-send-email-minchan@kernel.org> X-Mailer: git-send-email 1.7.9.5 In-Reply-To: <1340600367-23620-1-git-send-email-minchan@kernel.org> References: <1340600367-23620-1-git-send-email-minchan@kernel.org> X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org When hotplug offlining happens on zone A, it starts to mark freed page as MIGRATE_ISOLATE type in buddy for preventing further allocation. (MIGRATE_ISOLATE is very irony type because it's apparently on buddy but we can't allocate them). When the memory shortage happens during hotplug offlining, current task starts to reclaim, then wake up kswapd. Kswapd checks watermark, then go sleep because current zone_watermark_ok_safe doesn't consider MIGRATE_ISOLATE freed page count. Current task continue to reclaim in direct reclaim path without kswapd's helping. The problem is that zone->all_unreclaimable is set by only kswapd so that current task would be looping forever like below. __alloc_pages_slowpath restart: wake_all_kswapd rebalance: __alloc_pages_direct_reclaim do_try_to_free_pages if global_reclaim && !all_unreclaimable return 1; /* It means we did did_some_progress */ skip __alloc_pages_may_oom should_alloc_retry goto rebalance; If we apply KOSAKI's patch[1] which doesn't depends on kswapd about setting zone->all_unreclaimable, we can solve this problem by killing some task in direct reclaim path. But it doesn't wake up kswapd, still. It could be a problem still if other subsystem needs GFP_ATOMIC request. So kswapd should consider MIGRATE_ISOLATE when it calculate free pages BEFORE going sleep. This patch counts the number of MIGRATE_ISOLATE page block and zone_watermark_ok_safe will consider it if the system has such blocks (fortunately, it's very rare so no problem in POV overhead and kswapd is never hotpath). [1] http://lkml.org/lkml/2012/6/14/74 Suggested-by: KOSAKI Motohiro Cc: KAMEZAWA Hiroyuki Cc: Aaditya Kumar Cc: Mel Gorman Signed-off-by: Minchan Kim --- Aaditya, coul you confirm this patch solve your problem and make sure nr_migrate_isolate is zero after hotplug end? Thanks! include/linux/mmzone.h | 8 ++++++++ mm/page_alloc.c | 36 ++++++++++++++++++++++++++++++++++++ mm/page_isolation.c | 43 +++++++++++++++++++++++++++++++++++++++++-- 3 files changed, 85 insertions(+), 2 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index bf3404e..290e186 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -474,6 +474,14 @@ struct zone { * rarely used fields: */ const char *name; +#ifdef CONFIG_MEMORY_ISOLATION + /* + * the number of MIGRATE_ISOLATE pageblock + * We need this for accurate free page counting. + * It's protected by zone->lock + */ + atomic_t nr_migrate_isolate; +#endif } ____cacheline_internodealigned_in_smp; typedef enum { diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c175fa9..626f877 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -218,6 +218,11 @@ EXPORT_SYMBOL(nr_online_nodes); int page_group_by_mobility_disabled __read_mostly; +/* + * NOTE: + * Don't use set_pageblock_migratetype(page, MIGRATE_ISOLATE) direclty. + * Instead, use {un}set_pageblock_isolate. + */ void set_pageblock_migratetype(struct page *page, int migratetype) { if (unlikely(page_group_by_mobility_disabled)) @@ -1614,6 +1619,28 @@ static bool __zone_watermark_ok(struct zone *z, int order, unsigned long mark, return true; } +static unsigned long migrate_isolate_pages(struct zone *zone) +{ + unsigned long nr_pages = 0; + + if (unlikely(atomic_read(&zone->nr_migrate_isolate))) { + unsigned long flags; + int order; + spin_lock_irqsave(&zone->lock, flags); + for (order = 0; order < MAX_ORDER; order++) { + struct free_area *area = &zone->free_area[order]; + long count = 0; + struct list_head *curr; + + list_for_each(curr, &area->free_list[MIGRATE_ISOLATE]) + count++; + nr_pages += (count << order); + } + spin_unlock_irqrestore(&zone->lock, flags); + } + return nr_pages; +} + bool zone_watermark_ok(struct zone *z, int order, unsigned long mark, int classzone_idx, int alloc_flags) { @@ -1629,6 +1656,14 @@ bool zone_watermark_ok_safe(struct zone *z, int order, unsigned long mark, if (z->percpu_drift_mark && free_pages < z->percpu_drift_mark) free_pages = zone_page_state_snapshot(z, NR_FREE_PAGES); + /* + * If the zone has MIGRATE_ISOLATE type free page, + * we should consider it, too. Otherwise, kswapd can sleep forever. + */ + free_pages -= migrate_isolate_pages(z); + if (free_pages < 0) + free_pages = 0; + return __zone_watermark_ok(z, order, mark, classzone_idx, alloc_flags, free_pages); } @@ -4407,6 +4442,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat, lruvec_init(&zone->lruvec, zone); zap_zone_vm_stats(zone); zone->flags = 0; + atomic_set(&zone->nr_migrate_isolate, 0); if (!size) continue; diff --git a/mm/page_isolation.c b/mm/page_isolation.c index 1a9cb36..e95a792 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -8,6 +8,45 @@ #include #include "internal.h" +static void set_pageblock_isolate(struct zone *zone, struct page *page) +{ + int old_migratetype; + assert_spin_locked(&zone->lock); + + if (unlikely(page_group_by_mobility_disabled)) { + set_pageblock_flags_group(page, MIGRATE_UNMOVABLE, + PB_migrate, PB_migrate_end); + return; + } + + old_migratetype = get_pageblock_migratetype(page); + set_pageblock_flags_group(page, MIGRATE_ISOLATE, + PB_migrate, PB_migrate_end); + + if (old_migratetype != MIGRATE_ISOLATE) + atomic_inc(&zone->nr_migrate_isolate); +} + +static void unset_pageblock_isolate(struct zone *zone, struct page *page, + unsigned long migratetype) +{ + assert_spin_locked(&zone->lock); + + if (unlikely(page_group_by_mobility_disabled)) { + set_pageblock_flags_group(page, migratetype, + PB_migrate, PB_migrate_end); + return; + } + + BUG_ON(get_pageblock_migratetype(page) != MIGRATE_ISOLATE); + BUG_ON(migratetype == MIGRATE_ISOLATE); + + set_pageblock_flags_group(page, migratetype, + PB_migrate, PB_migrate_end); + atomic_dec(&zone->nr_migrate_isolate); + BUG_ON(atomic_read(&zone->nr_migrate_isolate) < 0); +} + int set_migratetype_isolate(struct page *page) { struct zone *zone; @@ -54,7 +93,7 @@ int set_migratetype_isolate(struct page *page) out: if (!ret) { - set_pageblock_migratetype(page, MIGRATE_ISOLATE); + set_pageblock_isolate(zone, page); move_freepages_block(zone, page, MIGRATE_ISOLATE); } @@ -72,8 +111,8 @@ void unset_migratetype_isolate(struct page *page, unsigned migratetype) spin_lock_irqsave(&zone->lock, flags); if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE) goto out; - set_pageblock_migratetype(page, migratetype); move_freepages_block(zone, page, migratetype); + unset_pageblock_isolate(zone, page, migratetype); out: spin_unlock_irqrestore(&zone->lock, flags); } -- 1.7.9.5 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755744Ab2FYKCw (ORCPT ); Mon, 25 Jun 2012 06:02:52 -0400 Received: from gir.skynet.ie ([193.1.99.77]:49046 "EHLO gir.skynet.ie" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755467Ab2FYKCu (ORCPT ); Mon, 25 Jun 2012 06:02:50 -0400 Date: Mon, 25 Jun 2012 11:02:47 +0100 From: Mel Gorman To: Minchan Kim Cc: akpm@linux-foundation.org, KOSAKI Motohiro , KAMEZAWA Hiroyuki , Aaditya Kumar , LKML , linux-mm , Mel Gorman Subject: Re: [PATCH 2/2] memory-hotplug: fix kswapd looping forever problem Message-ID: <20120625100247.GC8103@csn.ul.ie> References: <1340600367-23620-1-git-send-email-minchan@kernel.org> <1340600367-23620-3-git-send-email-minchan@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <1340600367-23620-3-git-send-email-minchan@kernel.org> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jun 25, 2012 at 01:59:27PM +0900, Minchan Kim wrote: > When hotplug offlining happens on zone A, it starts to mark freed page > as MIGRATE_ISOLATE type in buddy for preventing further allocation. > (MIGRATE_ISOLATE is very irony type because it's apparently on buddy > but we can't allocate them). > When the memory shortage happens during hotplug offlining, > current task starts to reclaim, then wake up kswapd. > Kswapd checks watermark, then go sleep because current zone_watermark_ok_safe > doesn't consider MIGRATE_ISOLATE freed page count. > Current task continue to reclaim in direct reclaim path without kswapd's helping. > The problem is that zone->all_unreclaimable is set by only kswapd > so that current task would be looping forever like below. > > __alloc_pages_slowpath > restart: > wake_all_kswapd > rebalance: > __alloc_pages_direct_reclaim > do_try_to_free_pages > if global_reclaim && !all_unreclaimable > return 1; /* It means we did did_some_progress */ > skip __alloc_pages_may_oom > should_alloc_retry > goto rebalance; > > If we apply KOSAKI's patch[1] which doesn't depends on kswapd > about setting zone->all_unreclaimable, we can solve this problem > by killing some task in direct reclaim path. But it doesn't wake up kswapd, still. > It could be a problem still if other subsystem needs GFP_ATOMIC request. > So kswapd should consider MIGRATE_ISOLATE when it calculate free pages > BEFORE going sleep. > > This patch counts the number of MIGRATE_ISOLATE page block and > zone_watermark_ok_safe will consider it if the system has such blocks > (fortunately, it's very rare so no problem in POV overhead and kswapd is never > hotpath). > > [1] http://lkml.org/lkml/2012/6/14/74 > I have not been following this discussion at all but reading through the patch I wonder again why memory hotplug is not "allocating" the pageblocks when they are fully isolated like a balloon driver. This would keep the free space accounting as it is currently without introducing something memory hotplug specific to kswapd. I think historically that memory hotplug did not allocate pages because it would be difficult to detect if a pageblock was isolated or if part of some balloon. Allocating just full pageblocks would work around this. However, it would play very badly with CMA. It'd be worth mentioning this in the changelog in case someone tries to "fix" it. > Suggested-by: KOSAKI Motohiro > Cc: KAMEZAWA Hiroyuki > Cc: Aaditya Kumar > Cc: Mel Gorman > Signed-off-by: Minchan Kim > --- > > Aaditya, coul you confirm this patch solve your problem and > make sure nr_migrate_isolate is zero after hotplug end? > > Thanks! > > include/linux/mmzone.h | 8 ++++++++ > mm/page_alloc.c | 36 ++++++++++++++++++++++++++++++++++++ > mm/page_isolation.c | 43 +++++++++++++++++++++++++++++++++++++++++-- > 3 files changed, 85 insertions(+), 2 deletions(-) > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index bf3404e..290e186 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -474,6 +474,14 @@ struct zone { > * rarely used fields: > */ > const char *name; > +#ifdef CONFIG_MEMORY_ISOLATION > + /* > + * the number of MIGRATE_ISOLATE pageblock > + * We need this for accurate free page counting. > + * It's protected by zone->lock > + */ > + atomic_t nr_migrate_isolate; > +#endif If it's protected by the zone->lock then it does not need to be atomic. > } ____cacheline_internodealigned_in_smp; > > typedef enum { > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index c175fa9..626f877 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -218,6 +218,11 @@ EXPORT_SYMBOL(nr_online_nodes); > > int page_group_by_mobility_disabled __read_mostly; > > +/* > + * NOTE: > + * Don't use set_pageblock_migratetype(page, MIGRATE_ISOLATE) direclty. s/direclty/directly/ > + * Instead, use {un}set_pageblock_isolate. > + */ > void set_pageblock_migratetype(struct page *page, int migratetype) > { > if (unlikely(page_group_by_mobility_disabled)) > @@ -1614,6 +1619,28 @@ static bool __zone_watermark_ok(struct zone *z, int order, unsigned long mark, > return true; > } > > +static unsigned long migrate_isolate_pages(struct zone *zone) > +{ This name is very misleading as nothing is being migrated. The other zone-based counters are stored in vmstat[] and accessed with zone_page_state(). I doubt you want to use vmstat but nr_isolated_zone_pages() would have been a better name. > + unsigned long nr_pages = 0; > + > + if (unlikely(atomic_read(&zone->nr_migrate_isolate))) { > + unsigned long flags; > + int order; > + spin_lock_irqsave(&zone->lock, flags); > + for (order = 0; order < MAX_ORDER; order++) { > + struct free_area *area = &zone->free_area[order]; > + long count = 0; > + struct list_head *curr; > + > + list_for_each(curr, &area->free_list[MIGRATE_ISOLATE]) > + count++; > + nr_pages += (count << order); > + } > + spin_unlock_irqrestore(&zone->lock, flags); > + } We have a zone->nr_migrate_isolate counter but have to search the buddy lists to count how many pages are isolated? Don't bother. If the pageblocks really have been isolated just assume that they are fully isolated for the purposes of figuring out if kswapd should wake up or not. The consequences of an inaccurate count is that kswapd wakes up when it potentially could have stayed asleep but for memory hotplug that is desirable. > + return nr_pages; > +} > + > bool zone_watermark_ok(struct zone *z, int order, unsigned long mark, > int classzone_idx, int alloc_flags) > { > @@ -1629,6 +1656,14 @@ bool zone_watermark_ok_safe(struct zone *z, int order, unsigned long mark, > if (z->percpu_drift_mark && free_pages < z->percpu_drift_mark) > free_pages = zone_page_state_snapshot(z, NR_FREE_PAGES); > > + /* > + * If the zone has MIGRATE_ISOLATE type free page, > + * we should consider it, too. Otherwise, kswapd can sleep forever. > + */ > + free_pages -= migrate_isolate_pages(z); > + if (free_pages < 0) > + free_pages = 0; > + You are already taking into account that the numbet of isolated pages may be inaccurate so an exact count in migrate_isolate_pages is unnecessary. > return __zone_watermark_ok(z, order, mark, classzone_idx, alloc_flags, > free_pages); > } > @@ -4407,6 +4442,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat, > lruvec_init(&zone->lruvec, zone); > zap_zone_vm_stats(zone); > zone->flags = 0; > + atomic_set(&zone->nr_migrate_isolate, 0); > if (!size) > continue; > > diff --git a/mm/page_isolation.c b/mm/page_isolation.c > index 1a9cb36..e95a792 100644 > --- a/mm/page_isolation.c > +++ b/mm/page_isolation.c > @@ -8,6 +8,45 @@ > #include > #include "internal.h" > > +static void set_pageblock_isolate(struct zone *zone, struct page *page) > +{ > + int old_migratetype; > + assert_spin_locked(&zone->lock); > + > + if (unlikely(page_group_by_mobility_disabled)) { > + set_pageblock_flags_group(page, MIGRATE_UNMOVABLE, > + PB_migrate, PB_migrate_end); > + return; > + } Not sure why this is necessary. > + > + old_migratetype = get_pageblock_migratetype(page); > + set_pageblock_flags_group(page, MIGRATE_ISOLATE, > + PB_migrate, PB_migrate_end); > + > + if (old_migratetype != MIGRATE_ISOLATE) > + atomic_inc(&zone->nr_migrate_isolate); > +} If the old type was MIGRATE_ISOLATE then it was also unnecessary to call set_pageblock_flags_group() or anything else. It's also unnecessary to pass in zone because you can figure it out from the page but no harm. This could have been a lot easier to read with something like; static void set_pageblock_isolate(struct zone *zone, struct page *page) { BUG_ON(page_zone(page) ! = zone); if (get_pageblock_migratetype(page) == MIGRATE_ISOLATE) return; set_pageblock_migratetype(page, MIGRATE_ISOLATE); zone->nr_pageblock_isolate++; } If set_migratetype_isolate is the only caller then it barely warrents a helper functions because it simply looks like if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE) { set_pageblock_migratetype(page, MIGRATE_ISOLATE); zone->nr_pageblock_isolate++; } > + > +static void unset_pageblock_isolate(struct zone *zone, struct page *page, > + unsigned long migratetype) > +{ The word "unset" in this context would normally refer to a boolean but you're actually restoring an old value. Hence reset_pageblock_isolate or restore_pageblock_migratetype would have been more appropriate. Oh, I see you are matching the naming of unset_migratetype_isolate(). That sucks, hope it was not me that suggested that name originally :/ migratetype is almost always int too, not sure where that unsigned long came out of. > + assert_spin_locked(&zone->lock); > + > + if (unlikely(page_group_by_mobility_disabled)) { > + set_pageblock_flags_group(page, migratetype, > + PB_migrate, PB_migrate_end); > + return; > + } > + > + BUG_ON(get_pageblock_migratetype(page) != MIGRATE_ISOLATE); > + BUG_ON(migratetype == MIGRATE_ISOLATE); > + > + set_pageblock_flags_group(page, migratetype, > + PB_migrate, PB_migrate_end); > + atomic_dec(&zone->nr_migrate_isolate); > + BUG_ON(atomic_read(&zone->nr_migrate_isolate) < 0); > +} > + static void reset_pageblock_isolate(struct zone *zone, struct page *page, int migratetype) { if (WARN_ON(get_pageblock_migratetype(page) != MIGRATE_ISOLATE)) return; BUG_ON(zone->nr_pageblock_isolate == 0); set_pageblock_migratetype(page, migratetype); zone->nr_pageblock_isolate--; } > int set_migratetype_isolate(struct page *page) > { > struct zone *zone; > @@ -54,7 +93,7 @@ int set_migratetype_isolate(struct page *page) > > out: > if (!ret) { > - set_pageblock_migratetype(page, MIGRATE_ISOLATE); > + set_pageblock_isolate(zone, page); > move_freepages_block(zone, page, MIGRATE_ISOLATE); > } > > @@ -72,8 +111,8 @@ void unset_migratetype_isolate(struct page *page, unsigned migratetype) > spin_lock_irqsave(&zone->lock, flags); > if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE) > goto out; > - set_pageblock_migratetype(page, migratetype); > move_freepages_block(zone, page, migratetype); > + unset_pageblock_isolate(zone, page, migratetype); > out: > spin_unlock_irqrestore(&zone->lock, flags); > } While this patch looks like it would work as advertised I also think that it is more complicated than it needs to be. The use of atomics and exact counts of isolated pages look unnecessary. set_migeratetype_isolate and unset_pageblock_isolate could have been a lot easier to read. -- Mel Gorman SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758018Ab2FZAKZ (ORCPT ); Mon, 25 Jun 2012 20:10:25 -0400 Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:45277 "EHLO LGEMRELSE7Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757244Ab2FZAKX (ORCPT ); Mon, 25 Jun 2012 20:10:23 -0400 X-AuditID: 9c930197-b7c94ae0000037ff-5a-4fe8fded5249 Message-ID: <4FE8FDEF.7030306@kernel.org> Date: Tue, 26 Jun 2012 09:10:23 +0900 From: Minchan Kim User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20120430 Thunderbird/12.0.1 MIME-Version: 1.0 Newsgroups: gmane.linux.kernel.mm,gmane.linux.kernel To: Mel Gorman CC: akpm@linux-foundation.org, KOSAKI Motohiro , KAMEZAWA Hiroyuki , Aaditya Kumar , LKML , linux-mm , Mel Gorman Subject: Re: [PATCH 2/2] memory-hotplug: fix kswapd looping forever problem References: <1340600367-23620-1-git-send-email-minchan@kernel.org> <1340600367-23620-3-git-send-email-minchan@kernel.org> <20120625100247.GC8103@csn.ul.ie> In-Reply-To: <20120625100247.GC8103@csn.ul.ie> Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: 7bit X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Mel, On 06/25/2012 07:02 PM, Mel Gorman wrote: > On Mon, Jun 25, 2012 at 01:59:27PM +0900, Minchan Kim wrote: >> When hotplug offlining happens on zone A, it starts to mark freed page >> as MIGRATE_ISOLATE type in buddy for preventing further allocation. >> (MIGRATE_ISOLATE is very irony type because it's apparently on buddy >> but we can't allocate them). >> When the memory shortage happens during hotplug offlining, >> current task starts to reclaim, then wake up kswapd. >> Kswapd checks watermark, then go sleep because current zone_watermark_ok_safe >> doesn't consider MIGRATE_ISOLATE freed page count. >> Current task continue to reclaim in direct reclaim path without kswapd's helping. >> The problem is that zone->all_unreclaimable is set by only kswapd >> so that current task would be looping forever like below. >> >> __alloc_pages_slowpath >> restart: >> wake_all_kswapd >> rebalance: >> __alloc_pages_direct_reclaim >> do_try_to_free_pages >> if global_reclaim && !all_unreclaimable >> return 1; /* It means we did did_some_progress */ >> skip __alloc_pages_may_oom >> should_alloc_retry >> goto rebalance; >> >> If we apply KOSAKI's patch[1] which doesn't depends on kswapd >> about setting zone->all_unreclaimable, we can solve this problem >> by killing some task in direct reclaim path. But it doesn't wake up kswapd, still. >> It could be a problem still if other subsystem needs GFP_ATOMIC request. >> So kswapd should consider MIGRATE_ISOLATE when it calculate free pages >> BEFORE going sleep. >> >> This patch counts the number of MIGRATE_ISOLATE page block and >> zone_watermark_ok_safe will consider it if the system has such blocks >> (fortunately, it's very rare so no problem in POV overhead and kswapd is never >> hotpath). >> >> [1] http://lkml.org/lkml/2012/6/14/74 >> > > I have not been following this discussion at all but reading through the > patch I wonder again why memory hotplug is not "allocating" the pageblocks > when they are fully isolated like a balloon driver. This would keep the > free space accounting as it is currently without introducing something > memory hotplug specific to kswapd. > > I think historically that memory hotplug did not allocate pages because > it would be difficult to detect if a pageblock was isolated or if part of > some balloon. Allocating just full pageblocks would work around this. > However, it would play very badly with CMA. > > It'd be worth mentioning this in the changelog in case someone tries to > "fix" it. > >> Suggested-by: KOSAKI Motohiro >> Cc: KAMEZAWA Hiroyuki >> Cc: Aaditya Kumar >> Cc: Mel Gorman >> Signed-off-by: Minchan Kim >> --- >> >> Aaditya, coul you confirm this patch solve your problem and >> make sure nr_migrate_isolate is zero after hotplug end? >> >> Thanks! >> >> include/linux/mmzone.h | 8 ++++++++ >> mm/page_alloc.c | 36 ++++++++++++++++++++++++++++++++++++ >> mm/page_isolation.c | 43 +++++++++++++++++++++++++++++++++++++++++-- >> 3 files changed, 85 insertions(+), 2 deletions(-) >> >> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h >> index bf3404e..290e186 100644 >> --- a/include/linux/mmzone.h >> +++ b/include/linux/mmzone.h >> @@ -474,6 +474,14 @@ struct zone { >> * rarely used fields: >> */ >> const char *name; >> +#ifdef CONFIG_MEMORY_ISOLATION >> + /* >> + * the number of MIGRATE_ISOLATE pageblock >> + * We need this for accurate free page counting. >> + * It's protected by zone->lock >> + */ >> + atomic_t nr_migrate_isolate; >> +#endif > > If it's protected by the zone->lock then it does not need to be atomic. Slaps self. :-( > >> } ____cacheline_internodealigned_in_smp; >> >> typedef enum { >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index c175fa9..626f877 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -218,6 +218,11 @@ EXPORT_SYMBOL(nr_online_nodes); >> >> int page_group_by_mobility_disabled __read_mostly; >> >> +/* >> + * NOTE: >> + * Don't use set_pageblock_migratetype(page, MIGRATE_ISOLATE) direclty. > > s/direclty/directly/ Will fix. > >> + * Instead, use {un}set_pageblock_isolate. >> + */ >> void set_pageblock_migratetype(struct page *page, int migratetype) >> { >> if (unlikely(page_group_by_mobility_disabled)) >> @@ -1614,6 +1619,28 @@ static bool __zone_watermark_ok(struct zone *z, int order, unsigned long mark, >> return true; >> } >> >> +static unsigned long migrate_isolate_pages(struct zone *zone) >> +{ > > This name is very misleading as nothing is being migrated. The Agree. > other zone-based counters are stored in vmstat[] and accessed > with zone_page_state(). I doubt you want to use vmstat but > nr_isolated_zone_pages() would have been a better name. I feel it's kinda NR_ISOLATE_{ANON + FILE}. So I think it would be better to use nr_zone_isolate_freepages. > >> + unsigned long nr_pages = 0; >> + >> + if (unlikely(atomic_read(&zone->nr_migrate_isolate))) { >> + unsigned long flags; >> + int order; >> + spin_lock_irqsave(&zone->lock, flags); >> + for (order = 0; order < MAX_ORDER; order++) { >> + struct free_area *area = &zone->free_area[order]; >> + long count = 0; >> + struct list_head *curr; >> + >> + list_for_each(curr, &area->free_list[MIGRATE_ISOLATE]) >> + count++; >> + nr_pages += (count << order); >> + } >> + spin_unlock_irqrestore(&zone->lock, flags); >> + } > > We have a zone->nr_migrate_isolate counter but have to search the buddy > lists to count how many pages are isolated? Don't bother. If the pageblocks > really have been isolated just assume that they are fully isolated for the > purposes of figuring out if kswapd should wake up or not. The consequences > of an inaccurate count is that kswapd wakes up when it potentially could > have stayed asleep but for memory hotplug that is desirable. Fair enough. > >> + return nr_pages; >> +} >> + >> bool zone_watermark_ok(struct zone *z, int order, unsigned long mark, >> int classzone_idx, int alloc_flags) >> { >> @@ -1629,6 +1656,14 @@ bool zone_watermark_ok_safe(struct zone *z, int order, unsigned long mark, >> if (z->percpu_drift_mark && free_pages < z->percpu_drift_mark) >> free_pages = zone_page_state_snapshot(z, NR_FREE_PAGES); >> >> + /* >> + * If the zone has MIGRATE_ISOLATE type free page, >> + * we should consider it, too. Otherwise, kswapd can sleep forever. >> + */ >> + free_pages -= migrate_isolate_pages(z); >> + if (free_pages < 0) >> + free_pages = 0; >> + > > You are already taking into account that the numbet of isolated pages may > be inaccurate so an exact count in migrate_isolate_pages is unnecessary. I will use following code instead of migrate_isolate_pages's accurate counter. int migrate_isolate_pages(struct zone *z) { return z->nr_migrate_isolate * pageblock_nr_pages; } free_pages -= migrate_isolate_pages(z); if (free_pages < 0) free_pages = 0; Otherwise, simply, we can always wake up kswapd during hot-plug because it's temporal action and would end sooner or later. It is likely to make unnecessary aging and CPU consumption but it would be very rare. zone_watermark_ok_safe() { if (z->nr_migrate_isolate > 0) return false; .. .. } > >> return __zone_watermark_ok(z, order, mark, classzone_idx, alloc_flags, >> free_pages); >> } >> @@ -4407,6 +4442,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat, >> lruvec_init(&zone->lruvec, zone); >> zap_zone_vm_stats(zone); >> zone->flags = 0; >> + atomic_set(&zone->nr_migrate_isolate, 0); >> if (!size) >> continue; >> >> diff --git a/mm/page_isolation.c b/mm/page_isolation.c >> index 1a9cb36..e95a792 100644 >> --- a/mm/page_isolation.c >> +++ b/mm/page_isolation.c >> @@ -8,6 +8,45 @@ >> #include >> #include "internal.h" >> >> +static void set_pageblock_isolate(struct zone *zone, struct page *page) >> +{ >> + int old_migratetype; >> + assert_spin_locked(&zone->lock); >> + >> + if (unlikely(page_group_by_mobility_disabled)) { >> + set_pageblock_flags_group(page, MIGRATE_UNMOVABLE, >> + PB_migrate, PB_migrate_end); >> + return; >> + } > > Not sure why this is necessary. Yeb. hotplug should work regardless of small memory system(KOSAKI already pointed out) but current hotplug code have used set_pageblock_migratetype(page, MIGRATE_ISOLATE) which already check page_group_by_mobility_disabled. It's like buggy. But I think it should be fixed as another patch. > >> + >> + old_migratetype = get_pageblock_migratetype(page); >> + set_pageblock_flags_group(page, MIGRATE_ISOLATE, >> + PB_migrate, PB_migrate_end); >> + >> + if (old_migratetype != MIGRATE_ISOLATE) >> + atomic_inc(&zone->nr_migrate_isolate); >> +} > > If the old type was MIGRATE_ISOLATE then it was also unnecessary to call > set_pageblock_flags_group() or anything else. It's also unnecessary to > pass in zone because you can figure it out from the page but no harm. > > This could have been a lot easier to read with something like; > > static void set_pageblock_isolate(struct zone *zone, struct page *page) > { > BUG_ON(page_zone(page) ! = zone); > if (get_pageblock_migratetype(page) == MIGRATE_ISOLATE) > return; > > set_pageblock_migratetype(page, MIGRATE_ISOLATE); > zone->nr_pageblock_isolate++; > } > > If set_migratetype_isolate is the only caller then it barely warrents a > helper functions because it simply looks like > > if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE) { > set_pageblock_migratetype(page, MIGRATE_ISOLATE); > zone->nr_pageblock_isolate++; > } > > >> + >> +static void unset_pageblock_isolate(struct zone *zone, struct page *page, >> + unsigned long migratetype) >> +{ > > The word "unset" in this context would normally refer to a boolean but > you're actually restoring an old value. Hence reset_pageblock_isolate or > restore_pageblock_migratetype would have been more appropriate. > > Oh, I see you are matching the naming of unset_migratetype_isolate(). > That sucks, hope it was not me that suggested that name originally :/ Naming/comment is very hard part rather than coding on non-native speakers. Will fix. > > migratetype is almost always int too, not sure where that unsigned long > came out of. Will fix. > >> + assert_spin_locked(&zone->lock); >> + >> + if (unlikely(page_group_by_mobility_disabled)) { >> + set_pageblock_flags_group(page, migratetype, >> + PB_migrate, PB_migrate_end); >> + return; >> + } >> + >> + BUG_ON(get_pageblock_migratetype(page) != MIGRATE_ISOLATE); >> + BUG_ON(migratetype == MIGRATE_ISOLATE); >> + >> + set_pageblock_flags_group(page, migratetype, >> + PB_migrate, PB_migrate_end); >> + atomic_dec(&zone->nr_migrate_isolate); >> + BUG_ON(atomic_read(&zone->nr_migrate_isolate) < 0); >> +} >> + > > static void reset_pageblock_isolate(struct zone *zone, struct page *page, > int migratetype) > { > if (WARN_ON(get_pageblock_migratetype(page) != MIGRATE_ISOLATE)) > return; > > BUG_ON(zone->nr_pageblock_isolate == 0); > set_pageblock_migratetype(page, migratetype); > zone->nr_pageblock_isolate--; > } > > >> int set_migratetype_isolate(struct page *page) >> { >> struct zone *zone; >> @@ -54,7 +93,7 @@ int set_migratetype_isolate(struct page *page) >> >> out: >> if (!ret) { >> - set_pageblock_migratetype(page, MIGRATE_ISOLATE); >> + set_pageblock_isolate(zone, page); >> move_freepages_block(zone, page, MIGRATE_ISOLATE); >> } >> >> @@ -72,8 +111,8 @@ void unset_migratetype_isolate(struct page *page, unsigned migratetype) >> spin_lock_irqsave(&zone->lock, flags); >> if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE) >> goto out; >> - set_pageblock_migratetype(page, migratetype); >> move_freepages_block(zone, page, migratetype); >> + unset_pageblock_isolate(zone, page, migratetype); >> out: >> spin_unlock_irqrestore(&zone->lock, flags); >> } > > While this patch looks like it would work as advertised I also think that > it is more complicated than it needs to be. The use of atomics and exact > counts of isolated pages look unnecessary. set_migeratetype_isolate and > unset_pageblock_isolate could have been a lot easier to read. > Thanks for the good review, Mel! -- Kind regards, Minchan Kim