From: Joonsoo Kim
Subject: [PATCH 0/7] improve robustness on handling migratetype
Date: Thu, 9 Jan 2014 16:04:40 +0900
Message-Id: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com>
To: Andrew Morton
Cc: "Kirill A. Shutemov", Rik van Riel, Jiang Liu, Mel Gorman, Cody P Schafer, Johannes Weiner, Michal Hocko, Minchan Kim, Michal Nazarewicz, Andi Kleen, Wei Yongjun, Tang Chen, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim, Joonsoo Kim

Hello,

I found some weaknesses in the handling of migratetype during code review
and while testing CMA.

First, there is no synchronization on getting/setting the pageblock
migratetype. When we change the migratetype, we hold the zone lock, so
there is no writer-writer race. But while someone changes the migratetype,
others can read it, and a reader may see a totally unintended value.
Although I haven't seen any problem report about this, it is better to
protect it properly.

Second, (get/set)_freepage_migratetype isn't used properly. I guess it was
introduced for per-cpu page (pcp) performance, but it is now also used by
memory isolation. For that use, the stored information isn't sufficient,
so we need to fix it.

Third, there is a problem in the buddy allocator. It doesn't consider
migratetype when merging buddies, so pages from a CMA or isolated region
can be moved to another migratetype's freelist. This makes CMA allocations
fail over and over.
To prevent it, the buddy allocator should consider migratetype when
CMA/ISOLATE is enabled.

This patchset aims to fix these problems and is based on v3.13-rc7.

Thanks.

Joonsoo Kim (7):
  mm/page_alloc: synchronize get/set pageblock
  mm/cma: fix cma free page accounting
  mm/page_alloc: move set_freepage_migratetype() to better place
  mm/isolation: remove invalid check condition
  mm/page_alloc: separate interface to set/get migratetype of freepage
  mm/page_alloc: store freelist migratetype to the page on buddy properly
  mm/page_alloc: don't merge MIGRATE_(CMA|ISOLATE) pages on buddy

 include/linux/mm.h             |   35 +++++++++++++++++++++---
 include/linux/mmzone.h         |    2 ++
 include/linux/page-isolation.h |    1 -
 mm/page_alloc.c                |   59 ++++++++++++++++++++++++++--------------
 mm/page_isolation.c            |    5 +---
 5 files changed, 73 insertions(+), 29 deletions(-)

-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org


From: Joonsoo Kim
Subject: [PATCH 1/7] mm/page_alloc: synchronize get/set pageblock
Date: Thu, 9 Jan 2014 16:04:41 +0900
Message-Id: <1389251087-10224-2-git-send-email-iamjoonsoo.kim@lge.com>
In-Reply-To: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com>

Currently, get/set pageblock is done without any synchronization, so there
is a race condition and the migratetype can take an unintended value.
Sometimes we move pageblocks from one migratetype to another and, at the
same time, some page in such a pageblock is freed. In this case, the reader
can get a totally unintended value, because get/set pageblock doesn't
read/write the flags atomically; instead, they are accessed one bit at a
time.

Since set pageblock is used much less frequently than get pageblock, I
think a seqlock is the proper method to synchronize it. This type of lock
has minimal overhead when there are many readers and few writers, so it
fits this situation.

Signed-off-by: Joonsoo Kim

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index bd791e4..feaa607 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -79,6 +79,7 @@ static inline int get_pageblock_migratetype(struct page *page)
 {
 	return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
 }
+void set_pageblock_migratetype(struct page *page, int migratetype);
 
 struct free_area {
 	struct list_head	free_list[MIGRATE_TYPES];
@@ -367,6 +368,7 @@ struct zone {
 #endif
 	struct free_area	free_area[MAX_ORDER];
 
+	seqlock_t		pageblock_seqlock;
 #ifndef CONFIG_SPARSEMEM
 	/*
 	 * Flags for a pageblock_nr_pages block. See pageblock-flags.h.
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 3fff8e7..58e2a89 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -23,7 +23,6 @@ static inline bool is_migrate_isolate(int migratetype)
 
 bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 			 bool skip_hwpoisoned_pages);
-void set_pageblock_migratetype(struct page *page, int migratetype);
 int move_freepages_block(struct zone *zone, struct page *page,
 				int migratetype);
 int move_freepages(struct zone *zone,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5248fe0..b36aa5a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4788,6 +4788,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 		spin_lock_init(&zone->lock);
 		spin_lock_init(&zone->lru_lock);
 		zone_seqlock_init(zone);
+		seqlock_init(&zone->pageblock_seqlock);
 		zone->zone_pgdat = pgdat;
 
 		zone_pcp_init(zone);
@@ -5927,15 +5928,19 @@ unsigned long get_pageblock_flags_group(struct page *page,
 	unsigned long pfn, bitidx;
 	unsigned long flags = 0;
 	unsigned long value = 1;
+	unsigned int seq;
 
 	zone = page_zone(page);
 	pfn = page_to_pfn(page);
 	bitmap = get_pageblock_bitmap(zone, pfn);
 	bitidx = pfn_to_bitidx(zone, pfn);
 
-	for (; start_bitidx <= end_bitidx; start_bitidx++, value <<= 1)
-		if (test_bit(bitidx + start_bitidx, bitmap))
-			flags |= value;
+	do {
+		seq = read_seqbegin(&zone->pageblock_seqlock);
+		for (; start_bitidx <= end_bitidx; start_bitidx++, value <<= 1)
+			if (test_bit(bitidx + start_bitidx, bitmap))
+				flags |= value;
+	} while (read_seqretry(&zone->pageblock_seqlock, seq));
 
 	return flags;
 }
@@ -5954,6 +5959,7 @@ void set_pageblock_flags_group(struct page *page, unsigned long flags,
 	unsigned long *bitmap;
 	unsigned long pfn, bitidx;
 	unsigned long value = 1;
+	unsigned long irq_flags;
 
 	zone = page_zone(page);
 	pfn = page_to_pfn(page);
@@ -5961,11 +5967,13 @@ void set_pageblock_flags_group(struct page *page, unsigned long flags,
 	bitidx = pfn_to_bitidx(zone, pfn);
 	VM_BUG_ON(!zone_spans_pfn(zone, pfn));
 
+	write_seqlock_irqsave(&zone->pageblock_seqlock, irq_flags);
 	for (; start_bitidx <= end_bitidx; start_bitidx++, value <<= 1)
 		if (flags & value)
 			__set_bit(bitidx + start_bitidx, bitmap);
 		else
 			__clear_bit(bitidx + start_bitidx, bitmap);
+	write_sequnlock_irqrestore(&zone->pageblock_seqlock, irq_flags);
 }
 
 /*


From: Joonsoo Kim
Subject: [PATCH 2/7] mm/cma: fix cma free page accounting
Date: Thu, 9 Jan 2014 16:04:42 +0900
Message-Id: <1389251087-10224-3-git-send-email-iamjoonsoo.kim@lge.com>
In-Reply-To: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com>

CMA pages can be allocated not only by order-0 requests but also by
high-order requests, so free CMA pages should be accounted for in both
cases.
Signed-off-by: Joonsoo Kim

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b36aa5a..1489c301 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1091,6 +1091,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 							  start_migratetype,
 							  migratetype);
 
+			/* CMA pages cannot be stolen */
+			if (is_migrate_cma(migratetype)) {
+				__mod_zone_page_state(zone,
+					NR_FREE_CMA_PAGES, -(1 << order));
+			}
+
 			/* Remove the page from the freelists */
 			list_del(&page->lru);
 			rmv_page_order(page);
@@ -1175,9 +1181,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 		}
 		set_freepage_migratetype(page, mt);
 		list = &page->lru;
-		if (is_migrate_cma(mt))
-			__mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
-					      -(1 << order));
 	}
 	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
 	spin_unlock(&zone->lock);


From: Joonsoo Kim
Subject: [PATCH 3/7] mm/page_alloc: move set_freepage_migratetype() to better place
Date: Thu, 9 Jan 2014 16:04:43 +0900
Message-Id: <1389251087-10224-4-git-send-email-iamjoonsoo.kim@lge.com>
In-Reply-To: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com>

set_freepage_migratetype() records which buddy freelist the page should be
linked to when it goes back to the buddy allocator. Currently, this is done
in rmqueue_bulk(), so if CONFIG_CMA is enabled we have to call
get_pageblock_migratetype() there to know the page's migratetype exactly.
That function has some overhead, so removing the call is preferable.

To remove it, we move set_freepage_migratetype() into __rmqueue_fallback()
and __rmqueue_smallest(). In those functions the migratetype is already
known, so we don't need to call get_pageblock_migratetype().

Removing the is_migrate_isolate() check is safe, since what we want to
ensure is that pages from CMA do not go to another migratetype's freelist.

Signed-off-by: Joonsoo Kim

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1489c301..4913829 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -903,6 +903,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
 		rmv_page_order(page);
 		area->nr_free--;
 		expand(zone, page, order, current_order, area, migratetype);
+		set_freepage_migratetype(page, migratetype);
 		return page;
 	}
 
@@ -1093,8 +1094,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 
 			/* CMA pages cannot be stolen */
 			if (is_migrate_cma(migratetype)) {
+				set_freepage_migratetype(page, migratetype);
 				__mod_zone_page_state(zone,
 					NR_FREE_CMA_PAGES, -(1 << order));
+			} else {
+				set_freepage_migratetype(page,
+							start_migratetype);
 			}
 
 			/* Remove the page from the freelists */
@@ -1153,7 +1158,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 			unsigned long count, struct list_head *list,
 			int migratetype, int cold)
 {
-	int mt = migratetype, i;
+	int i;
 
 	spin_lock(&zone->lock);
 	for (i = 0; i < count; ++i) {
@@ -1174,12 +1179,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 			list_add(&page->lru, list);
 		else
 			list_add_tail(&page->lru, list);
-		if (IS_ENABLED(CONFIG_CMA)) {
-			mt = get_pageblock_migratetype(page);
-			if (!is_migrate_cma(mt) && !is_migrate_isolate(mt))
-				mt = migratetype;
-		}
-		set_freepage_migratetype(page, mt);
 		list = &page->lru;
 	}
 	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));


From: Joonsoo Kim
Subject: [PATCH 4/7] mm/isolation: remove invalid check condition
Date: Thu, 9 Jan 2014 16:04:44 +0900
Message-Id: <1389251087-10224-5-git-send-email-iamjoonsoo.kim@lge.com>
In-Reply-To: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com>

test_page_isolated() checks the stability of pages. It checks two
conditions: one is that the page has the isolate migratetype, and the
other is that the page is in the buddy allocator, on the isolate freelist.
If both conditions are satisfied, we can determine that the page is stable
and go forward.

__test_page_isolated_in_pageblock() is one of the main functions for this
test. In that function, if it meets a page with page_count 0 and the
isolate migratetype, it decides that the page is stable. But this is not
true: such a page may be on a pcp list, and then it can be allocated by
other users even though we hold the zone lock. So remove this check.

Signed-off-by: Joonsoo Kim

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index d1473b2..534fb3a 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -199,9 +199,6 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
 			}
 			pfn += 1 << page_order(page);
 		}
-		else if (page_count(page) == 0 &&
-			get_freepage_migratetype(page) == MIGRATE_ISOLATE)
-			pfn += 1;
 		else if (skip_hwpoisoned_pages && PageHWPoison(page)) {
 			/*
 			 * The HWPoisoned page may be not in buddy


From: Joonsoo Kim
Subject: [PATCH 5/7] mm/page_alloc: separate interface to set/get migratetype of freepage
Date: Thu, 9 Jan 2014 16:04:45 +0900
Message-Id: <1389251087-10224-6-git-send-email-iamjoonsoo.kim@lge.com>
In-Reply-To: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com>

Currently, we use (set/get)_freepage_migratetype for two use cases. One is
to know the buddy list this page will be linked to, and the other is to
know the buddy list this page is linked to now. These two use cases should
be handled differently, because the stored information isn't sufficient
for the second one and keeping it accurate has some overhead: whenever the
page is merged or split in the buddy allocator, this information isn't
re-assigned, so it may be stale for the second use case.

This patch only separates the interface, so there is no functional change.
The following patch takes further steps on this issue.

Signed-off-by: Joonsoo Kim

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3552717..2733e0b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -257,14 +257,31 @@ struct inode;
 #define page_private(page)		((page)->private)
 #define set_page_private(page, v)	((page)->private = (v))
 
-/* It's valid only if the page is free path or free_list */
-static inline void set_freepage_migratetype(struct page *page, int migratetype)
+/*
+ * It's valid only if the page is on buddy. It represents
+ * which freelist the page is linked.
+ */
+static inline void set_buddy_migratetype(struct page *page, int migratetype)
+{
+	page->index = migratetype;
+}
+
+static inline int get_buddy_migratetype(struct page *page)
+{
+	return page->index;
+}
+
+/*
+ * It's valid only if the page is on pcp list. It represents
+ * which freelist the page should go on buddy.
+ */
+static inline void set_pcp_migratetype(struct page *page, int migratetype)
 {
 	page->index = migratetype;
 }
 
-/* It's valid only if the page is free path or free_list */
-static inline int get_freepage_migratetype(struct page *page)
+/* It's valid only if the page is on pcp list */
+static inline int get_pcp_migratetype(struct page *page)
 {
 	return page->index;
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4913829..c9e6622 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -681,7 +681,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			page = list_entry(list->prev, struct page, lru);
 			/* must delete as __free_one_page list manipulates */
 			list_del(&page->lru);
-			mt = get_freepage_migratetype(page);
+			mt = get_pcp_migratetype(page);
 			/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
 			__free_one_page(page, zone, 0, mt);
 			trace_mm_page_pcpu_drain(page, 0, mt);
@@ -745,7 +745,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
 	local_irq_save(flags);
 	__count_vm_events(PGFREE, 1 << order);
 	migratetype = get_pageblock_migratetype(page);
-	set_freepage_migratetype(page, migratetype);
+	set_buddy_migratetype(page, migratetype);
 	free_one_page(page_zone(page), page, order, migratetype);
 	local_irq_restore(flags);
 }
@@ -903,7 +903,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
 		rmv_page_order(page);
 		area->nr_free--;
 		expand(zone, page, order, current_order, area, migratetype);
-		set_freepage_migratetype(page, migratetype);
+		set_pcp_migratetype(page, migratetype);
 		return page;
 	}
 
@@ -971,7 +971,7 @@ int move_freepages(struct zone *zone,
 		order = page_order(page);
 		list_move(&page->lru,
 			  &zone->free_area[order].free_list[migratetype]);
-		set_freepage_migratetype(page, migratetype);
+		set_buddy_migratetype(page, migratetype);
 		page += 1 << order;
 		pages_moved += 1 << order;
 	}
@@ -1094,12 +1094,11 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 
 			/* CMA pages cannot be stolen */
 			if (is_migrate_cma(migratetype)) {
-				set_freepage_migratetype(page, migratetype);
+				set_pcp_migratetype(page, migratetype);
 				__mod_zone_page_state(zone,
 					NR_FREE_CMA_PAGES, -(1 << order));
 			} else {
-				set_freepage_migratetype(page,
-							start_migratetype);
+				set_pcp_migratetype(page, start_migratetype);
 			}
 
 			/* Remove the page from the freelists */
@@ -1346,7 +1345,7 @@ void free_hot_cold_page(struct page *page, int cold)
 		return;
 
 	migratetype = get_pageblock_migratetype(page);
-	set_freepage_migratetype(page, migratetype);
+	set_pcp_migratetype(page, migratetype);
 	local_irq_save(flags);
 	__count_vm_event(PGFREE);
 
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 534fb3a..c341413 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -190,7 +190,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
 			 * is MIGRATE_ISOLATE. Catch it and move the page into
 			 * MIGRATE_ISOLATE list.
 			 */
-			if (get_freepage_migratetype(page) != MIGRATE_ISOLATE) {
+			if (get_buddy_migratetype(page) != MIGRATE_ISOLATE) {
 				struct page *end_page;
 
 				end_page = page + (1 << page_order(page)) - 1;


From: Joonsoo Kim
Subject: [PATCH 6/7] mm/page_alloc: store freelist migratetype to the page on buddy properly
Date: Thu, 9 Jan 2014 16:04:46 +0900
Message-Id: <1389251087-10224-7-git-send-email-iamjoonsoo.kim@lge.com>
In-Reply-To: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com>

To maintain freelist migratetype information on buddy pages, the
migratetype should be set again whenever the page order changes.
set_page_order() is the best place to do this, because it is called
whenever the page order changes, so this patch adds set_buddy_migratetype()
to set_page_order().

This patch also enables set/get_buddy_migratetype() only when it is really
needed, because it has some overhead.

Signed-off-by: Joonsoo Kim

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2733e0b..046e09f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -258,6 +258,12 @@ struct inode;
 #define set_page_private(page, v)	((page)->private = (v))
 
 /*
+ * This is for tracking the type of the list on buddy.
+ * It imposes some performance overhead to the buddy allocator,
+ * so we make it enabled only if it is needed.
+ */
+#if defined(CONFIG_MEMORY_ISOLATION) || defined(CONFIG_CMA)
+/*
  * It's valid only if the page is on buddy. It represents
  * which freelist the page is linked.
  */
@@ -270,6 +276,10 @@ static inline int get_buddy_migratetype(struct page *page)
 {
 	return page->index;
 }
+#else
+static inline void set_buddy_migratetype(struct page *page, int migratetype) {}
+static inline int get_buddy_migratetype(struct page *page) { return 0; }
+#endif
 
 /*
  * It's valid only if the page is on pcp list. It represents
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c9e6622..2548b42 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -446,9 +446,11 @@ static inline void set_page_guard_flag(struct page *page) { }
 static inline void clear_page_guard_flag(struct page *page) { }
 #endif
 
-static inline void set_page_order(struct page *page, int order)
+static inline void set_page_order(struct page *page, int order,
+							int migratetype)
 {
 	set_page_private(page, order);
+	set_buddy_migratetype(page, migratetype);
 	__SetPageBuddy(page);
 }
 
@@ -588,7 +590,7 @@ static inline void __free_one_page(struct page *page,
 		page_idx = combined_idx;
 		order++;
 	}
-	set_page_order(page, order);
+	set_page_order(page, order, migratetype);
 
 	/*
 	 * If this is not the largest possible page, check if the buddy
@@ -745,7 +747,6 @@ static void __free_pages_ok(struct page *page, unsigned int order)
 	local_irq_save(flags);
 	__count_vm_events(PGFREE, 1 << order);
 	migratetype = get_pageblock_migratetype(page);
-	set_buddy_migratetype(page, migratetype);
 	free_one_page(page_zone(page), page, order, migratetype);
 	local_irq_restore(flags);
 }
@@ -834,7 +835,7 @@ static inline void expand(struct zone *zone, struct page *page,
 #endif
 		list_add(&page[size].lru, &area->free_list[migratetype]);
 		area->nr_free++;
-		set_page_order(&page[size], high);
+		set_page_order(&page[size], high, migratetype);
 	}
 }
 


From: Joonsoo Kim
Subject: [PATCH 7/7] mm/page_alloc: don't merge MIGRATE_(CMA|ISOLATE) pages on buddy
Date: Thu, 9 Jan 2014 16:04:47 +0900
Message-Id: <1389251087-10224-8-git-send-email-iamjoonsoo.kim@lge.com>
In-Reply-To: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com>

If (MAX_ORDER-1) is greater than the pageblock order, pages of different
migratetypes can be merged and linked into an unintended freelist.

While I was testing CMA, CMA pages were merged and linked into the MOVABLE
freelist because of this issue, and then the pages changed their
migratetype to UNMOVABLE via try_to_steal_freepages(). After that, CMA
allocations from this region always fail.

To prevent this, we should not merge pages on the MIGRATE_(CMA|ISOLATE)
freelists.
Signed-off-by: Joonsoo Kim diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 2548b42..ea99cee 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -581,6 +581,15 @@ static inline void __free_one_page(struct page *page, __mod_zone_freepage_state(zone, 1 << order, migratetype); } else { + int buddy_mt = get_buddy_migratetype(buddy); + + /* We don't want to merge cma, isolate pages */ + if (unlikely(order >= pageblock_order) && + migratetype != buddy_mt && + (migratetype >= MIGRATE_PCPTYPES || + buddy_mt >= MIGRATE_PCPTYPES)) { + break; + } list_del(&buddy->lru); zone->free_area[order].nr_free--; rmv_page_order(buddy); -- 1.7.9.5 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f180.google.com (mail-wi0-f180.google.com [209.85.212.180]) by kanga.kvack.org (Postfix) with ESMTP id E0B246B0031 for ; Thu, 9 Jan 2014 04:06:42 -0500 (EST) Received: by mail-wi0-f180.google.com with SMTP id hm19so3166574wib.1 for ; Thu, 09 Jan 2014 01:06:42 -0800 (PST) Received: from mail-we0-x230.google.com (mail-we0-x230.google.com [2a00:1450:400c:c03::230]) by mx.google.com with ESMTPS id x6si2200249wib.85.2014.01.09.01.06.42 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 09 Jan 2014 01:06:42 -0800 (PST) Received: by mail-we0-f176.google.com with SMTP id p61so2431870wes.21 for ; Thu, 09 Jan 2014 01:06:41 -0800 (PST) From: Michal Nazarewicz Subject: Re: [PATCH 0/7] improve robustness on handling migratetype In-Reply-To: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> Date: Thu, 09 Jan 2014 10:06:34 +0100 Message-ID: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Sender: owner-linux-mm@kvack.org List-ID: To: Joonsoo Kim , Andrew Morton Cc: "Kirill A. 
Shutemov" , Rik van Riel , Jiang Liu , Mel Gorman , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim

On Thu, Jan 09 2014, Joonsoo Kim wrote:
> Third, there is the problem on buddy allocator. It doesn't consider
> migratetype when merging buddy, so pages from cma or isolate region can
> be moved to other migratetype freelist. It makes CMA failed over and over.
> To prevent it, the buddy allocator should consider migratetype if
> CMA/ISOLATE is enabled.

There should never be situation where a CMA page shares a pageblock (or
a max-order page) with a non-CMA page though, so this should never be an
issue.

--
Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michał “mina86” Nazarewicz (o o) ooo +------ooO--(_)--Ooo--

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f170.google.com (mail-wi0-f170.google.com [209.85.212.170]) by kanga.kvack.org (Postfix) with ESMTP id 9317C6B0031 for ; Thu, 9 Jan 2014 04:08:18 -0500 (EST) Received: by mail-wi0-f170.google.com with SMTP id hq4so6379952wib.3 for ; Thu, 09 Jan 2014 01:08:18 -0800 (PST) Received: from mail-wg0-x229.google.com (mail-wg0-x229.google.com [2a00:1450:400c:c00::229]) by mx.google.com with ESMTPS id r4si896428wjr.86.2014.01.09.01.08.17 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 09 Jan 2014 01:08:17 -0800 (PST) Received: by mail-wg0-f41.google.com with SMTP id y10so5708684wgg.0 for ; Thu, 09 Jan 2014 01:08:17 -0800 (PST) From: Michal Nazarewicz Subject: Re: [PATCH 1/7] mm/page_alloc: synchronize get/set pageblock In-Reply-To: <1389251087-10224-2-git-send-email-iamjoonsoo.kim@lge.com> References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> <1389251087-10224-2-git-send-email-iamjoonsoo.kim@lge.com> Date: Thu, 09 Jan 2014 10:08:10 +0100 Message-ID: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Sender: owner-linux-mm@kvack.org List-ID: To: Joonsoo Kim , Andrew Morton Cc: "Kirill A.
Shutemov" , Rik van Riel , Jiang Liu , Mel Gorman , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim

On Thu, Jan 09 2014, Joonsoo Kim wrote:
> @@ -5927,15 +5928,19 @@ unsigned long get_pageblock_flags_group(struct page *page,
>  	unsigned long pfn, bitidx;
>  	unsigned long flags = 0;
>  	unsigned long value = 1;
> +	unsigned int seq;
>
>  	zone = page_zone(page);
>  	pfn = page_to_pfn(page);
>  	bitmap = get_pageblock_bitmap(zone, pfn);
>  	bitidx = pfn_to_bitidx(zone, pfn);
>
> -	for (; start_bitidx <= end_bitidx; start_bitidx++, value <<= 1)
> -		if (test_bit(bitidx + start_bitidx, bitmap))
> -			flags |= value;
> +	do {
+		flags = 0;
> +		seq = read_seqbegin(&zone->pageblock_seqlock);
> +		for (; start_bitidx <= end_bitidx; start_bitidx++, value <<= 1)
> +			if (test_bit(bitidx + start_bitidx, bitmap))
> +				flags |= value;
> +	} while (read_seqretry(&zone->pageblock_seqlock, seq));
>
>  	return flags;
>  }

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wg0-f49.google.com (mail-wg0-f49.google.com [74.125.82.49]) by kanga.kvack.org (Postfix) with ESMTP id C42846B0031 for ; Thu, 9 Jan 2014 04:18:08 -0500 (EST) Received: by mail-wg0-f49.google.com with SMTP id a1so223544wgh.28 for ; Thu, 09 Jan 2014 01:18:08 -0800 (PST) Received: from mail-wi0-x22d.google.com (mail-wi0-x22d.google.com [2a00:1450:400c:c05::22d]) by mx.google.com with ESMTPS id ap4si916295wjc.64.2014.01.09.01.18.08 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 09 Jan 2014 01:18:08 -0800 (PST) Received: by mail-wi0-f173.google.com with SMTP id hn9so6634505wib.0 for ; Thu, 09 Jan 2014 01:18:08 -0800 (PST) From: Michal Nazarewicz Subject: Re: [PATCH 5/7] mm/page_alloc: separate interface to set/get migratetype of freepage In-Reply-To: <1389251087-10224-6-git-send-email-iamjoonsoo.kim@lge.com> References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> <1389251087-10224-6-git-send-email-iamjoonsoo.kim@lge.com> Date: Thu, 09 Jan 2014 10:18:00 +0100 Message-ID: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Sender: owner-linux-mm@kvack.org List-ID: To: Joonsoo Kim , Andrew Morton Cc: "Kirill A. Shutemov" , Rik van Riel , Jiang Liu , Mel Gorman , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim

On Thu, Jan 09 2014, Joonsoo Kim wrote:
> Currently, we use (set/get)_freepage_migratetype in two use cases.
> One is to know the buddy list where this page will be linked and
> the other is to know the buddy list where this page is linked now.
>
> But, we should deal these two use cases differently, because information
> isn't sufficient for the second use case and properly setting this
> information needs some overhead. Whenever the page is merged or split
> in buddy, this information isn't properly re-assigned and it may not
> have enough information for the second use case.
>
> This patch just separates interface, so there is no functional change.
> Following patch will do further steps about this issue.
>
> Signed-off-by: Joonsoo Kim

Acked-by: Michal Nazarewicz

I think this patch would be smaller if it was pushed earlier in the
patchset.

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f169.google.com (mail-wi0-f169.google.com [209.85.212.169]) by kanga.kvack.org (Postfix) with ESMTP id 2A5586B0031 for ; Thu, 9 Jan 2014 04:19:52 -0500 (EST) Received: by mail-wi0-f169.google.com with SMTP id q15so150553wie.2 for ; Thu, 09 Jan 2014 01:19:51 -0800 (PST) Received: from mail-wi0-x230.google.com (mail-wi0-x230.google.com [2a00:1450:400c:c05::230]) by mx.google.com with ESMTPS id b2si2777068wix.13.2014.01.09.01.19.51 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 09 Jan 2014 01:19:51 -0800 (PST) Received: by mail-wi0-f176.google.com with SMTP id hq4so6629025wib.9 for ; Thu, 09 Jan 2014 01:19:51 -0800 (PST) From: Michal Nazarewicz Subject: Re: [PATCH 6/7] mm/page_alloc: store freelist migratetype to the page on buddy properly In-Reply-To: <1389251087-10224-7-git-send-email-iamjoonsoo.kim@lge.com> References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> <1389251087-10224-7-git-send-email-iamjoonsoo.kim@lge.com> Date: Thu, 09 Jan 2014 10:19:44 +0100 Message-ID: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Sender: owner-linux-mm@kvack.org List-ID: To: Joonsoo Kim , Andrew Morton Cc: "Kirill A. Shutemov" , Rik van Riel , Jiang Liu , Mel Gorman , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim

On Thu, Jan 09 2014, Joonsoo Kim wrote:
> To maintain freelist migratetype information on buddy pages, migratetype
> should be set again whenever the page order is changed. set_page_order()
> is the best place to do, because it is called whenever the page order is
> changed, so this patch adds set_buddy_migratetype() to set_page_order().
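A minimal user-space sketch of the approach quoted above. It assumes a simplified page with a buddy flag, an order, and a freelist-migratetype field, and it gives `set_page_order()` the migratetype as an extra argument. That signature, `struct toy_page`, and `set_page_order_only()` are hypothetical, because the patch body itself is not quoted in this reply. The sketch only shows why one update point keeps order and migratetype in step.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the struct page fields involved. */
struct toy_page {
	bool buddy;		/* models PageBuddy() */
	unsigned int order;	/* models the order kept in page_private() */
	int freelist_mt;	/* hypothetical: free list the page sits on */
};

/* Record which free list this buddy page is linked into. */
static void set_buddy_migratetype(struct toy_page *page, int mt)
{
	page->freelist_mt = mt;
}

/* Only meaningful while the page is free in the buddy allocator. */
int get_buddy_migratetype(const struct toy_page *page)
{
	assert(page->buddy);
	return page->freelist_mt;
}

/*
 * Every merge or split sets the new order here, so storing the
 * migratetype in the same place keeps the two fields consistent.
 */
void set_page_order(struct toy_page *page, unsigned int order, int mt)
{
	page->order = order;
	set_buddy_migratetype(page, mt);
	page->buddy = true;
}

/* For contrast: updating only the order leaves the old migratetype. */
void set_page_order_only(struct toy_page *page, unsigned int order)
{
	page->order = order;
	page->buddy = true;
}
```

A merge or split that goes through `set_page_order()` refreshes both fields at once. `set_page_order_only()` shows how a page can keep reporting an outdated migratetype after its order changes.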
>
> And this patch makes set/get_buddy_migratetype() only enabled if it is
> really needed, because it has some overhead.
>
> Signed-off-by: Joonsoo Kim

Acked-by: Michal Nazarewicz

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f176.google.com (mail-wi0-f176.google.com [209.85.212.176]) by kanga.kvack.org (Postfix) with ESMTP id B993D6B0031 for ; Thu, 9 Jan 2014 04:22:11 -0500 (EST) Received: by mail-wi0-f176.google.com with SMTP id hq4so6637193wib.15 for ; Thu, 09 Jan 2014 01:22:11 -0800 (PST) Received: from mail-we0-x22f.google.com (mail-we0-x22f.google.com [2a00:1450:400c:c03::22f]) by mx.google.com with ESMTPS id ui5si930482wjc.22.2014.01.09.01.22.11 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 09 Jan 2014 01:22:11 -0800 (PST) Received: by mail-we0-f175.google.com with SMTP id w62so2464666wes.6 for ; Thu, 09 Jan 2014 01:22:11 -0800 (PST) From: Michal Nazarewicz Subject: Re: [PATCH 7/7] mm/page_alloc: don't merge MIGRATE_(CMA|ISOLATE) pages on buddy In-Reply-To: <1389251087-10224-8-git-send-email-iamjoonsoo.kim@lge.com> References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> <1389251087-10224-8-git-send-email-iamjoonsoo.kim@lge.com> Date: Thu, 09 Jan 2014 10:22:02 +0100 Message-ID: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Sender: owner-linux-mm@kvack.org List-ID: To: Joonsoo Kim , Andrew Morton Cc: "Kirill A. Shutemov" , Rik van Riel , Jiang Liu , Mel Gorman , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim

On Thu, Jan 09 2014, Joonsoo Kim wrote:
> If (MAX_ORDER-1) is greater than pageblock order, there is a possibility
> to merge different migratetype pages and to be linked in unintended
> freelist.
>
> While I test CMA, CMA pages are merged and linked into MOVABLE freelist
> by above issue and then, the pages change their migratetype to UNMOVABLE by
> try_to_steal_freepages().
> After that, CMA to this region always fail.
>
> To prevent this, we should not merge the page on MIGRATE_(CMA|ISOLATE)
> freelist.

This is strange. CMA regions are always multiples of max-pages (or
pageblocks, whichever is larger), so MOVABLE free pages should never be
inside of a CMA region. If what you're describing happens, it looks like
an issue somewhere else.

> Signed-off-by: Joonsoo Kim
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 2548b42..ea99cee 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -581,6 +581,15 @@ static inline void __free_one_page(struct page *page,
>  			__mod_zone_freepage_state(zone, 1 << order,
>  						  migratetype);
>  		} else {
> +			int buddy_mt = get_buddy_migratetype(buddy);
> +
> +			/* We don't want to merge cma, isolate pages */
> +			if (unlikely(order >= pageblock_order) &&
> +				migratetype != buddy_mt &&
> +				(migratetype >= MIGRATE_PCPTYPES ||
> +				buddy_mt >= MIGRATE_PCPTYPES)) {
> +				break;
> +			}
>  			list_del(&buddy->lru);
>  			zone->free_area[order].nr_free--;
>  			rmv_page_order(buddy);
> --
> 1.7.9.5
>

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-we0-f170.google.com (mail-we0-f170.google.com [74.125.82.170]) by kanga.kvack.org (Postfix) with ESMTP id EFC896B0031 for ; Thu, 9 Jan 2014 04:27:30 -0500 (EST) Received: by mail-we0-f170.google.com with SMTP id u57so2310652wes.1 for ; Thu, 09 Jan 2014 01:27:30 -0800 (PST) Received: from mx2.suse.de (cantor2.suse.de.
[195.135.220.15]) by mx.google.com with ESMTPS id j47si2663415eeo.116.2014.01.09.01.27.30 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Thu, 09 Jan 2014 01:27:30 -0800 (PST) Date: Thu, 9 Jan 2014 09:27:20 +0000 From: Mel Gorman Subject: Re: [PATCH 0/7] improve robustness on handling migratetype Message-ID: <20140109092720.GM27046@suse.de> References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> Sender: owner-linux-mm@kvack.org List-ID: To: Joonsoo Kim Cc: Andrew Morton , "Kirill A. Shutemov" , Rik van Riel , Jiang Liu , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Michal Nazarewicz , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim

On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote:
> Hello,
>
> I found some weaknesses on handling migratetype during code review and
> testing CMA.
>
> First, we don't have any synchronization method on get/set pageblock
> migratetype. When we change migratetype, we hold the zone lock. So
> writer-writer race doesn't exist. But while someone changes migratetype,
> others can get migratetype. This may introduce totally unintended value
> as migratetype. Although I haven't heard of any problem report about
> that, it is better to protect properly.
>

This is deliberate. The migratetypes for the majority of users are advisory
and aimed for fragmentation avoidance. It was important that the cost of
that be kept as low as possible and the general case is that migration types
change very rarely. In many cases, the zone lock is held. In other cases,
such as splitting free pages, the cost is simply not justified.

I doubt there is any amount of data you could add in support that would
justify hammering the free fast paths (which call get_pageblock_type).
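The race under discussion can be replayed deterministically in user space. This sketch makes three assumptions:

- It uses the v3.13-era migratetype numbering with both CONFIG_CMA and CONFIG_MEMORY_ISOLATION enabled (MOVABLE = 2, ISOLATE = 5, MIGRATE_TYPES = 6).
- It models a pageblock's migratetype as three bits. The reader folds them in one at a time, like the test_bit() loop of get_pageblock_flags_group() quoted in patch 1.
- The writer also updates one bit at a time. This per-bit writer is an assumption of the model.

The names and the chosen interleaving are illustrative, not kernel code. A MOVABLE to ISOLATE update that overlaps a lockless read can return 6, the out-of-range value raised later in the thread.

```c
#include <assert.h>

/* Assumed v3.13 numbering with CONFIG_CMA and CONFIG_MEMORY_ISOLATION. */
enum {
	MIGRATE_UNMOVABLE,	/* 0 = 0b000 */
	MIGRATE_RECLAIMABLE,	/* 1 = 0b001 */
	MIGRATE_MOVABLE,	/* 2 = 0b010 */
	MIGRATE_RESERVE,	/* 3 = 0b011 (== MIGRATE_PCPTYPES) */
	MIGRATE_CMA,		/* 4 = 0b100 */
	MIGRATE_ISOLATE,	/* 5 = 0b101 */
	MIGRATE_TYPES		/* 6 = 0b110: number of free lists */
};

#define MT_BITS 3

/* One pageblock's migratetype, kept as separate bits like the bitmap. */
static int pageblock_bits[MT_BITS];

/* Writer step: store bit i of a (valid) new migratetype. */
static void write_bit(int i, unsigned int newmt)
{
	assert(newmt < MIGRATE_TYPES);	/* writers only store valid types */
	pageblock_bits[i] = (newmt >> i) & 1;
}

/* Reader step: fold bit i into the value being assembled. */
static void read_bit(int i, unsigned int *acc)
{
	if (pageblock_bits[i])
		*acc |= 1u << i;
}

/*
 * Replay: the reader samples bits 0 and 1 before a MOVABLE -> ISOLATE
 * update and bit 2 after it, with no lock and no retry.
 */
unsigned int torn_read_demo(void)
{
	unsigned int seen = 0;
	int i;

	for (i = 0; i < MT_BITS; i++)
		write_bit(i, MIGRATE_MOVABLE);	/* block starts as 0b010 */

	read_bit(0, &seen);			/* old bit0 = 0 */
	read_bit(1, &seen);			/* old bit1 = 1 */
	for (i = 0; i < MT_BITS; i++)
		write_bit(i, MIGRATE_ISOLATE);	/* block becomes 0b101 */
	read_bit(2, &seen);			/* new bit2 = 1 */

	return seen;				/* 0b110 == 6 */
}

/* A free_list[MIGRATE_TYPES] array only has slots 0..5. */
int mt_is_valid(unsigned int mt)
{
	return mt < MIGRATE_TYPES;
}
```

A seqlock-protected reader, as patch 1 proposes, would see the sequence count change and retry instead of returning 6. The question in this thread is whether that check is worth its cost on the free fast path.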
> Second, (get/set)_freepage_migrate isn't used properly. I guess that it
> would be introduced for per cpu page(pcp) performance, but, it is also
> used by memory isolation, now. For that case, the information isn't
> enough to use, so we need to fix it.
>
> Third, there is the problem on buddy allocator. It doesn't consider
> migratetype when merging buddy, so pages from cma or isolate region can
> be moved to other migratetype freelist. It makes CMA failed over and over.
> To prevent it, the buddy allocator should consider migratetype if
> CMA/ISOLATE is enabled.

Without looking at the patches, this is likely to add some cost to the
page free fast path -- heavy cost if it's a pageblock lookup and lighter
cost if you are using cached page information which is potentially stale.
Why not force CMA regions to be aligned on MAX_ORDER_NR_PAGES boundary
instead to avoid any possibility of merging issues?

> This patchset is aimed at fixing these problems and based on v3.13-rc7.
>
> mm/page_alloc: synchronize get/set pageblock

cost with no justification.

> mm/cma: fix cma free page accounting

sounds like it would be a fix but unrelated to the leader and should be
separated out on its own

> mm/page_alloc: move set_freepage_migratetype() to better place

Very vague. If this does something useful then it could do with a better
subject.

> mm/isolation: remove invalid check condition

Looks harmless.

> mm/page_alloc: separate interface to set/get migratetype of freepage
> mm/page_alloc: store freelist migratetype to the page on buddy
> properly

Potentially sounds useful

> mm/page_alloc: don't merge MIGRATE_(CMA|ISOLATE) pages on buddy
>

Sounds unnecessary if CMA regions were MAX_ORDER_NR_PAGES aligned and then
the free paths would be unaffected for everybody.

I didn't look at the patches because it felt like cost without any
supporting justification for the patches. Superficially it looks like
patch 1 needs to go away and the last patch could be done without affecting
!CMA users. The rest are potentially useful but there should have been
some supporting data on how it helps CMA with some backup showing that the
page allocation paths are not impacted as a result.

--
Mel Gorman
SUSE Labs

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-we0-f170.google.com (mail-we0-f170.google.com [74.125.82.170]) by kanga.kvack.org (Postfix) with ESMTP id C86F96B0031 for ; Thu, 9 Jan 2014 09:05:05 -0500 (EST) Received: by mail-we0-f170.google.com with SMTP id u57so2588736wes.29 for ; Thu, 09 Jan 2014 06:05:05 -0800 (PST) Received: from mail-wg0-x22c.google.com (mail-wg0-x22c.google.com [2a00:1450:400c:c00::22c]) by mx.google.com with ESMTPS id kc5si1382092wjc.145.2014.01.09.06.05.04 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 09 Jan 2014 06:05:04 -0800 (PST) Received: by mail-wg0-f44.google.com with SMTP id l18so1862586wgh.23 for ; Thu, 09 Jan 2014 06:05:04 -0800 (PST) MIME-Version: 1.0 In-Reply-To: References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> Date: Thu, 9 Jan 2014 23:05:04 +0900 Message-ID: Subject: Re: [PATCH 0/7] improve robustness on handling migratetype From: Joonsoo Kim Content-Type: text/plain; charset=ISO-8859-1 Sender: owner-linux-mm@kvack.org List-ID: To: Michal Nazarewicz Cc: Joonsoo Kim , Andrew Morton , "Kirill A. Shutemov" , Rik van Riel , Jiang Liu , Mel Gorman , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Andi Kleen , Wei Yongjun , Tang Chen , Linux Memory Management List , LKML

2014/1/9 Michal Nazarewicz :
> On Thu, Jan 09 2014, Joonsoo Kim wrote:
>> Third, there is the problem on buddy allocator. It doesn't consider
>> migratetype when merging buddy, so pages from cma or isolate region can
>> be moved to other migratetype freelist. It makes CMA failed over and over.
>> To prevent it, the buddy allocator should consider migratetype if
>> CMA/ISOLATE is enabled.
>
> There should never be situation where a CMA page shares a pageblock (or
> a max-order page) with a non-CMA page though, so this should never be an
> issue.

Right... It never happens.
When I ported CMA region reservation code to my own code for testing,
I made a mistake. Sorry for noise.

Thanks.

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f169.google.com (mail-pd0-f169.google.com [209.85.192.169]) by kanga.kvack.org (Postfix) with ESMTP id E031E6B0035 for ; Thu, 9 Jan 2014 16:10:40 -0500 (EST) Received: by mail-pd0-f169.google.com with SMTP id v10so3688090pde.0 for ; Thu, 09 Jan 2014 13:10:40 -0800 (PST) Received: from smtp.codeaurora.org (smtp.codeaurora.org. [198.145.11.231]) by mx.google.com with ESMTPS id nu8si4883447pbb.342.2014.01.09.13.10.32 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 09 Jan 2014 13:10:39 -0800 (PST) Message-ID: <52CF1045.30903@codeaurora.org> Date: Thu, 09 Jan 2014 13:10:29 -0800 From: Laura Abbott MIME-Version: 1.0 Subject: Re: [PATCH 2/7] mm/cma: fix cma free page accounting References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> <1389251087-10224-3-git-send-email-iamjoonsoo.kim@lge.com> In-Reply-To: <1389251087-10224-3-git-send-email-iamjoonsoo.kim@lge.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Joonsoo Kim , Andrew Morton Cc: "Kirill A.
Shutemov" , Rik van Riel , Jiang Liu , Mel Gorman , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Michal Nazarewicz , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim

On 1/8/2014 11:04 PM, Joonsoo Kim wrote:
> Cma pages can be allocated by not only order 0 request but also high order
> request. So, we should consider to account free cma page in the both
> places.
>
> Signed-off-by: Joonsoo Kim
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b36aa5a..1489c301 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1091,6 +1091,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>  							  start_migratetype,
>  							  migratetype);
>
> +			/* CMA pages cannot be stolen */
> +			if (is_migrate_cma(migratetype)) {
> +				__mod_zone_page_state(zone,
> +					NR_FREE_CMA_PAGES, -(1 << order));
> +			}
> +
>  			/* Remove the page from the freelists */
>  			list_del(&page->lru);
>  			rmv_page_order(page);
> @@ -1175,9 +1181,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>  		}
>  		set_freepage_migratetype(page, mt);
>  		list = &page->lru;
> -		if (is_migrate_cma(mt))
> -			__mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
> -					      -(1 << order));
>  	}
>  	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
>  	spin_unlock(&zone->lock);
>

Wouldn't this result in double counting? in the buffered_rmqueue non
zero ordered request we call __mod_zone_freepage_state which already
accounts for CMA pages if the migrate type is CMA so it seems like
we would get hit twice:

buffered_rmqueue
	__rmqueue
		__rmqueue_fallback
			decrement
	__mod_zone_freepage_state
		decrement

Thanks,
Laura

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pb0-f47.google.com (mail-pb0-f47.google.com [209.85.160.47]) by kanga.kvack.org (Postfix) with ESMTP id D88ED6B0035 for ; Fri, 10 Jan 2014 03:48:35 -0500 (EST) Received: by mail-pb0-f47.google.com with SMTP id um1so4194607pbc.20 for ; Fri, 10 Jan 2014 00:48:35 -0800 (PST) Received: from LGEAMRELO02.lge.com (lgeamrelo02.lge.com. [156.147.1.126]) by mx.google.com with ESMTP id yd9si6427500pab.263.2014.01.10.00.48.33 for ; Fri, 10 Jan 2014 00:48:34 -0800 (PST) Date: Fri, 10 Jan 2014 17:48:55 +0900 From: Joonsoo Kim Subject: Re: [PATCH 0/7] improve robustness on handling migratetype Message-ID: <20140110084854.GA22058@lge.com> References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> <20140109092720.GM27046@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140109092720.GM27046@suse.de> Sender: owner-linux-mm@kvack.org List-ID: To: Mel Gorman Cc: Andrew Morton , "Kirill A. Shutemov" , Rik van Riel , Jiang Liu , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Michal Nazarewicz , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote:
> On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote:
> > Hello,
> >
> > I found some weaknesses on handling migratetype during code review and
> > testing CMA.
> >
> > First, we don't have any synchronization method on get/set pageblock
> > migratetype. When we change migratetype, we hold the zone lock. So
> > writer-writer race doesn't exist. But while someone changes migratetype,
> > others can get migratetype. This may introduce totally unintended value
> > as migratetype. Although I haven't heard of any problem report about
> > that, it is better to protect properly.
> >
>
> This is deliberate. The migratetypes for the majority of users are advisory
> and aimed for fragmentation avoidance. It was important that the cost of
> that be kept as low as possible and the general case is that migration types
> change very rarely. In many cases, the zone lock is held. In other cases,
> such as splitting free pages, the cost is simply not justified.
>
> I doubt there is any amount of data you could add in support that would
> justify hammering the free fast paths (which call get_pageblock_type).

Hello, Mel.

There is a possibility that we can get an unintended value, such as 6, as
the migratetype if a reader-writer (get/set pageblock_migratetype) race
happens. It is possible because we read the value without any
synchronization. And this migratetype, 6, has no place in the buddy
freelists, so an array index overrun is possible and the system can break,
although I haven't heard that it occurs.

I think that my solution is too expensive. However, I think that we need a
solution, don't we? Do you have any better idea?

>
> > Second, (get/set)_freepage_migrate isn't used properly. I guess that it
> > would be introduced for per cpu page(pcp) performance, but, it is also
> > used by memory isolation, now. For that case, the information isn't
> > enough to use, so we need to fix it.
> >
> > Third, there is the problem on buddy allocator. It doesn't consider
> > migratetype when merging buddy, so pages from cma or isolate region can
> > be moved to other migratetype freelist. It makes CMA failed over and over.
> > To prevent it, the buddy allocator should consider migratetype if
> > CMA/ISOLATE is enabled.
>
> Without looking at the patches, this is likely to add some cost to the
> page free fast path -- heavy cost if it's a pageblock lookup and lighter
> cost if you are using cached page information which is potentially stale.
> Why not force CMA regions to be aligned on MAX_ORDER_NR_PAGES boundary
> instead to avoid any possibility of merging issues?
>

That was my mistake. CMA region is aligned on MAX_ORDER_NR_PAGES, so it
can't happen. Sorry for noise.

> > This patchset is aimed at fixing these problems and based on v3.13-rc7.
> >
> > mm/page_alloc: synchronize get/set pageblock
>
> cost with no justification.
>
> > mm/cma: fix cma free page accounting
>
> sounds like it would be a fix but unrelated to the leader and should be
> separated out on its own

Yes, it is not related to this topic and it is a wrong patch, as Laura
pointed out, so I will drop it.

> > mm/page_alloc: move set_freepage_migratetype() to better place
>
> Very vague. If this does something useful then it could do with a better
> subject.

Okay.

> > mm/isolation: remove invalid check condition
>
> Looks harmless.
>
> > mm/page_alloc: separate interface to set/get migratetype of freepage
> > mm/page_alloc: store freelist migratetype to the page on buddy
> > properly
>
> Potentially sounds useful
>

I made these two patches for the last patch, to reduce its performance
impact. If the last patch is dropped, it is better to remove the last
call site that uses the freelist migratetype to find the buddy freelist
type. I will respin.

Thanks.

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pb0-f45.google.com (mail-pb0-f45.google.com [209.85.160.45]) by kanga.kvack.org (Postfix) with ESMTP id C11E56B0037 for ; Fri, 10 Jan 2014 03:49:48 -0500 (EST) Received: by mail-pb0-f45.google.com with SMTP id rp16so4186932pbb.4 for ; Fri, 10 Jan 2014 00:49:48 -0800 (PST) Received: from LGEMRELSE7Q.lge.com (LGEMRELSE7Q.lge.com.
[156.147.1.151]) by mx.google.com with ESMTP id pu3si6473701pbc.30.2014.01.10.00.49.46 for ; Fri, 10 Jan 2014 00:49:47 -0800 (PST) Date: Fri, 10 Jan 2014 17:50:05 +0900 From: Joonsoo Kim Subject: Re: [PATCH 2/7] mm/cma: fix cma free page accounting Message-ID: <20140110085005.GB22058@lge.com> References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> <1389251087-10224-3-git-send-email-iamjoonsoo.kim@lge.com> <52CF1045.30903@codeaurora.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <52CF1045.30903@codeaurora.org> Sender: owner-linux-mm@kvack.org List-ID: To: Laura Abbott Cc: Andrew Morton , "Kirill A. Shutemov" , Rik van Riel , Jiang Liu , Mel Gorman , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Michal Nazarewicz , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Thu, Jan 09, 2014 at 01:10:29PM -0800, Laura Abbott wrote:
> On 1/8/2014 11:04 PM, Joonsoo Kim wrote:
> >Cma pages can be allocated by not only order 0 request but also high order
> >request. So, we should consider to account free cma page in the both
> >places.
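The double counting that Laura points out for this patch can be reproduced with a toy model of the two counters. `struct toy_zone`, `toy_mod_freepage_state()`, and `toy_alloc_fallback_cma()` are hypothetical. The counters stand in for `NR_FREE_PAGES` and `NR_FREE_CMA_PAGES`. The call order follows her description: a decrement inside `__rmqueue_fallback()` from the proposed hunk, then `__mod_zone_freepage_state()` in the high-order path of `buffered_rmqueue()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the NR_FREE_PAGES and NR_FREE_CMA_PAGES zone counters. */
struct toy_zone {
	long nr_free_pages;
	long nr_free_cma_pages;
};

/*
 * Same shape as __mod_zone_freepage_state(): adjust the free count and,
 * for a CMA pageblock, the CMA free count too.
 */
static void toy_mod_freepage_state(struct toy_zone *z, long nr, bool cma)
{
	z->nr_free_pages += nr;
	if (cma)
		z->nr_free_cma_pages += nr;
}

/*
 * One order-N allocation that falls back to a CMA pageblock.
 * with_patch models the hunk that also decrements NR_FREE_CMA_PAGES
 * inside __rmqueue_fallback().
 */
void toy_alloc_fallback_cma(struct toy_zone *z, unsigned int order,
			    bool with_patch)
{
	long nr = 1L << order;

	assert(z->nr_free_cma_pages >= nr);	/* enough CMA pages are free */

	/* __rmqueue_fallback() */
	if (with_patch)
		z->nr_free_cma_pages -= nr;

	/* buffered_rmqueue(), order > 0 path */
	toy_mod_freepage_state(z, -nr, true);
}
```

With the extra hunk applied, one order-2 allocation lowers the CMA count by 8 pages but the free count by only 4.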
> >
> >Signed-off-by: Joonsoo Kim
> >
> >diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >index b36aa5a..1489c301 100644
> >--- a/mm/page_alloc.c
> >+++ b/mm/page_alloc.c
> >@@ -1091,6 +1091,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
> > 						start_migratetype,
> > 						migratetype);
> > 
> >+			/* CMA pages cannot be stolen */
> >+			if (is_migrate_cma(migratetype)) {
> >+				__mod_zone_page_state(zone,
> >+					NR_FREE_CMA_PAGES, -(1 << order));
> >+			}
> >+
> > 			/* Remove the page from the freelists */
> > 			list_del(&page->lru);
> > 			rmv_page_order(page);
> >@@ -1175,9 +1181,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
> > 		}
> > 		set_freepage_migratetype(page, mt);
> > 		list = &page->lru;
> >-		if (is_migrate_cma(mt))
> >-			__mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
> >-					      -(1 << order));
> > 	}
> > 	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
> > 	spin_unlock(&zone->lock);
> >
>
> Wouldn't this result in double counting? In the buffered_rmqueue non
> zero ordered request we call __mod_zone_freepage_state which already
> accounts for CMA pages if the migrate type is CMA so it seems like
> we would get hit twice:
>
> buffered_rmqueue
>    __rmqueue
>       __rmqueue_fallback
>          decrement
>    __mod_zone_freepage_state
>       decrement
>
Hello, Laura. You are right. I missed it. I will drop this patch. Thanks. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ee0-f52.google.com (mail-ee0-f52.google.com [74.125.83.52]) by kanga.kvack.org (Postfix) with ESMTP id 7AB486B0035 for ; Fri, 10 Jan 2014 04:48:40 -0500 (EST) Received: by mail-ee0-f52.google.com with SMTP id d17so1808569eek.39 for ; Fri, 10 Jan 2014 01:48:39 -0800 (PST) Received: from mx2.suse.de (cantor2.suse.de.
[195.135.220.15]) by mx.google.com with ESMTPS id p46si8890300eem.231.2014.01.10.01.48.39 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Fri, 10 Jan 2014 01:48:39 -0800 (PST) Date: Fri, 10 Jan 2014 09:48:34 +0000 From: Mel Gorman Subject: Re: [PATCH 0/7] improve robustness on handling migratetype Message-ID: <20140110094834.GV27046@suse.de> References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> <20140109092720.GM27046@suse.de> <20140110084854.GA22058@lge.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <20140110084854.GA22058@lge.com> Sender: owner-linux-mm@kvack.org List-ID: To: Joonsoo Kim Cc: Andrew Morton , "Kirill A. Shutemov" , Rik van Riel , Jiang Liu , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Michal Nazarewicz , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org On Fri, Jan 10, 2014 at 05:48:55PM +0900, Joonsoo Kim wrote: > On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote: > > On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote: > > > Hello, > > > > > > I found some weaknesses on handling migratetype during code review and > > > testing CMA. > > > > > > First, we don't have any synchronization method on get/set pageblock > > > migratetype. When we change migratetype, we hold the zone lock. So > > > writer-writer race doesn't exist. But while someone changes migratetype, > > > others can get migratetype. This may introduce totally unintended value > > > as migratetype. Although I haven't heard of any problem report about > > > that, it is better to protect properly. > > > > > > > This is deliberate. The migratetypes for the majority of users are advisory > > and aimed for fragmentation avoidance. It was important that the cost of > > that be kept as low as possible and the general case is that migration types > > change very rarely. In many cases, the zone lock is held. 
In other cases, > > such as splitting free pages, the cost is simply not justified. > > > > I doubt there is any amount of data you could add in support that would > > justify hammering the free fast paths (which call get_pageblock_type). > > Hello, Mel. > > There is a possibility that we can get unintended value such as 6 as migratetype > if reader-writer (get/set pageblock_migratetype) race happens. It can be > possible, because we read the value without any synchronization method. And > this migratetype, 6, has no place in buddy freelist, so array index overrun can > be possible and the system can break, although I haven't heard that it occurs. > > I think that my solution is too expensive. However, I think that we need > solution. aren't we? Do you have any better idea? > It's not something I have ever heard of or seen occurring but if you've identified that it's a real possibility then split get_pageblock_migratetype into locked and unlocked versions. Ensure that calls to set_pageblock_migratetype are always under zone->lock and get_pageblock_migratetype is also under zone->lock which both should be true in the majority of cases. Use the unlocked version otherwise but instead of synchronising, check if it's returning >= MIGRATE_TYPES and return MIGRATE_MOVABLE in the unlikely event of a race. This will avoid harming the fast paths for the majority of users and limit the damage if a MIGRATE_CMA region is accidentally treated as MIGRATE_MOVABLE. -- Mel Gorman SUSE Labs
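The locked/unlocked split suggested above can be illustrated with a small user-space C model. It is a sketch under stated assumptions, not kernel code: struct fake_pageblock, read_mt_bits() and the two getter names are hypothetical, the enum assumes the v3.13 migratetype ordering with CONFIG_CMA and CONFIG_MEMORY_ISOLATION enabled, and the zone_locked flag only stands in for holding zone->lock (a real locked variant would more likely use lockdep_assert_held()).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed v3.13 ordering with CONFIG_CMA and CONFIG_MEMORY_ISOLATION enabled. */
enum {
	MIGRATE_UNMOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_MOVABLE,
	MIGRATE_PCPTYPES,
	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
	MIGRATE_CMA,
	MIGRATE_ISOLATE,
	MIGRATE_TYPES
};

#define MT_BITS 3 /* migratetype bits per pageblock in this model */

/* Hypothetical stand-in for one pageblock's flag word. */
struct fake_pageblock {
	unsigned long flags;
	bool zone_locked; /* models "caller holds zone->lock" */
};

/* Read the migratetype bits one at a time, as an unsynchronised reader does. */
static int read_mt_bits(const struct fake_pageblock *pb)
{
	int mt = 0;

	for (int bit = 0; bit < MT_BITS; bit++)
		if (pb->flags & (1UL << bit))
			mt |= 1 << bit;
	return mt;
}

/* Locked variant: writers hold the same lock, so the value is stable. */
static int get_pageblock_migratetype_locked(const struct fake_pageblock *pb)
{
	assert(pb->zone_locked);
	return read_mt_bits(pb);
}

/* Unlocked variant: no synchronisation, but an impossible value from a
 * torn read is mapped to MIGRATE_MOVABLE instead of indexing past free_list. */
static int get_pageblock_migratetype_unlocked(const struct fake_pageblock *pb)
{
	int mt = read_mt_bits(pb);

	if (mt >= MIGRATE_TYPES)
		mt = MIGRATE_MOVABLE;
	return mt;
}
```

In this model, a flag word holding the impossible value 6 comes back from the unlocked getter as MIGRATE_MOVABLE, while valid values such as MIGRATE_CMA pass through unchanged. The fast path pays only a bounds check; the accepted cost is that a torn read of a CMA or isolated pageblock can occasionally be treated as movable, which is the damage limitation described above.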
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pb0-f46.google.com (mail-pb0-f46.google.com [209.85.160.46]) by kanga.kvack.org (Postfix) with ESMTP id 923806B0035 for ; Sun, 12 Jan 2014 20:56:29 -0500 (EST) Received: by mail-pb0-f46.google.com with SMTP id ma3so1190053pbc.33 for ; Sun, 12 Jan 2014 17:56:29 -0800 (PST) Received: from LGEMRELSE7Q.lge.com (LGEMRELSE7Q.lge.com. [156.147.1.151]) by mx.google.com with ESMTP id pu3si14161316pbc.240.2014.01.12.17.56.27 for ; Sun, 12 Jan 2014 17:56:28 -0800 (PST) Date: Mon, 13 Jan 2014 10:57:00 +0900 From: Joonsoo Kim Subject: Re: [PATCH 0/7] improve robustness on handling migratetype Message-ID: <20140113015659.GA28140@lge.com> References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> <20140109092720.GM27046@suse.de> <20140110084854.GA22058@lge.com> <20140110094834.GV27046@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140110094834.GV27046@suse.de> Sender: owner-linux-mm@kvack.org List-ID: To: Mel Gorman Cc: Andrew Morton , "Kirill A. Shutemov" , Rik van Riel , Jiang Liu , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Michal Nazarewicz , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org On Fri, Jan 10, 2014 at 09:48:34AM +0000, Mel Gorman wrote: > On Fri, Jan 10, 2014 at 05:48:55PM +0900, Joonsoo Kim wrote: > > On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote: > > > On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote: > > > > Hello, > > > > > > > > I found some weaknesses on handling migratetype during code review and > > > > testing CMA. > > > > > > > > First, we don't have any synchronization method on get/set pageblock > > > > migratetype. When we change migratetype, we hold the zone lock. So > > > > writer-writer race doesn't exist. But while someone changes migratetype, > > > > others can get migratetype.
This may introduce totally unintended value > > > > as migratetype. Although I haven't heard of any problem report about > > > > that, it is better to protect properly. > > > > > > > > > > This is deliberate. The migratetypes for the majority of users are advisory > > > and aimed for fragmentation avoidance. It was important that the cost of > > > that be kept as low as possible and the general case is that migration types > > > change very rarely. In many cases, the zone lock is held. In other cases, > > > such as splitting free pages, the cost is simply not justified. > > > > > > I doubt there is any amount of data you could add in support that would > > > justify hammering the free fast paths (which call get_pageblock_type). > > > > Hello, Mel. > > > > There is a possibility that we can get unintended value such as 6 as migratetype > > if reader-writer (get/set pageblock_migratetype) race happens. It can be > > possible, because we read the value without any synchronization method. And > > this migratetype, 6, has no place in buddy freelist, so array index overrun can > > be possible and the system can break, although I haven't heard that it occurs. > > > > I think that my solution is too expensive. However, I think that we need > > solution. aren't we? Do you have any better idea? > > > > It's not something I have ever heard of or seen occurring but > if you've identified that it's a real possibility then split > get_pageblock_migratetype into locked and unlocked versions. Ensure > that calls to set_pageblock_migratetype are always under zone->lock and > get_pageblock_migratetype is also under zone->lock which both should be > true in the majority of cases. Use the unlocked version otherwise but > instead of synchronising, check if it's returning >= MIGRATE_TYPES and > return MIGRATE_MOVABLE in the unlikely event of a race.
This will avoid > harming the fast paths for the majority of users and limit the damage if > a MIGRATE_CMA region is accidentally treated as MIGRATE_MOVABLE. Okay. I will re-investigate it and if I identify that it's a real possibility, I will re-make this patch according to your advice. Thanks for the comment! From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wg0-f47.google.com (mail-wg0-f47.google.com [74.125.82.47]) by kanga.kvack.org (Postfix) with ESMTP id 554FD6B0031 for ; Wed, 29 Jan 2014 11:52:49 -0500 (EST) Received: by mail-wg0-f47.google.com with SMTP id m15so4028453wgh.2 for ; Wed, 29 Jan 2014 08:52:48 -0800 (PST) Received: from mx2.suse.de (cantor2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id a5si5733732wik.4.2014.01.29.08.52.47 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Wed, 29 Jan 2014 08:52:47 -0800 (PST) Message-ID: <52E931D9.8050002@suse.cz> Date: Wed, 29 Jan 2014 17:52:41 +0100 From: Vlastimil Babka MIME-Version: 1.0 Subject: Re: [PATCH 0/7] improve robustness on handling migratetype References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> <20140109092720.GM27046@suse.de> <20140110084854.GA22058@lge.com> In-Reply-To: <20140110084854.GA22058@lge.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Joonsoo Kim , Mel Gorman Cc: Andrew Morton , "Kirill A.
Shutemov" , Rik van Riel , Jiang Liu , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Michal Nazarewicz , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org On 01/10/2014 09:48 AM, Joonsoo Kim wrote: > On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote: >> On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote: >>> Hello, >>> >>> I found some weaknesses on handling migratetype during code review and >>> testing CMA. >>> >>> First, we don't have any synchronization method on get/set pageblock >>> migratetype. When we change migratetype, we hold the zone lock. So >>> writer-writer race doesn't exist. But while someone changes migratetype, >>> others can get migratetype. This may introduce totally unintended value >>> as migratetype. Although I haven't heard of any problem report about >>> that, it is better to protect properly. >>> >> >> This is deliberate. The migratetypes for the majority of users are advisory >> and aimed for fragmentation avoidance. It was important that the cost of >> that be kept as low as possible and the general case is that migration types >> change very rarely. In many cases, the zone lock is held. In other cases, >> such as splitting free pages, the cost is simply not justified. >> >> I doubt there is any amount of data you could add in support that would >> justify hammering the free fast paths (which call get_pageblock_type). > > Hello, Mel. > > There is a possibility that we can get unintended value such as 6 as migratetype > if reader-writer (get/set pageblock_migratetype) race happens. It can be > possible, because we read the value without any synchronization method. And > this migratetype, 6, has no place in buddy freelist, so array index overrun can > be possible and the system can break, although I haven't heard that it occurs. Hello, it seems this can indeed happen.
I'm working on memory compaction improvements and in a prototype patch, I'm basically adding calls of start_isolate_page_range() and undo_isolate_page_range() to some functions under compact_zone(). With this I've seen occurrences of NULL pointers in move_freepages(), free_one_page() in places where free_list[migratetype] is manipulated by e.g. list_move(). That led me to question the value of migratetype and I found this thread. Adding some debugging in get_pageblock_migratetype() and voila, I get a value of 6 being read. So is it just my patch adding a dangerous situation, or does it exist in mainline as well? By looking at free_one_page(), it uses zone->lock, but get_pageblock_migratetype() is called by its callers (free_hot_cold_page() or __free_pages_ok()) outside of the lock. This determined migratetype is then used under free_one_page() to access a free_list. It seems that this could race with set_pageblock_migratetype() called from try_to_steal_freepages() (despite the latter being properly locked). There are also other callers but those seem to be either limited to initialization and isolation, which should be rare (?). However, try_to_steal_freepages can occur repeatedly. So I assume that the race happens but never manifests as a fatal error as long as MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE and MIGRATE_MOVABLE values are used. Only MIGRATE_CMA and MIGRATE_ISOLATE have values with the bit of value 4 set and can thus result in invalid values due to non-atomic access. Does that make sense to you and should we thus proceed with patching this race? Vlastimil > I think that my solution is too expensive. However, I think that we need > solution. aren't we? Do you have any better idea? > >> >>> Second, (get/set)_freepage_migrate isn't used properly. I guess that it >>> would be introduced for per cpu page(pcp) performance, but, it is also >>> used by memory isolation, now. For that case, the information isn't >>> enough to use, so we need to fix it.
>>> >>> Third, there is the problem on buddy allocator. It doesn't consider >>> migratetype when merging buddy, so pages from cma or isolate region can >>> be moved to other migratetype freelist. It makes CMA failed over and over. >>> To prevent it, the buddy allocator should consider migratetype if >>> CMA/ISOLATE is enabled. >> >> Without looking at the patches, this is likely to add some cost to the >> page free fast path -- heavy cost if it's a pageblock lookup and lighter >> cost if you are using cached page information which is potentially stale. >> Why not force CMA regions to be aligned on MAX_ORDER_NR_PAGES boundary >> instead to avoid any possibility of merging issues? >> > > That was my mistake. CMA region is aligned on MAX_ORDER_NR_PAGES, so it > can't happen. Sorry for the noise. > >>> This patchset is aimed at fixing these problems and based on v3.13-rc7. >>> >>> mm/page_alloc: synchronize get/set pageblock >> >> cost with no justification. >> >>> mm/cma: fix cma free page accounting >> >> sounds like it would be a fix but unrelated to the leader and should be >> separated out on its own > > Yes, it is not related to this topic and it is a wrong patch, as Laura > pointed out, so I will drop it. > >>> mm/page_alloc: move set_freepage_migratetype() to better place >> >> Very vague. If this does something useful then it could do with a better >> subject. > > Okay. > >>> mm/isolation: remove invalid check condition >> >> Looks harmless. >> >>> mm/page_alloc: separate interface to set/get migratetype of freepage >>> mm/page_alloc: store freelist migratetype to the page on buddy >>> properly >> >> Potentially sounds useful >> > > I made these two patches to reduce the performance impact of the last patch. > If the last patch is dropped, it is better to remove the last callsite > that uses the freelist migratetype to determine the buddy freelist type. I will respin. > > Thanks.
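The alignment argument above can be checked with the buddy-index arithmetic: the buddy of a page at a given order differs from it only in bit 'order', and the allocator only looks up buddies for orders below MAX_ORDER - 1. The sketch below assumes MAX_ORDER = 11 (the common default at the time) and uses a hypothetical buddy_pfn() helper in place of the kernel's own buddy lookup. It checks every pfn, not only order-aligned ones, which is stricter than necessary but enough to show that whole MAX_ORDER_NR_PAGES blocks never pair with outside pages.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed: MAX_ORDER = 11, so a MAX_ORDER_NR_PAGES block is 1 << 10 = 1024 pages. */
#define MAX_ORDER 11
#define MAX_ORDER_NR_PAGES (1UL << (MAX_ORDER - 1))

/* Hypothetical helper: buddy of a page at 'order' is the same index
 * with bit 'order' flipped. */
static unsigned long buddy_pfn(unsigned long pfn, unsigned int order)
{
	return pfn ^ (1UL << order);
}

/* True if no page in [start, start + nr_pages) has a buddy outside the
 * range at any order the allocator tries to merge at (order < MAX_ORDER - 1). */
static bool buddies_stay_inside(unsigned long start, unsigned long nr_pages)
{
	for (unsigned long pfn = start; pfn < start + nr_pages; pfn++) {
		for (unsigned int order = 0; order < MAX_ORDER - 1; order++) {
			unsigned long buddy = buddy_pfn(pfn, order);

			if (buddy < start || buddy >= start + nr_pages)
				return false;
		}
	}
	return true;
}
```

A range starting at 4 * MAX_ORDER_NR_PAGES and covering two whole blocks passes, while a range starting half a block in fails, because the order-9 buddy of its first page lies below the range.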
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wg0-f49.google.com (mail-wg0-f49.google.com [74.125.82.49]) by kanga.kvack.org (Postfix) with ESMTP id E464D6B0031 for ; Fri, 31 Jan 2014 10:39:17 -0500 (EST) Received: by mail-wg0-f49.google.com with SMTP id a1so8975636wgh.28 for ; Fri, 31 Jan 2014 07:39:17 -0800 (PST) Received: from mx2.suse.de (cantor2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id 18si5364778wjo.128.2014.01.31.07.39.16 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Fri, 31 Jan 2014 07:39:16 -0800 (PST) Date: Fri, 31 Jan 2014 15:39:08 +0000 From: Mel Gorman Subject: Re: [PATCH 0/7] improve robustness on handling migratetype Message-ID: <20140131153908.GA14581@suse.de> References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> <20140109092720.GM27046@suse.de> <20140110084854.GA22058@lge.com> <52E931D9.8050002@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <52E931D9.8050002@suse.cz> Sender: owner-linux-mm@kvack.org List-ID: To: Vlastimil Babka Cc: Joonsoo Kim , Andrew Morton , "Kirill A.
Shutemov" , Rik van Riel , Jiang Liu , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Michal Nazarewicz , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org On Wed, Jan 29, 2014 at 05:52:41PM +0100, Vlastimil Babka wrote: > On 01/10/2014 09:48 AM, Joonsoo Kim wrote: > >On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote: > >>On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote: > >>>Hello, > >>> > >>>I found some weaknesses on handling migratetype during code review and > >>>testing CMA. > >>> > >>>First, we don't have any synchronization method on get/set pageblock > >>>migratetype. When we change migratetype, we hold the zone lock. So > >>>writer-writer race doesn't exist. But while someone changes migratetype, > >>>others can get migratetype. This may introduce totally unintended value > >>>as migratetype. Although I haven't heard of any problem report about > >>>that, it is better to protect properly. > >>> > >> > >>This is deliberate. The migratetypes for the majority of users are advisory > >>and aimed for fragmentation avoidance. It was important that the cost of > >>that be kept as low as possible and the general case is that migration types > >>change very rarely. In many cases, the zone lock is held. In other cases, > >>such as splitting free pages, the cost is simply not justified. > >> > >>I doubt there is any amount of data you could add in support that would > >>justify hammering the free fast paths (which call get_pageblock_type). > > > >Hello, Mel. > > > >There is a possibility that we can get unintended value such as 6 as migratetype > >if reader-writer (get/set pageblock_migratetype) race happens. It can be > >possible, because we read the value without any synchronization method. And > >this migratetype, 6, has no place in buddy freelist, so array index overrun can > >be possible and the system can break, although I haven't heard that it occurs.
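How a value of 6 can be read at all is easiest to see by replaying one interleaving deterministically. The sketch assumes, in line with the "non-atomic access" explanation in this thread, that the writer updates the three migratetype bits one at a time and the reader samples them one at a time without the zone lock. It uses the v3.13 values MIGRATE_MOVABLE = 2, MIGRATE_CMA = 4 and MIGRATE_ISOLATE = 5 (CONFIG_CMA and CONFIG_MEMORY_ISOLATION enabled); writer_step() and replay_torn_read() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>

#define MT_BITS 3 /* assumed: three migratetype bits, bit 0 least significant */

/* One writer step: make bit 'bit' of *word match the same bit of 'target'. */
static void writer_step(unsigned long *word, unsigned long target, int bit)
{
	if (target & (1UL << bit))
		*word |= 1UL << bit;
	else
		*word &= ~(1UL << bit);
}

/* Replay one interleaving: the reader samples bits 0 and 1 while the old
 * value is still stored, the writer then updates all bits, and the reader
 * finally samples bit 2 from the new value. Returns what the reader saw. */
static unsigned long replay_torn_read(unsigned long old_mt, unsigned long new_mt)
{
	unsigned long word = old_mt;
	unsigned long seen = 0;

	for (int bit = 0; bit < 2; bit++)       /* reader: bits 0 and 1 */
		seen |= word & (1UL << bit);
	for (int bit = 0; bit < MT_BITS; bit++) /* writer: all three bits */
		writer_step(&word, new_mt, bit);
	seen |= word & (1UL << 2);              /* reader: bit 2 */
	return seen;
}
```

Replaying a MIGRATE_MOVABLE to MIGRATE_ISOLATE (or MIGRATE_CMA) change yields 6, one past the last valid free_list index. A change between MIGRATE_RECLAIMABLE and MIGRATE_MOVABLE never touches the bit of value 4, so the torn result stays in range, although it can still be the wrong type.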
> > Hello, > > it seems this can indeed happen. I'm working on memory compaction > improvements and in a prototype patch, I'm basically adding calls of > start_isolate_page_range() and undo_isolate_page_range() to some functions > under compact_zone(). With this I've seen occurrences of NULL > pointers in move_freepages(), free_one_page() in places where > free_list[migratetype] is manipulated by e.g. list_move(). That led > me to question the value of migratetype and I found this thread. > Adding some debugging in get_pageblock_migratetype() and voila, I > get a value of 6 being read. > > So is it just my patch adding a dangerous situation, or does it exist in > mainline as well? By looking at free_one_page(), it uses zone->lock, but > get_pageblock_migratetype() is called by its callers > (free_hot_cold_page() or __free_pages_ok()) outside of the lock. > This determined migratetype is then used under free_one_page() to > access a free_list. > > It seems that this could race with set_pageblock_migratetype() > called from try_to_steal_freepages() (despite the latter being > properly locked). There are also other callers but those seem to be > either limited to initialization and isolation, which should be rare > (?). > However, try_to_steal_freepages can occur repeatedly. > So I assume that the race happens but never manifests as a fatal > error as long as MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE and > MIGRATE_MOVABLE > values are used. Only MIGRATE_CMA and MIGRATE_ISOLATE have values > with the bit of value 4 set and can thus result in invalid values due to > non-atomic access. > > Does that make sense to you and should we thus proceed with patching > this race? > If you have direct evidence then it is indeed a problem. The key would be to avoid taking the zone->lock just to stabilise this and instead modify when get_pageblock_pagetype is called to make it safe. Looking at the callers of get_pageblock_pagetype it would appear that
1. __free_pages_ok's call to get_pageblock_pagetype can move into free_one_page() under the zone lock as long as you also move the set_freepage_migratetype call. The migratetype will be read twice by the free_hot_cold_page->free_one_page call but that's ok because you have established that it is necessary.
2. rmqueue_bulk calls it under zone->lock.
3. free_hot_cold_page cannot take zone->lock to stabilise the migratetype read but if it gets a bad read due to a race, it enters the slow path. Force it to call free_one_page() there and take the lock in the event of a race instead of only calling in there due to is_migrate_isolatetype. Consider adding a debug patch that counts with vmstat how often this race occurs and check the value with and without the compaction patches you've added.
4. It's not obvious but __isolate_free_page should already hold the zone lock.
5. In buffered_rmqueue, move the call to get_pageblock_migratetype under the zone lock. It'll just cost a local variable.
6. A race in setup_zone_migrate_reserve is relatively harmless. Check system_state == SYSTEM_BOOTING and take the zone->lock if the system is live. Release, resched and reacquire if need_resched().
7. has_unmovable_pages is harmless, the range should be isolated and not racing against other updates.
-- Mel Gorman SUSE Labs
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f176.google.com (mail-pd0-f176.google.com [209.85.192.176]) by kanga.kvack.org (Postfix) with ESMTP id 9F3CD6B0035 for ; Mon, 3 Feb 2014 02:45:10 -0500 (EST) Received: by mail-pd0-f176.google.com with SMTP id w10so6526781pde.21 for ; Sun, 02 Feb 2014 23:45:10 -0800 (PST) Received: from lgemrelse7q.lge.com (LGEMRELSE7Q.lge.com.
[156.147.1.151]) by mx.google.com with ESMTP id ek3si19520102pbd.115.2014.02.02.23.45.08 for ; Sun, 02 Feb 2014 23:45:09 -0800 (PST) Date: Mon, 3 Feb 2014 16:45:07 +0900 From: Joonsoo Kim Subject: Re: [PATCH 0/7] improve robustness on handling migratetype Message-ID: <20140203074507.GB2360@lge.com> References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> <20140109092720.GM27046@suse.de> <20140110084854.GA22058@lge.com> <52E931D9.8050002@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <52E931D9.8050002@suse.cz> Sender: owner-linux-mm@kvack.org List-ID: To: Vlastimil Babka Cc: Mel Gorman , Andrew Morton , "Kirill A. Shutemov" , Rik van Riel , Jiang Liu , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Michal Nazarewicz , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org On Wed, Jan 29, 2014 at 05:52:41PM +0100, Vlastimil Babka wrote: > On 01/10/2014 09:48 AM, Joonsoo Kim wrote: > >On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote: > >>On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote: > >>>Hello, > >>> > >>>I found some weaknesses on handling migratetype during code review and > >>>testing CMA. > >>> > >>>First, we don't have any synchronization method on get/set pageblock > >>>migratetype. When we change migratetype, we hold the zone lock. So > >>>writer-writer race doesn't exist. But while someone changes migratetype, > >>>others can get migratetype. This may introduce totally unintended value > >>>as migratetype. Although I haven't heard of any problem report about > >>>that, it is better to protect properly. > >>> > >> > >>This is deliberate. The migratetypes for the majority of users are advisory > >>and aimed for fragmentation avoidance. It was important that the cost of > >>that be kept as low as possible and the general case is that migration types > >>change very rarely. In many cases, the zone lock is held. 
In other cases, > >>such as splitting free pages, the cost is simply not justified. > >> > >>I doubt there is any amount of data you could add in support that would > >>justify hammering the free fast paths (which call get_pageblock_type). > > > >Hello, Mel. > > > >There is a possibility that we can get unintended value such as 6 as migratetype > >if reader-writer (get/set pageblock_migratetype) race happens. It can be > >possible, because we read the value without any synchronization method. And > >this migratetype, 6, has no place in buddy freelist, so array index overrun can > >be possible and the system can break, although I haven't heard that it occurs. > > Hello, > > it seems this can indeed happen. I'm working on memory compaction > improvements and in a prototype patch, I'm basically adding calls of > start_isolate_page_range() and undo_isolate_page_range() to some functions > under compact_zone(). With this I've seen occurrences of NULL > pointers in move_freepages(), free_one_page() in places where > free_list[migratetype] is manipulated by e.g. list_move(). That led > me to question the value of migratetype and I found this thread. > Adding some debugging in get_pageblock_migratetype() and voila, I > get a value of 6 being read. > > So is it just my patch adding a dangerous situation, or does it exist in > mainline as well? By looking at free_one_page(), it uses zone->lock, but > get_pageblock_migratetype() is called by its callers > (free_hot_cold_page() or __free_pages_ok()) outside of the lock. > This determined migratetype is then used under free_one_page() to > access a free_list. > > It seems that this could race with set_pageblock_migratetype() > called from try_to_steal_freepages() (despite the latter being > properly locked). There are also other callers but those seem to be > either limited to initialization and isolation, which should be rare > (?). > However, try_to_steal_freepages can occur repeatedly.
> So I assume that the race happens but never manifests as a fatal > error as long as MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE and > MIGRATE_MOVABLE > values are used. Only MIGRATE_CMA and MIGRATE_ISOLATE have values > with the bit of value 4 set and can thus result in invalid values due to > non-atomic access. > > Does that make sense to you and should we thus proceed with patching > this race? > Hello, This race is possible even without your prototype patch, although with very low probability. Some code related to memory failure uses set_migratetype_isolate(), which could result in this race. Although it may be a very rare case and not critical, it is better to fix this race. I prefer that we don't depend on luck. :) Mel's suggestion looks good to me. Do you have another idea? Thanks. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-we0-f181.google.com (mail-we0-f181.google.com [74.125.82.181]) by kanga.kvack.org (Postfix) with ESMTP id 9310B6B0035 for ; Mon, 3 Feb 2014 04:16:59 -0500 (EST) Received: by mail-we0-f181.google.com with SMTP id w61so1892103wes.12 for ; Mon, 03 Feb 2014 01:16:58 -0800 (PST) Received: from mx2.suse.de (cantor2.suse.de.
[195.135.220.15]) by mx.google.com with ESMTPS id fh3si3811002wib.84.2014.02.03.01.16.57 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Mon, 03 Feb 2014 01:16:57 -0800 (PST) Message-ID: <52EF5E82.4060003@suse.cz> Date: Mon, 03 Feb 2014 10:16:50 +0100 From: Vlastimil Babka MIME-Version: 1.0 Subject: Re: [PATCH 0/7] improve robustness on handling migratetype References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> <20140109092720.GM27046@suse.de> <20140110084854.GA22058@lge.com> <52E931D9.8050002@suse.cz> <20140203074507.GB2360@lge.com> In-Reply-To: <20140203074507.GB2360@lge.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Joonsoo Kim Cc: Mel Gorman , Andrew Morton , "Kirill A. Shutemov" , Rik van Riel , Jiang Liu , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Michal Nazarewicz , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org On 02/03/2014 08:45 AM, Joonsoo Kim wrote: > On Wed, Jan 29, 2014 at 05:52:41PM +0100, Vlastimil Babka wrote: >> On 01/10/2014 09:48 AM, Joonsoo Kim wrote: >>> On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote: >>>> On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote: >>>>> Hello, >>>>> >>>>> I found some weaknesses on handling migratetype during code review and >>>>> testing CMA. >>>>> >>>>> First, we don't have any synchronization method on get/set pageblock >>>>> migratetype. When we change migratetype, we hold the zone lock. So >>>>> writer-writer race doesn't exist. But while someone changes migratetype, >>>>> others can get migratetype. This may introduce totally unintended value >>>>> as migratetype. Although I haven't heard of any problem report about >>>>> that, it is better to protect properly. >>>>> >>>> >>>> This is deliberate. The migratetypes for the majority of users are advisory >>>> and aimed for fragmentation avoidance. 
It was important that the cost of >>>> that be kept as low as possible and the general case is that migration types >>>> change very rarely. In many cases, the zone lock is held. In other cases, >>>> such as splitting free pages, the cost is simply not justified. >>>> >>>> I doubt there is any amount of data you could add in support that would >>>> justify hammering the free fast paths (which call get_pageblock_type). >>> >>> Hello, Mel. >>> >>> There is a possibility that we can get unintended value such as 6 as migratetype >>> if reader-writer (get/set pageblock_migratetype) race happens. It can be >>> possible, because we read the value without any synchronization method. And >>> this migratetype, 6, has no place in buddy freelist, so array index overrun can >>> be possible and the system can break, although I haven't heard that it occurs. >> >> Hello, >> >> it seems this can indeed happen. I'm working on memory compaction >> improvements and in a prototype patch, I'm basically adding calls of >> start_isolate_page_range() and undo_isolate_page_range() to some functions >> under compact_zone(). With this I've seen occurrences of NULL >> pointers in move_freepages(), free_one_page() in places where >> free_list[migratetype] is manipulated by e.g. list_move(). That led >> me to question the value of migratetype and I found this thread. >> Adding some debugging in get_pageblock_migratetype() and voila, I >> get a value of 6 being read. >> >> So is it just my patch adding a dangerous situation, or does it exist in >> mainline as well? By looking at free_one_page(), it uses zone->lock, but >> get_pageblock_migratetype() is called by its callers >> (free_hot_cold_page() or __free_pages_ok()) outside of the lock. >> This determined migratetype is then used under free_one_page() to >> access a free_list. >> >> It seems that this could race with set_pageblock_migratetype() >> called from try_to_steal_freepages() (despite the latter being >> properly locked).
There are also other callers but those seem to be >> either limited to initialization and isolation, which should be rare >> (?). >> However, try_to_steal_freepages can occur repeatedly. >> So I assume that the race happens but never manifests as a fatal >> error as long as MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE and >> MIGRATE_MOVABLE >> values are used. Only MIGRATE_CMA and MIGRATE_ISOLATE have values >> with the bit of value 4 set and can thus result in invalid values due to >> non-atomic access. >> >> Does that make sense to you and should we thus proceed with patching >> this race? >> > > Hello, > > This race is possible without your prototype patch, however, with very low > probability. Some code related to memory failure uses set_migratetype_isolate(), > which could result in this race. > > Although it may be a very rare case and not critical, it is better to fix > this race. I prefer that we don't depend on luck. :) I agree :) I also don't like the possibility that the non-fatal type of race (where higher-order bits are not involved) occurs and can hurt anti-fragmentation, or even suddenly become a problem in the future if e.g. more migratetypes are added. I'll try to quantify that with a debug patch. > Mel's suggestion looks good to me. Do you have another idea? No, it sounds good so I'm going to work on this as outlined. > Thanks. > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754028AbaAIHFA (ORCPT ); Thu, 9 Jan 2014 02:05:00 -0500 Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:54034 "EHLO LGEMRELSE7Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751508AbaAIHEb (ORCPT ); Thu, 9 Jan 2014 02:04:31 -0500 X-AuditID: 9c930197-b7b37ae000002e5a-c0-52ce49fd7979 From: Joonsoo Kim To: Andrew Morton Cc: "Kirill A.
Shutemov" , Rik van Riel , Jiang Liu , Mel Gorman , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Michal Nazarewicz , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim , Joonsoo Kim Subject: [PATCH 1/7] mm/page_alloc: synchronize get/set pageblock Date: Thu, 9 Jan 2014 16:04:41 +0900 Message-Id: <1389251087-10224-2-git-send-email-iamjoonsoo.kim@lge.com> X-Mailer: git-send-email 1.7.9.5 In-Reply-To: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Currently, get/set pageblock is done without any synchronization. Therefore there is a race condition, and the migratetype can end up as an unintended value. Sometimes we move pageblocks from one migratetype to another while, at the same time, some page in such a pageblock is being freed. In this case we can read a totally unintended value, since get/set pageblock doesn't read or write the value atomically; instead, it is accessed bit by bit. Since set pageblock is used far less frequently than get pageblock, I think a seqlock is the proper method to synchronize it. This type of lock has minimal overhead when there are many readers and few writers, so it fits this situation.
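The race can be replayed deterministically in user space to show why bit-by-bit access matters. The sketch below is a single-threaded simulation, not kernel code: struct pb_sim, pb_write() and pb_read() are hypothetical names, the enum assumes the v3.13 layout with both CONFIG_CMA and CONFIG_MEMORY_ISOLATION enabled (so MIGRATE_TYPES is 6), and the counter is a simplified seqcount without the memory barriers, or the wait on an odd count, that the real read_seqbegin() provides. If a MOVABLE pageblock is switched to ISOLATE after the reader has already collected bits 0 and 1, the unsynchronized read returns 6, an index with no free_list behind it; with the retry loop, the reader returns MIGRATE_ISOLATE on its second pass.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed v3.13 layout with CONFIG_CMA and CONFIG_MEMORY_ISOLATION. */
enum {
	MT_UNMOVABLE, MT_RECLAIMABLE, MT_MOVABLE, MT_RESERVE,
	MT_CMA, MT_ISOLATE, MT_TYPES
};

#define PB_BITS 3 /* bits per pageblock for the migratetype (assumed) */

/* Hypothetical stand-in for one pageblock's bits plus the zone seqlock. */
struct pb_sim {
	unsigned seq;          /* even: stable, odd: writer active */
	unsigned long bitmap;  /* the pageblock's migratetype bits */
};

/* Writer: updates one bit at a time, like set_pageblock_flags_group(),
 * bracketed by a simplified write_seqlock()/write_sequnlock(). */
static void pb_write(struct pb_sim *pb, unsigned long flags)
{
	assert(flags < MT_TYPES);      /* writers only store valid types */
	pb->seq++;                     /* seq becomes odd */
	for (int i = 0; i < PB_BITS; i++) {
		if (flags & (1UL << i))
			pb->bitmap |= 1UL << i;
		else
			pb->bitmap &= ~(1UL << i);
	}
	pb->seq++;                     /* seq is even again */
}

/*
 * Reader: gathers the bits one at a time. To make the race
 * deterministic, one complete write of new_flags is injected right
 * after bit race_after has been read on the first pass. Without
 * use_seq the (possibly torn) value is returned as-is; with use_seq
 * the pass is repeated until the sequence count shows no writer ran.
 */
static unsigned long pb_read(struct pb_sim *pb, bool use_seq, int race_after,
			     unsigned long new_flags, int *passes)
{
	bool raced = false;
	unsigned long flags;
	unsigned start;

	*passes = 0;
	do {
		start = pb->seq;
		flags = 0;             /* reset all per-pass state */
		for (int i = 0; i < PB_BITS; i++) {
			if (pb->bitmap & (1UL << i))
				flags |= 1UL << i;
			if (i == race_after && !raced) {
				pb_write(pb, new_flags);
				raced = true;
			}
		}
		(*passes)++;
	} while (use_seq && ((start & 1) || pb->seq != start));

	return flags;
}
```

The reader resets flags at the top of every pass. In the do/while loop this patch adds to get_pageblock_flags_group(), start_bitidx, value and flags are initialized once, before the loop, and advanced by the inner for loop; as written, a retried pass would skip the for loop body and return the first, possibly torn, result, so those variables would need to be reinitialized inside the do block.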
Signed-off-by: Joonsoo Kim diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index bd791e4..feaa607 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -79,6 +79,7 @@ static inline int get_pageblock_migratetype(struct page *page) { return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end); } +void set_pageblock_migratetype(struct page *page, int migratetype); struct free_area { struct list_head free_list[MIGRATE_TYPES]; @@ -367,6 +368,7 @@ struct zone { #endif struct free_area free_area[MAX_ORDER]; + seqlock_t pageblock_seqlock; #ifndef CONFIG_SPARSEMEM /* * Flags for a pageblock_nr_pages block. See pageblock-flags.h. diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index 3fff8e7..58e2a89 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -23,7 +23,6 @@ static inline bool is_migrate_isolate(int migratetype) bool has_unmovable_pages(struct zone *zone, struct page *page, int count, bool skip_hwpoisoned_pages); -void set_pageblock_migratetype(struct page *page, int migratetype); int move_freepages_block(struct zone *zone, struct page *page, int migratetype); int move_freepages(struct zone *zone, diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 5248fe0..b36aa5a 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -4788,6 +4788,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat, spin_lock_init(&zone->lock); spin_lock_init(&zone->lru_lock); zone_seqlock_init(zone); + seqlock_init(&zone->pageblock_seqlock); zone->zone_pgdat = pgdat; zone_pcp_init(zone); @@ -5927,15 +5928,19 @@ unsigned long get_pageblock_flags_group(struct page *page, unsigned long pfn, bitidx; unsigned long flags = 0; unsigned long value = 1; + unsigned int seq; zone = page_zone(page); pfn = page_to_pfn(page); bitmap = get_pageblock_bitmap(zone, pfn); bitidx = pfn_to_bitidx(zone, pfn); - for (; start_bitidx <= end_bitidx; start_bitidx++, value <<= 1) - if (test_bit(bitidx + 
start_bitidx, bitmap)) - flags |= value; + do { + seq = read_seqbegin(&zone->pageblock_seqlock); + for (; start_bitidx <= end_bitidx; start_bitidx++, value <<= 1) + if (test_bit(bitidx + start_bitidx, bitmap)) + flags |= value; + } while (read_seqretry(&zone->pageblock_seqlock, seq)); return flags; } @@ -5954,6 +5959,7 @@ void set_pageblock_flags_group(struct page *page, unsigned long flags, unsigned long *bitmap; unsigned long pfn, bitidx; unsigned long value = 1; + unsigned long irq_flags; zone = page_zone(page); pfn = page_to_pfn(page); @@ -5961,11 +5967,13 @@ void set_pageblock_flags_group(struct page *page, unsigned long flags, bitidx = pfn_to_bitidx(zone, pfn); VM_BUG_ON(!zone_spans_pfn(zone, pfn)); + write_seqlock_irqsave(&zone->pageblock_seqlock, irq_flags); for (; start_bitidx <= end_bitidx; start_bitidx++, value <<= 1) if (flags & value) __set_bit(bitidx + start_bitidx, bitmap); else __clear_bit(bitidx + start_bitidx, bitmap); + write_sequnlock_irqrestore(&zone->pageblock_seqlock, irq_flags); } /* -- 1.7.9.5 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754849AbaAIHFN (ORCPT ); Thu, 9 Jan 2014 02:05:13 -0500 Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:54034 "EHLO LGEMRELSE7Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752613AbaAIHEd (ORCPT ); Thu, 9 Jan 2014 02:04:33 -0500 X-AuditID: 9c930197-b7b37ae000002e5a-cc-52ce49fef305 From: Joonsoo Kim To: Andrew Morton Cc: "Kirill A. 
Shutemov" , Rik van Riel , Jiang Liu , Mel Gorman , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Michal Nazarewicz , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim , Joonsoo Kim Subject: [PATCH 5/7] mm/page_alloc: separate interface to set/get migratetype of freepage Date: Thu, 9 Jan 2014 16:04:45 +0900 Message-Id: <1389251087-10224-6-git-send-email-iamjoonsoo.kim@lge.com> X-Mailer: git-send-email 1.7.9.5 In-Reply-To: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Currently, we use (set/get)_freepage_migratetype for two use cases. One is to know the buddy list where this page will be linked, and the other is to know the buddy list where this page is linked now. But we should handle these two use cases differently, because the stored information isn't sufficient for the second use case and setting it properly needs some overhead. Whenever the page is merged or split in buddy, this information isn't properly re-assigned, so it may not be accurate enough for the second use case. This patch just separates the interface, so there is no functional change. The following patch will take further steps on this issue. Signed-off-by: Joonsoo Kim diff --git a/include/linux/mm.h b/include/linux/mm.h index 3552717..2733e0b 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -257,14 +257,31 @@ struct inode; #define page_private(page) ((page)->private) #define set_page_private(page, v) ((page)->private = (v)) -/* It's valid only if the page is free path or free_list */ -static inline void set_freepage_migratetype(struct page *page, int migratetype) +/* + * It's valid only if the page is on buddy. It represents + * which freelist the page is linked.
+ */ +static inline void set_buddy_migratetype(struct page *page, int migratetype) +{ + page->index = migratetype; +} + +static inline int get_buddy_migratetype(struct page *page) +{ + return page->index; +} + +/* + * It's valid only if the page is on pcp list. It represents + * which freelist the page should go on buddy. + */ +static inline void set_pcp_migratetype(struct page *page, int migratetype) { page->index = migratetype; } -/* It's valid only if the page is free path or free_list */ -static inline int get_freepage_migratetype(struct page *page) +/* It's valid only if the page is on pcp list */ +static inline int get_pcp_migratetype(struct page *page) { return page->index; } diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 4913829..c9e6622 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -681,7 +681,7 @@ static void free_pcppages_bulk(struct zone *zone, int count, page = list_entry(list->prev, struct page, lru); /* must delete as __free_one_page list manipulates */ list_del(&page->lru); - mt = get_freepage_migratetype(page); + mt = get_pcp_migratetype(page); /* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */ __free_one_page(page, zone, 0, mt); trace_mm_page_pcpu_drain(page, 0, mt); @@ -745,7 +745,7 @@ static void __free_pages_ok(struct page *page, unsigned int order) local_irq_save(flags); __count_vm_events(PGFREE, 1 << order); migratetype = get_pageblock_migratetype(page); - set_freepage_migratetype(page, migratetype); + set_buddy_migratetype(page, migratetype); free_one_page(page_zone(page), page, order, migratetype); local_irq_restore(flags); } @@ -903,7 +903,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order, rmv_page_order(page); area->nr_free--; expand(zone, page, order, current_order, area, migratetype); - set_freepage_migratetype(page, migratetype); + set_pcp_migratetype(page, migratetype); return page; } @@ -971,7 +971,7 @@ int move_freepages(struct zone *zone, order = page_order(page); list_move(&page->lru, 
&zone->free_area[order].free_list[migratetype]); - set_freepage_migratetype(page, migratetype); + set_buddy_migratetype(page, migratetype); page += 1 << order; pages_moved += 1 << order; } @@ -1094,12 +1094,11 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) /* CMA pages cannot be stolen */ if (is_migrate_cma(migratetype)) { - set_freepage_migratetype(page, migratetype); + set_pcp_migratetype(page, migratetype); __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, -(1 << order)); } else { - set_freepage_migratetype(page, - start_migratetype); + set_pcp_migratetype(page, start_migratetype); } /* Remove the page from the freelists */ @@ -1346,7 +1345,7 @@ void free_hot_cold_page(struct page *page, int cold) return; migratetype = get_pageblock_migratetype(page); - set_freepage_migratetype(page, migratetype); + set_pcp_migratetype(page, migratetype); local_irq_save(flags); __count_vm_event(PGFREE); diff --git a/mm/page_isolation.c b/mm/page_isolation.c index 534fb3a..c341413 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -190,7 +190,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn, * is MIGRATE_ISOLATE. Catch it and move the page into * MIGRATE_ISOLATE list. */ - if (get_freepage_migratetype(page) != MIGRATE_ISOLATE) { + if (get_buddy_migratetype(page) != MIGRATE_ISOLATE) { struct page *end_page; end_page = page + (1 << page_order(page)) - 1; -- 1.7.9.5 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755140AbaAIHFW (ORCPT ); Thu, 9 Jan 2014 02:05:22 -0500 Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:54034 "EHLO LGEMRELSE7Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751594AbaAIHEc (ORCPT ); Thu, 9 Jan 2014 02:04:32 -0500 X-AuditID: 9c930197-b7b37ae000002e5a-c6-52ce49fed4b0 From: Joonsoo Kim To: Andrew Morton Cc: "Kirill A. 
Shutemov" , Rik van Riel , Jiang Liu , Mel Gorman , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Michal Nazarewicz , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim , Joonsoo Kim Subject: [PATCH 3/7] mm/page_alloc: move set_freepage_migratetype() to better place Date: Thu, 9 Jan 2014 16:04:43 +0900 Message-Id: <1389251087-10224-4-git-send-email-iamjoonsoo.kim@lge.com> X-Mailer: git-send-email 1.7.9.5 In-Reply-To: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org set_freepage_migratetype() tells us the buddy freelist where the page should be linked when it goes back to the buddy allocator. Currently this is done in rmqueue_bulk(), so we have to call get_pageblock_migratetype() to know its migratetype exactly if CONFIG_CMA is enabled. That function has some overhead, so removing the call is preferable. To remove it, we move set_freepage_migratetype() to __rmqueue_fallback() and __rmqueue_smallest(). In those functions we already know the migratetype, so we don't need to call get_pageblock_migratetype(). Removing the is_migrate_isolate() check is safe, since what we want to ensure is that a page from CMA will not go to another migratetype's freelist.
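The change can be summarized as two small decision functions. This is a hedged sketch, not kernel code: pcp_mt_before(), pcp_mt_after() and enum rmqueue_path are hypothetical names, and the enum assumes the v3.13 layout with CONFIG_CMA and CONFIG_MEMORY_ISOLATION enabled. The point it illustrates is that the old rule needs the pageblock type of every page (a get_pageblock_migratetype() lookup in rmqueue_bulk() under CONFIG_CMA), while the new rule only uses the freelist being searched and the requested type, both of which __rmqueue_smallest() and __rmqueue_fallback() already hold.

```c
#include <assert.h>

/* Assumed v3.13 layout with CONFIG_CMA and CONFIG_MEMORY_ISOLATION. */
enum {
	MT_UNMOVABLE, MT_RECLAIMABLE, MT_MOVABLE,
	MT_PCPTYPES, MT_RESERVE = MT_PCPTYPES,
	MT_CMA, MT_ISOLATE, MT_TYPES
};

/* Hypothetical tag for which helper handed out the page. */
enum rmqueue_path { PATH_SMALLEST, PATH_FALLBACK };

/* Before the patch: rmqueue_bulk() looked up the pageblock type of
 * every page it put on the pcp list (CONFIG_CMA case). */
static int pcp_mt_before(int pageblock_mt, int requested_mt)
{
	if (pageblock_mt != MT_CMA && pageblock_mt != MT_ISOLATE)
		return requested_mt;
	return pageblock_mt;
}

/* After the patch: the type comes from values the helpers already
 * have, with no get_pageblock_migratetype() call. */
static int pcp_mt_after(enum rmqueue_path path, int freelist_mt,
			int start_mt)
{
	assert(freelist_mt >= 0 && freelist_mt < MT_TYPES);
	if (path == PATH_SMALLEST)
		return freelist_mt;   /* __rmqueue_smallest(): the list it searched */
	if (freelist_mt == MT_CMA)
		return MT_CMA;        /* __rmqueue_fallback(): CMA pages are not stolen */
	return start_mt;              /* __rmqueue_fallback(): other pages take the requested type */
}
```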
Signed-off-by: Joonsoo Kim diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 1489c301..4913829 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -903,6 +903,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order, rmv_page_order(page); area->nr_free--; expand(zone, page, order, current_order, area, migratetype); + set_freepage_migratetype(page, migratetype); return page; } @@ -1093,8 +1094,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) /* CMA pages cannot be stolen */ if (is_migrate_cma(migratetype)) { + set_freepage_migratetype(page, migratetype); __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, -(1 << order)); + } else { + set_freepage_migratetype(page, + start_migratetype); } /* Remove the page from the freelists */ @@ -1153,7 +1158,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, unsigned long count, struct list_head *list, int migratetype, int cold) { - int mt = migratetype, i; + int i; spin_lock(&zone->lock); for (i = 0; i < count; ++i) { @@ -1174,12 +1179,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, list_add(&page->lru, list); else list_add_tail(&page->lru, list); - if (IS_ENABLED(CONFIG_CMA)) { - mt = get_pageblock_migratetype(page); - if (!is_migrate_cma(mt) && !is_migrate_isolate(mt)) - mt = migratetype; - } - set_freepage_migratetype(page, mt); list = &page->lru; } __mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order)); -- 1.7.9.5 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754976AbaAIHFR (ORCPT ); Thu, 9 Jan 2014 02:05:17 -0500 Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:43856 "EHLO LGEMRELSE7Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751204AbaAIHEc (ORCPT ); Thu, 9 Jan 2014 02:04:32 -0500 X-AuditID: 9c930197-b7b37ae000002e5a-c3-52ce49fdb87f From: Joonsoo Kim To: Andrew Morton Cc: "Kirill A. 
Shutemov" , Rik van Riel , Jiang Liu , Mel Gorman , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Michal Nazarewicz , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim , Joonsoo Kim Subject: [PATCH 2/7] mm/cma: fix cma free page accounting Date: Thu, 9 Jan 2014 16:04:42 +0900 Message-Id: <1389251087-10224-3-git-send-email-iamjoonsoo.kim@lge.com> X-Mailer: git-send-email 1.7.9.5 In-Reply-To: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org CMA pages can be allocated not only by order-0 requests but also by high-order requests, so the free CMA page accounting has to cover both cases. Signed-off-by: Joonsoo Kim diff --git a/mm/page_alloc.c b/mm/page_alloc.c index b36aa5a..1489c301 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1091,6 +1091,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) start_migratetype, migratetype); + /* CMA pages cannot be stolen */ + if (is_migrate_cma(migratetype)) { + __mod_zone_page_state(zone, + NR_FREE_CMA_PAGES, -(1 << order)); + } + /* Remove the page from the freelists */ list_del(&page->lru); rmv_page_order(page); @@ -1175,9 +1181,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, } set_freepage_migratetype(page, mt); list = &page->lru; - if (is_migrate_cma(mt)) - __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, - -(1 << order)); } __mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order)); spin_unlock(&zone->lock); -- 1.7.9.5 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754708AbaAIHFK (ORCPT ); Thu, 9 Jan 2014 02:05:10 -0500 Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:54034 "EHLO LGEMRELSE7Q.lge.com"
rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753423AbaAIHEf (ORCPT ); Thu, 9 Jan 2014 02:04:35 -0500 X-AuditID: 9c930197-b7b37ae000002e5a-d0-52ce49ffc5ee From: Joonsoo Kim To: Andrew Morton Cc: "Kirill A. Shutemov" , Rik van Riel , Jiang Liu , Mel Gorman , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Michal Nazarewicz , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim , Joonsoo Kim Subject: [PATCH 7/7] mm/page_alloc: don't merge MIGRATE_(CMA|ISOLATE) pages on buddy Date: Thu, 9 Jan 2014 16:04:47 +0900 Message-Id: <1389251087-10224-8-git-send-email-iamjoonsoo.kim@lge.com> X-Mailer: git-send-email 1.7.9.5 In-Reply-To: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org If (MAX_ORDER-1) is greater than pageblock_order, pages of different migratetypes can be merged and linked into an unintended freelist. While testing CMA, I saw CMA pages merged and linked into the MOVABLE freelist because of this issue; the pages then changed their migratetype to UNMOVABLE via try_to_steal_freepages(). After that, CMA allocations from this region always fail. To prevent this, we should not merge pages on the MIGRATE_(CMA|ISOLATE) freelists with pages of other migratetypes.
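The arithmetic behind the problem can be checked with a short sketch. When freeing, the buddy of a block at index idx and order n is idx ^ (1 << n). Assuming pageblock_order is 9 (for example 4 KiB pages with 2 MiB pageblocks) and MAX_ORDER - 1 is 10, an order-9 block's buddy lies in the neighbouring pageblock, so a merge at that order joins two pageblocks whose migratetypes may differ, whereas below pageblock_order both halves stay inside one pageblock. may_merge() mirrors the condition this patch adds; buddy_index(), pageblock_of() and the MT_* names are hypothetical, and the enum assumes the v3.13 layout, in which MIGRATE_RESERVE equals MIGRATE_PCPTYPES, so the >= MIGRATE_PCPTYPES test also covers RESERVE blocks.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed v3.13 layout with CONFIG_CMA and CONFIG_MEMORY_ISOLATION. */
enum {
	MT_UNMOVABLE, MT_RECLAIMABLE, MT_MOVABLE,
	MT_PCPTYPES, MT_RESERVE = MT_PCPTYPES,
	MT_CMA, MT_ISOLATE, MT_TYPES
};

/* Index of the buddy block, as computed when freeing. */
static unsigned long buddy_index(unsigned long idx, unsigned int order)
{
	return idx ^ (1UL << order);
}

/* Which pageblock a page index falls into. */
static unsigned long pageblock_of(unsigned long idx, unsigned int pb_order)
{
	return idx >> pb_order;
}

/* Mirrors the check added to __free_one_page(): stop merging once the
 * blocks are pageblock-sized or larger, their freelist types differ,
 * and at least one of them is not a pcp type. */
static bool may_merge(unsigned int order, unsigned int pb_order,
		      int mt, int buddy_mt)
{
	assert(mt < MT_TYPES && buddy_mt < MT_TYPES);
	if (order >= pb_order && mt != buddy_mt &&
	    (mt >= MT_PCPTYPES || buddy_mt >= MT_PCPTYPES))
		return false;
	return true;
}
```

When may_merge() is false, the patched loop breaks out and the block is linked at its current order, so each pageblock stays on its own freelist.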
Signed-off-by: Joonsoo Kim diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 2548b42..ea99cee 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -581,6 +581,15 @@ static inline void __free_one_page(struct page *page, __mod_zone_freepage_state(zone, 1 << order, migratetype); } else { + int buddy_mt = get_buddy_migratetype(buddy); + + /* We don't want to merge cma, isolate pages */ + if (unlikely(order >= pageblock_order) && + migratetype != buddy_mt && + (migratetype >= MIGRATE_PCPTYPES || + buddy_mt >= MIGRATE_PCPTYPES)) { + break; + } list_del(&buddy->lru); zone->free_area[order].nr_free--; rmv_page_order(buddy); -- 1.7.9.5 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754544AbaAIHFG (ORCPT ); Thu, 9 Jan 2014 02:05:06 -0500 Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:43856 "EHLO LGEMRELSE7Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753412AbaAIHEe (ORCPT ); Thu, 9 Jan 2014 02:04:34 -0500 X-AuditID: 9c930197-b7b37ae000002e5a-ce-52ce49fea5a2 From: Joonsoo Kim To: Andrew Morton Cc: "Kirill A. 
Shutemov" , Rik van Riel , Jiang Liu , Mel Gorman , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Michal Nazarewicz , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim , Joonsoo Kim Subject: [PATCH 6/7] mm/page_alloc: store freelist migratetype to the page on buddy properly Date: Thu, 9 Jan 2014 16:04:46 +0900 Message-Id: <1389251087-10224-7-git-send-email-iamjoonsoo.kim@lge.com> X-Mailer: git-send-email 1.7.9.5 In-Reply-To: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org To maintain the freelist migratetype information on buddy pages, the migratetype should be set again whenever the page order changes. set_page_order() is the best place to do this, because it is called whenever the page order changes, so this patch adds set_buddy_migratetype() to set_page_order(). This patch also enables set/get_buddy_migratetype() only when it is really needed, because it has some overhead. Signed-off-by: Joonsoo Kim diff --git a/include/linux/mm.h b/include/linux/mm.h index 2733e0b..046e09f 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -258,6 +258,12 @@ struct inode; #define set_page_private(page, v) ((page)->private = (v)) /* + * This is for tracking the type of the list on buddy. + * It imposes some performance overhead to the buddy allocator, + * so we make it enabled only if it is needed. + */ +#if defined(CONFIG_MEMORY_ISOLATION) || defined(CONFIG_CMA) +/* * It's valid only if the page is on buddy. It represents * which freelist the page is linked.
*/ @@ -270,6 +276,10 @@ static inline int get_buddy_migratetype(struct page *page) { return page->index; } +#else +static inline void set_buddy_migratetype(struct page *page, int migratetype) {} +static inline int get_buddy_migratetype(struct page *page) { return 0; } +#endif /* * It's valid only if the page is on pcp list. It represents diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c9e6622..2548b42 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -446,9 +446,11 @@ static inline void set_page_guard_flag(struct page *page) { } static inline void clear_page_guard_flag(struct page *page) { } #endif -static inline void set_page_order(struct page *page, int order) +static inline void set_page_order(struct page *page, int order, + int migratetype) { set_page_private(page, order); + set_buddy_migratetype(page, migratetype); __SetPageBuddy(page); } @@ -588,7 +590,7 @@ static inline void __free_one_page(struct page *page, page_idx = combined_idx; order++; } - set_page_order(page, order); + set_page_order(page, order, migratetype); /* * If this is not the largest possible page, check if the buddy @@ -745,7 +747,6 @@ static void __free_pages_ok(struct page *page, unsigned int order) local_irq_save(flags); __count_vm_events(PGFREE, 1 << order); migratetype = get_pageblock_migratetype(page); - set_buddy_migratetype(page, migratetype); free_one_page(page_zone(page), page, order, migratetype); local_irq_restore(flags); } @@ -834,7 +835,7 @@ static inline void expand(struct zone *zone, struct page *page, #endif list_add(&page[size].lru, &area->free_list[migratetype]); area->nr_free++; - set_page_order(&page[size], high); + set_page_order(&page[size], high, migratetype); } } -- 1.7.9.5 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754293AbaAIHFE (ORCPT ); Thu, 9 Jan 2014 02:05:04 -0500 Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:43856 "EHLO LGEMRELSE7Q.lge.com" 
rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751599AbaAIHEd (ORCPT ); Thu, 9 Jan 2014 02:04:33 -0500 X-AuditID: 9c930197-b7b37ae000002e5a-ca-52ce49fe07e5 From: Joonsoo Kim To: Andrew Morton Cc: "Kirill A. Shutemov" , Rik van Riel , Jiang Liu , Mel Gorman , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Michal Nazarewicz , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim , Joonsoo Kim Subject: [PATCH 4/7] mm/isolation: remove invalid check condition Date: Thu, 9 Jan 2014 16:04:44 +0900 Message-Id: <1389251087-10224-5-git-send-email-iamjoonsoo.kim@lge.com> X-Mailer: git-send-email 1.7.9.5 In-Reply-To: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org test_page_isolated() checks the stability of pages. It checks two conditions: the page belongs to a pageblock of the isolate migratetype, and the page is on the buddy allocator's isolate freelist. If both conditions are satisfied, we can determine that the page is stable and go forward. __test_page_isolated_in_pageblock() is one of the main functions for this test. In that function, if it meets a page with page_count 0 and the isolate freepage migratetype, it decides that the page is stable. But this is not true, because such a page may be on a pcp list, and then it can be allocated by other users even though we hold the zone lock. So remove this check.
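The effect of dropping the branch can be seen with a simplified model of the scan. The sketch below is not the kernel function: struct sim_page, scan_isolated() and SIM_MIGRATE_ISOLATE are hypothetical, the value 5 assumes the v3.13 enum layout with CONFIG_CMA and CONFIG_MEMORY_ISOLATION, and pfn validity checks are omitted. A free page on a pcp list is not PageBuddy and has page_count 0, so under the old rule a stale ISOLATE freepage migratetype let the scan step over it; without the branch the scan stops there and the range is reported as not yet isolated.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_MIGRATE_ISOLATE 5 /* assumed value, v3.13 layout */

/* Hypothetical, simplified view of the struct page fields the loop reads. */
struct sim_page {
	bool buddy;     /* PageBuddy(): first page of a free buddy block */
	int order;      /* page_order(), meaningful only when buddy */
	int count;      /* page_count() */
	int stored_mt;  /* get_freepage_migratetype() */
	bool hwpoison;  /* PageHWPoison() */
};

/*
 * Walks [pfn, end_pfn) like __test_page_isolated_in_pageblock() and
 * returns the pfn where it stopped; end_pfn means every page looked
 * stable. old_rule re-enables the branch this patch removes.
 */
static int scan_isolated(const struct sim_page *pages, int pfn, int end_pfn,
			 bool skip_hwpoisoned, bool old_rule)
{
	assert(pfn <= end_pfn);
	while (pfn < end_pfn) {
		const struct sim_page *p = &pages[pfn];

		if (p->buddy)
			/* the real loop first moves a non-ISOLATE buddy
			 * block onto the MIGRATE_ISOLATE freelist */
			pfn += 1 << p->order;
		else if (old_rule && p->count == 0 &&
			 p->stored_mt == SIM_MIGRATE_ISOLATE)
			pfn += 1;   /* may really be a page on a pcp list */
		else if (skip_hwpoisoned && p->hwpoison)
			pfn += 1;
		else
			break;
	}
	return pfn;
}
```

For example, with an order-1 buddy block at pfns 0-1, a count-0 page tagged ISOLATE at pfn 2 and an order-0 buddy page at pfn 3, the old rule walks to 4, while the patched rule stops at 2.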
Signed-off-by: Joonsoo Kim diff --git a/mm/page_isolation.c b/mm/page_isolation.c index d1473b2..534fb3a 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -199,9 +199,6 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn, } pfn += 1 << page_order(page); } - else if (page_count(page) == 0 && - get_freepage_migratetype(page) == MIGRATE_ISOLATE) - pfn += 1; else if (skip_hwpoisoned_pages && PageHWPoison(page)) { /* * The HWPoisoned page may be not in buddy -- 1.7.9.5 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756207AbaAIJGt (ORCPT ); Thu, 9 Jan 2014 04:06:49 -0500 Received: from mail-wi0-f178.google.com ([209.85.212.178]:54330 "EHLO mail-wi0-f178.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754147AbaAIJGn (ORCPT ); Thu, 9 Jan 2014 04:06:43 -0500 From: Michal Nazarewicz To: Joonsoo Kim , Andrew Morton Cc: "Kirill A. Shutemov" , Rik van Riel , Jiang Liu , Mel Gorman , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim , Joonsoo Kim Subject: Re: [PATCH 0/7] improve robustness on handling migratetype In-Reply-To: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> Organization: http://mina86.com/ References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> User-Agent: Notmuch/0.17+15~gb65ca8e (http://notmuchmail.org) Emacs/24.3.50.1 (x86_64-unknown-linux-gnu) Date: Thu, 09 Jan 2014 10:06:34 +0100 Message-ID: MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jan 09 2014, Joonsoo Kim wrote: > Third, there is the problem on buddy allocator. It doesn't consider > migratetype when merging buddy, so pages from cma or isolate region can > be moved to other migratetype freelist. It makes CMA failed over and over. 
> To prevent it, the buddy allocator should consider migratetype if > CMA/ISOLATE is enabled. There should never be a situation where a CMA page shares a pageblock (or a max-order page) with a non-CMA page though, so this should never be an issue. -- Best regards, _ _ .o.
| Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michał “mina86” Nazarewicz (o o) ooo +------ooO--(_)--Ooo-- --=-=-=-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756085AbaAIJIg (ORCPT ); Thu, 9 Jan 2014 04:08:36 -0500 Received: from mail-wi0-f172.google.com ([209.85.212.172]:41184 "EHLO mail-wi0-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754497AbaAIJIT (ORCPT ); Thu, 9 Jan 2014 04:08:19 -0500 From: Michal Nazarewicz To: Joonsoo Kim , Andrew Morton Cc: "Kirill A.
Shutemov" , Rik van Riel , Jiang Liu , Mel Gorman , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim , Joonsoo Kim Subject: Re: [PATCH 1/7] mm/page_alloc: synchronize get/set pageblock In-Reply-To: <1389251087-10224-2-git-send-email-iamjoonsoo.kim@lge.com> Organization: http://mina86.com/ References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> <1389251087-10224-2-git-send-email-iamjoonsoo.kim@lge.com> User-Agent: Notmuch/0.17+15~gb65ca8e (http://notmuchmail.org) Emacs/24.3.50.1 (x86_64-unknown-linux-gnu) X-PGP: 50751FF4 X-PGP-FP: AC1F 5F5C D418 88F8 CC84 5858 2060 4012 5075 1FF4
Date: Thu, 09 Jan 2014 10:08:10 +0100 Message-ID: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On Thu, Jan 09 2014, Joonsoo Kim wrote: > @@ -5927,15 +5928,19 @@ unsigned long get_pageblock_flags_group(struct page *page, > unsigned
long pfn, bitidx; > unsigned long flags = 0; > unsigned long value = 1; > + unsigned int seq; > > zone = page_zone(page); > pfn = page_to_pfn(page); > bitmap = get_pageblock_bitmap(zone, pfn); > bitidx = pfn_to_bitidx(zone, pfn); > > - for (; start_bitidx <= end_bitidx; start_bitidx++, value <<= 1) > - if (test_bit(bitidx + start_bitidx, bitmap)) > - flags |= value; > + do { + flags = 0; > + seq = read_seqbegin(&zone->pageblock_seqlock); > + for (; start_bitidx <= end_bitidx; start_bitidx++, value <<= 1) > + if (test_bit(bitidx + start_bitidx, bitmap)) > + flags |= value; > + } while (read_seqretry(&zone->pageblock_seqlock, seq)); > > return flags; > } --=-=-=-- From mboxrd@z Thu Jan 1
00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755990AbaAIJS0 (ORCPT ); Thu, 9 Jan 2014 04:18:26 -0500 Received: from mail-wg0-f42.google.com ([74.125.82.42]:49937 "EHLO mail-wg0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752319AbaAIJSJ (ORCPT ); Thu, 9 Jan 2014 04:18:09 -0500 From: Michal Nazarewicz To: Joonsoo Kim , Andrew Morton Cc: "Kirill A. Shutemov" , Rik van Riel , Jiang Liu , Mel Gorman , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim , Joonsoo Kim Subject: Re: [PATCH 5/7] mm/page_alloc: separate interface to set/get migratetype of freepage In-Reply-To: <1389251087-10224-6-git-send-email-iamjoonsoo.kim@lge.com> Organization: http://mina86.com/ References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> <1389251087-10224-6-git-send-email-iamjoonsoo.kim@lge.com> User-Agent: Notmuch/0.17+15~gb65ca8e (http://notmuchmail.org) Emacs/24.3.50.1 (x86_64-unknown-linux-gnu)
X-PGP: 50751FF4 X-PGP-FP: AC1F 5F5C D418 88F8 CC84 5858 2060 4012 5075 1FF4
Date: Thu, 09 Jan 2014 10:18:00 +0100 Message-ID: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On Thu, Jan 09 2014, Joonsoo Kim wrote: > Currently, we use (set/get)_freepage_migratetype in two use cases. > One is to know the buddy list where this page will be linked and > the other is to know the buddy list where this page is linked now. > > But, we should deal these two use cases differently, because information > isn't sufficient for the second use case and properly setting this > information needs some overhead. Whenever the page is merged or split > in buddy, this information isn't properly re-assigned and it may not > have enough information for the second use case. > > This patch just separates interface, so there is no functional change. > Following patch will do further steps about this issue. > > Signed-off-by: Joonsoo Kim Acked-by: Michal Nazarewicz I think this patch would be smaller if it was pushed earlier in the patchset.
--=-=-=-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756562AbaAIJUL (ORCPT ); Thu, 9 Jan 2014 04:20:11 -0500 Received: from mail-wi0-f180.google.com ([209.85.212.180]:49498 "EHLO mail-wi0-f180.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756171AbaAIJTw (ORCPT ); Thu, 9 Jan 2014 04:19:52 -0500 From: Michal Nazarewicz To: Joonsoo Kim , Andrew Morton Cc: "Kirill A.
Shutemov" , Rik van Riel , Jiang Liu , Mel Gorman , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim , Joonsoo Kim Subject: Re: [PATCH 6/7] mm/page_alloc: store freelist migratetype to the page on buddy properly In-Reply-To: <1389251087-10224-7-git-send-email-iamjoonsoo.kim@lge.com> Organization: http://mina86.com/ References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> <1389251087-10224-7-git-send-email-iamjoonsoo.kim@lge.com> User-Agent: Notmuch/0.17+15~gb65ca8e (http://notmuchmail.org) Emacs/24.3.50.1 (x86_64-unknown-linux-gnu) X-PGP: 50751FF4 X-PGP-FP: AC1F 5F5C D418 88F8 CC84 5858 2060 4012 5075 1FF4
Date: Thu, 09 Jan 2014 10:19:44 +0100 Message-ID: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On Thu, Jan 09 2014, Joonsoo Kim wrote: > To maintain freelist migratetype information on buddy pages, migratetype > should be set again
whenever the page order is changed. set_page_order() > is the best place to do, because it is called whenever the page order is > changed, so this patch adds set_buddy_migratetype() to set_page_order(). > > And this patch makes set/get_buddy_migratetype() only enabled if it is > really needed, because it has some overhead. > > Signed-off-by: Joonsoo Kim Acked-by: Michal Nazarewicz --=-=-=-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757277AbaAIJXd (ORCPT ); Thu, 9 Jan 2014 04:23:33 -0500 Received: from mail-wg0-f45.google.com ([74.125.82.45]:61121 "EHLO mail-wg0-f45.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id
S1753033AbaAIJWM (ORCPT ); Thu, 9 Jan 2014 04:22:12 -0500 From: Michal Nazarewicz To: Joonsoo Kim , Andrew Morton Cc: "Kirill A. Shutemov" , Rik van Riel , Jiang Liu , Mel Gorman , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim , Joonsoo Kim Subject: Re: [PATCH 7/7] mm/page_alloc: don't merge MIGRATE_(CMA|ISOLATE) pages on buddy In-Reply-To: <1389251087-10224-8-git-send-email-iamjoonsoo.kim@lge.com> Organization: http://mina86.com/ References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> <1389251087-10224-8-git-send-email-iamjoonsoo.kim@lge.com> User-Agent: Notmuch/0.17+15~gb65ca8e (http://notmuchmail.org) Emacs/24.3.50.1 (x86_64-unknown-linux-gnu) X-PGP: 50751FF4 X-PGP-FP: AC1F 5F5C D418 88F8 CC84 5858 2060 4012 5075 1FF4
Date: Thu, 09 Jan 2014 10:22:02 +0100 Message-ID: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On Thu, Jan 09 2014, Joonsoo
Kim wrote: > If (MAX_ORDER-1) is greater than pageblock order, there is a possibility > to merge different migratetype pages and to be linked in unintended > freelist. > > While I test CMA, CMA pages are merged and linked into MOVABLE freelist > by above issue and then, the pages change their migratetype to UNMOVABLE by > try_to_steal_freepages(). After that, CMA to this region always fail. > > To prevent this, we should not merge the page on MIGRATE_(CMA|ISOLATE) > freelist. This is strange. CMA regions are always multiples of max-pages (or pageblocks, whichever is larger), so MOVABLE free pages should never be inside of a CMA region. If what you're describing happens, it looks like an issue somewhere else. > Signed-off-by: Joonsoo Kim > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 2548b42..ea99cee 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -581,6 +581,15 @@ static inline void __free_one_page(struct page *page, > __mod_zone_freepage_state(zone, 1 << order, > migratetype); > } else { > + int buddy_mt = get_buddy_migratetype(buddy); > + > + /* We don't want to merge cma, isolate pages */ > + if (unlikely(order >= pageblock_order) && > + migratetype != buddy_mt && > + (migratetype >= MIGRATE_PCPTYPES || > + buddy_mt >= MIGRATE_PCPTYPES)) { > + break; > + } > list_del(&buddy->lru); > zone->free_area[order].nr_free--; > rmv_page_order(buddy); > -- > 1.7.9.5 >
--=-=-=-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756519AbaAIJ1g (ORCPT ); Thu, 9 Jan 2014 04:27:36 -0500 Received: from cantor2.suse.de ([195.135.220.15]:47955 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751556AbaAIJ1b (ORCPT ); Thu, 9 Jan 2014 04:27:31 -0500 Date: Thu, 9 Jan 2014 09:27:20 +0000 From: Mel Gorman To: Joonsoo Kim Cc: Andrew Morton , "Kirill A.
Shutemov" , Rik van Riel , Jiang Liu , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Michal Nazarewicz , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim Subject: Re: [PATCH 0/7] improve robustness on handling migratetype Message-ID: <20140109092720.GM27046@suse.de> References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote: > Hello, > > I found some weaknesses on handling migratetype during code review and > testing CMA. > > First, we don't have any synchronization method on get/set pageblock > migratetype. When we change migratetype, we hold the zone lock. So > writer-writer race doesn't exist. But while someone changes migratetype, > others can get migratetype. This may introduce totally unintended value > as migratetype. Although I haven't heard of any problem report about > that, it is better to protect properly. > This is deliberate. The migratetypes for the majority of users are advisory and aimed for fragmentation avoidance. It was important that the cost of that be kept as low as possible and the general case is that migration types change very rarely. In many cases, the zone lock is held. In other cases, such as splitting free pages, the cost is simply not justified. I doubt there is any amount of data you could add in support that would justify hammering the free fast paths (which call get_pageblock_type). > Second, (get/set)_freepage_migrate isn't used properly. I guess that it > would be introduced for per cpu page(pcp) performance, but, it is also > used by memory isolation, now. 
For that case, the information isn't > enough to use, so we need to fix it. > > Third, there is the problem on buddy allocator. It doesn't consider > migratetype when merging buddy, so pages from cma or isolate region can > be moved to other migratetype freelist. It makes CMA failed over and over. > To prevent it, the buddy allocator should consider migratetype if > CMA/ISOLATE is enabled. Without loioing at the patches, this is likely to add some cost to the page free fast path -- heavy cost if it's a pageblock lookup and lighter cost if you are using cached page information which is potentially stale. Why not force CMA regions to be aligned on MAX_ORDER_NR_PAGES boundary instead to avoid any possibility of merging issues? > This patchset is aimed at fixing these problems and based on v3.13-rc7. > > mm/page_alloc: synchronize get/set pageblock cost with no justification. > mm/cma: fix cma free page accounting sounds like it would be a fix but unrelated to the leader and should be seperated out on its own > mm/page_alloc: move set_freepage_migratetype() to better place Very vague. If this does something useful then it could do with a better subject. > mm/isolation: remove invalid check condition Looks harmless. > mm/page_alloc: separate interface to set/get migratetype of freepage > mm/page_alloc: store freelist migratetype to the page on buddy > properly Potentially sounds useful > mm/page_alloc: don't merge MIGRATE_(CMA|ISOLATE) pages on buddy > Sounds unnecessary if CMA regions were MAX_ORDER_NR_PAGES aligned and then the free paths would be unaffected for everybody. I didn't look at the patches because it felt like cost without any supporting justification for the patches. Superficially it looks like patch 1 needs to go away and the last patch could be done without affected !CMA users. 
The rest are potentially useful but there should have been some supporting data on how it helps CMA with some backup showing that the page allocation paths are not impacted as a result. -- Mel Gorman SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756777AbaAIOFa (ORCPT ); Thu, 9 Jan 2014 09:05:30 -0500 Received: from mail-we0-f172.google.com ([74.125.82.172]:35993 "EHLO mail-we0-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751530AbaAIOFG (ORCPT ); Thu, 9 Jan 2014 09:05:06 -0500 MIME-Version: 1.0 In-Reply-To: References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> Date: Thu, 9 Jan 2014 23:05:04 +0900 Message-ID: Subject: Re: [PATCH 0/7] improve robustness on handling migratetype From: Joonsoo Kim To: Michal Nazarewicz Cc: Joonsoo Kim , Andrew Morton , "Kirill A. Shutemov" , Rik van Riel , Jiang Liu , Mel Gorman , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Andi Kleen , Wei Yongjun , Tang Chen , Linux Memory Management List , LKML Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 2014/1/9 Michal Nazarewicz : > On Thu, Jan 09 2014, Joonsoo Kim wrote: >> Third, there is the problem on buddy allocator. It doesn't consider >> migratetype when merging buddy, so pages from cma or isolate region can >> be moved to other migratetype freelist. It makes CMA failed over and over. >> To prevent it, the buddy allocator should consider migratetype if >> CMA/ISOLATE is enabled. > > There should never be situation where a CMA page shares a pageblock (or > a max-order page) with a non-CMA page though, so this should never be an > issue. Right, it never happens. When I ported the CMA region reservation code into my own code for testing, I made a mistake. Sorry for the noise. Thanks.
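Mel's alignment argument, which Michal confirms above, can be checked with a small userspace model. This is a sketch, not kernel code: it assumes the common default MAX_ORDER of 11 (so MAX_ORDER_NR_PAGES is 1024), uses the standard XOR rule for locating a buddy block, and the helper names buddy_pfn() and merges_stay_inside() are hypothetical. It checks whether any merge attempted for a block inside [start, end) could pair that block with a buddy outside the range.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed: the common default MAX_ORDER of 11, so the largest block is order 10. */
#define MAX_ORDER 11
#define MAX_ORDER_NR_PAGES (1UL << (MAX_ORDER - 1))

/* Buddy of the 2^order-page block starting at pfn (pfn must be order-aligned). */
static unsigned long buddy_pfn(unsigned long pfn, unsigned int order)
{
	assert((pfn & ((1UL << order) - 1)) == 0);
	return pfn ^ (1UL << order);
}

/*
 * Return true if every merge the buddy allocator could attempt for a block
 * inside [start, end) pairs it with a buddy that is also inside the range.
 * Merges happen at orders 0 .. MAX_ORDER - 2, producing at most order MAX_ORDER - 1.
 */
static bool merges_stay_inside(unsigned long start, unsigned long end)
{
	for (unsigned int order = 0; order < MAX_ORDER - 1; order++) {
		unsigned long step = 1UL << order;
		/* Only order-aligned blocks exist at this order. */
		unsigned long first = (start + step - 1) & ~(step - 1);

		for (unsigned long pfn = first; pfn + step <= end; pfn += step) {
			unsigned long buddy = buddy_pfn(pfn, order);

			if (buddy < start || buddy + step > end)
				return false;
		}
	}
	return true;
}
```

For a region whose start and size are both multiples of MAX_ORDER_NR_PAGES, for example merges_stay_inside(4 * MAX_ORDER_NR_PAGES, 6 * MAX_ORDER_NR_PAGES), every merge stays inside, so a CMA block cannot be coalesced with a non-CMA one. Shifting the start by half of MAX_ORDER_NR_PAGES breaks this at order 9.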
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757582AbaAIVKj (ORCPT ); Thu, 9 Jan 2014 16:10:39 -0500 Received: from smtp.codeaurora.org ([198.145.11.231]:33943 "EHLO smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756190AbaAIVKc (ORCPT ); Thu, 9 Jan 2014 16:10:32 -0500 Message-ID: <52CF1045.30903@codeaurora.org> Date: Thu, 09 Jan 2014 13:10:29 -0800 From: Laura Abbott User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: Joonsoo Kim , Andrew Morton CC: "Kirill A. Shutemov" , Rik van Riel , Jiang Liu , Mel Gorman , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Michal Nazarewicz , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim Subject: Re: [PATCH 2/7] mm/cma: fix cma free page accounting References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> <1389251087-10224-3-git-send-email-iamjoonsoo.kim@lge.com> In-Reply-To: <1389251087-10224-3-git-send-email-iamjoonsoo.kim@lge.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 1/8/2014 11:04 PM, Joonsoo Kim wrote: > Cma pages can be allocated by not only order 0 request but also high order > request. So, we should consider to account free cma page in the both > places. 
> > Signed-off-by: Joonsoo Kim > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index b36aa5a..1489c301 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1091,6 +1091,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) > start_migratetype, > migratetype); > > + /* CMA pages cannot be stolen */ > + if (is_migrate_cma(migratetype)) { > + __mod_zone_page_state(zone, > + NR_FREE_CMA_PAGES, -(1 << order)); > + } > + > /* Remove the page from the freelists */ > list_del(&page->lru); > rmv_page_order(page); > @@ -1175,9 +1181,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, > } > set_freepage_migratetype(page, mt); > list = &page->lru; > - if (is_migrate_cma(mt)) > - __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, > - -(1 << order)); > } > __mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order)); > spin_unlock(&zone->lock); > Wouldn't this result in double counting? In buffered_rmqueue, for a non-zero-order request we call __mod_zone_freepage_state, which already accounts for CMA pages if the migratetype is CMA, so it seems like we would get hit twice: buffered_rmqueue -> __rmqueue -> __rmqueue_fallback (decrement), then buffered_rmqueue -> __mod_zone_freepage_state (decrement). Thanks, Laura -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751469AbaAJIsf (ORCPT ); Fri, 10 Jan 2014 03:48:35 -0500 Received: from lgeamrelo02.lge.com ([156.147.1.126]:49558 "EHLO LGEAMRELO02.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750793AbaAJIsd (ORCPT ); Fri, 10 Jan 2014 03:48:33 -0500 X-AuditID: 9c93017e-b7ba2ae000003516-d6-52cfb3deb66e Date: Fri, 10 Jan 2014 17:48:55 +0900 From: Joonsoo Kim To: Mel Gorman Cc: Andrew Morton , "Kirill A.
Shutemov" , Rik van Riel , Jiang Liu , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Michal Nazarewicz , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 0/7] improve robustness on handling migratetype Message-ID: <20140110084854.GA22058@lge.com> References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> <20140109092720.GM27046@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140109092720.GM27046@suse.de> User-Agent: Mutt/1.5.21 (2010-09-15) X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote: > On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote: > > Hello, > > > > I found some weaknesses on handling migratetype during code review and > > testing CMA. > > > > First, we don't have any synchronization method on get/set pageblock > > migratetype. When we change migratetype, we hold the zone lock. So > > writer-writer race doesn't exist. But while someone changes migratetype, > > others can get migratetype. This may introduce totally unintended value > > as migratetype. Although I haven't heard of any problem report about > > that, it is better to protect properly. > > > > This is deliberate. The migratetypes for the majority of users are advisory > and aimed for fragmentation avoidance. It was important that the cost of > that be kept as low as possible and the general case is that migration types > change very rarely. In many cases, the zone lock is held. In other cases, > such as splitting free pages, the cost is simply not justified. > > I doubt there is any amount of data you could add in support that would > justify hammering the free fast paths (which call get_pageblock_type). Hello, Mel. 
There is a possibility that we read an unintended value, such as 6, as the migratetype if a reader-writer race between get_pageblock_migratetype() and set_pageblock_migratetype() happens. This is possible because we read the value without any synchronization. Migratetype 6 has no place in the buddy freelists, so an array index overrun is possible and the system can break, although I haven't heard of it occurring. I think that my solution is too expensive. However, I think that we need a solution, don't we? Do you have any better idea? > > > Second, (get/set)_freepage_migrate isn't used properly. I guess that it > > would be introduced for per cpu page(pcp) performance, but, it is also > > used by memory isolation, now. For that case, the information isn't > > enough to use, so we need to fix it. > > > > Third, there is the problem on buddy allocator. It doesn't consider > > migratetype when merging buddy, so pages from cma or isolate region can > > be moved to other migratetype freelist. It makes CMA failed over and over. > > To prevent it, the buddy allocator should consider migratetype if > > CMA/ISOLATE is enabled. > > Without loioing at the patches, this is likely to add some cost to the > page free fast path -- heavy cost if it's a pageblock lookup and lighter > cost if you are using cached page information which is potentially stale. > Why not force CMA regions to be aligned on MAX_ORDER_NR_PAGES boundary > instead to avoid any possibility of merging issues? > That was my mistake. The CMA region is aligned on MAX_ORDER_NR_PAGES, so it can't happen. Sorry for the noise. > > This patchset is aimed at fixing these problems and based on v3.13-rc7. > > > > mm/page_alloc: synchronize get/set pageblock > > cost with no justification. > > > mm/cma: fix cma free page accounting > > sounds like it would be a fix but unrelated to the leader and should be > seperated out on its own Yes, it is not related to this topic and it is a wrong patch, as Laura pointed out, so I will drop it.
> > mm/page_alloc: move set_freepage_migratetype() to better place > > Very vague. If this does something useful then it could do with a better > subject. Okay. > > mm/isolation: remove invalid check condition > > Looks harmless. > > > mm/page_alloc: separate interface to set/get migratetype of freepage > > mm/page_alloc: store freelist migratetype to the page on buddy > > properly > > Potentially sounds useful > I made these two patches to reduce the performance impact of the last patch. If the last patch is dropped, it is better to remove the last call site that uses the freelist migratetype to determine the buddy freelist type. I will respin. Thanks. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751985AbaAJItq (ORCPT ); Fri, 10 Jan 2014 03:49:46 -0500 Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:42914 "EHLO LGEMRELSE7Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751572AbaAJIto (ORCPT ); Fri, 10 Jan 2014 03:49:44 -0500 X-AuditID: 9c930197-b7c20ae000001031-55-52cfb4259b34 Date: Fri, 10 Jan 2014 17:50:05 +0900 From: Joonsoo Kim To: Laura Abbott Cc: Andrew Morton , "Kirill A.
Shutemov" , Rik van Riel , Jiang Liu , Mel Gorman , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Michal Nazarewicz , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 2/7] mm/cma: fix cma free page accounting Message-ID: <20140110085005.GB22058@lge.com> References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> <1389251087-10224-3-git-send-email-iamjoonsoo.kim@lge.com> <52CF1045.30903@codeaurora.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <52CF1045.30903@codeaurora.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jan 09, 2014 at 01:10:29PM -0800, Laura Abbott wrote: > On 1/8/2014 11:04 PM, Joonsoo Kim wrote: > >Cma pages can be allocated by not only order 0 request but also high order > >request. So, we should consider to account free cma page in the both > >places. 
> > > >Signed-off-by: Joonsoo Kim > > > >diff --git a/mm/page_alloc.c b/mm/page_alloc.c > >index b36aa5a..1489c301 100644 > >--- a/mm/page_alloc.c > >+++ b/mm/page_alloc.c > >@@ -1091,6 +1091,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) > > start_migratetype, > > migratetype); > > > >+ /* CMA pages cannot be stolen */ > >+ if (is_migrate_cma(migratetype)) { > >+ __mod_zone_page_state(zone, > >+ NR_FREE_CMA_PAGES, -(1 << order)); > >+ } > >+ > > /* Remove the page from the freelists */ > > list_del(&page->lru); > > rmv_page_order(page); > >@@ -1175,9 +1181,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, > > } > > set_freepage_migratetype(page, mt); > > list = &page->lru; > >- if (is_migrate_cma(mt)) > >- __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, > >- -(1 << order)); > > } > > __mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order)); > > spin_unlock(&zone->lock); > > > > Wouldn't this result in double counting? in the buffered_rmqueue non > zero ordered request we call __mod_zone_freepage_state which already > accounts for CMA pages if the migrate type is CMA so it seems like > we would get hit twice: > > buffered_rmqueue > __rmqueue > __rmqueue_fallback > decrement > __mod_zone_freepage_state > decrement > Hello, Laura. You are right. I missed it. I will drop this patch. Thanks. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753254AbaAJJso (ORCPT ); Fri, 10 Jan 2014 04:48:44 -0500 Received: from cantor2.suse.de ([195.135.220.15]:41397 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751216AbaAJJsk (ORCPT ); Fri, 10 Jan 2014 04:48:40 -0500 Date: Fri, 10 Jan 2014 09:48:34 +0000 From: Mel Gorman To: Joonsoo Kim Cc: Andrew Morton , "Kirill A. 
Shutemov" , Rik van Riel , Jiang Liu , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Michal Nazarewicz , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 0/7] improve robustness on handling migratetype Message-ID: <20140110094834.GV27046@suse.de> References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> <20140109092720.GM27046@suse.de> <20140110084854.GA22058@lge.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <20140110084854.GA22058@lge.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Jan 10, 2014 at 05:48:55PM +0900, Joonsoo Kim wrote: > On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote: > > On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote: > > > Hello, > > > > > > I found some weaknesses on handling migratetype during code review and > > > testing CMA. > > > > > > First, we don't have any synchronization method on get/set pageblock > > > migratetype. When we change migratetype, we hold the zone lock. So > > > writer-writer race doesn't exist. But while someone changes migratetype, > > > others can get migratetype. This may introduce totally unintended value > > > as migratetype. Although I haven't heard of any problem report about > > > that, it is better to protect properly. > > > > > > > This is deliberate. The migratetypes for the majority of users are advisory > > and aimed for fragmentation avoidance. It was important that the cost of > > that be kept as low as possible and the general case is that migration types > > change very rarely. In many cases, the zone lock is held. In other cases, > > such as splitting free pages, the cost is simply not justified. 
> > > > I doubt there is any amount of data you could add in support that would > > justify hammering the free fast paths (which call get_pageblock_type). > > Hello, Mel. > > There is a possibility that we can get unintended value such as 6 as migratetype > if reader-writer (get/set pageblock_migratetype) race happends. It can be > possible, because we read the value without any synchronization method. And > this migratetype, 6, has no place in buddy freelist, so array index overrun can > be possible and the system can break, although I haven't heard that it occurs. > > I think that my solution is too expensive. However, I think that we need > solution. aren't we? Do you have any better idea? > It's not something I have ever heard of or seen occurring, but if you've identified that it's a real possibility then split get_pageblock_migratetype into locked and unlocked versions. Ensure that calls to set_pageblock_migratetype are always under zone->lock and that get_pageblock_migratetype is also under zone->lock, both of which should be true in the majority of cases. Use the unlocked version otherwise, but instead of synchronizing, check if it's returning >= MIGRATE_TYPES and return MIGRATE_MOVABLE in the unlikely event of a race. This will avoid harming the fast paths for the majority of users and limit the damage if a MIGRATE_CMA region is accidentally treated as MIGRATE_MOVABLE. -- Mel Gorman SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751270AbaAMB4a (ORCPT ); Sun, 12 Jan 2014 20:56:30 -0500 Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:54080 "EHLO LGEMRELSE7Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751110AbaAMB42 (ORCPT ); Sun, 12 Jan 2014 20:56:28 -0500 X-AuditID: 9c930197-b7c20ae000001031-a9-52d347c9494c Date: Mon, 13 Jan 2014 10:57:00 +0900 From: Joonsoo Kim To: Mel Gorman Cc: Andrew Morton , "Kirill A.
Shutemov" , Rik van Riel , Jiang Liu , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Michal Nazarewicz , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 0/7] improve robustness on handling migratetype Message-ID: <20140113015659.GA28140@lge.com> References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> <20140109092720.GM27046@suse.de> <20140110084854.GA22058@lge.com> <20140110094834.GV27046@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140110094834.GV27046@suse.de> User-Agent: Mutt/1.5.21 (2010-09-15) X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Jan 10, 2014 at 09:48:34AM +0000, Mel Gorman wrote: > On Fri, Jan 10, 2014 at 05:48:55PM +0900, Joonsoo Kim wrote: > > On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote: > > > On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote: > > > > Hello, > > > > > > > > I found some weaknesses on handling migratetype during code review and > > > > testing CMA. > > > > > > > > First, we don't have any synchronization method on get/set pageblock > > > > migratetype. When we change migratetype, we hold the zone lock. So > > > > writer-writer race doesn't exist. But while someone changes migratetype, > > > > others can get migratetype. This may introduce totally unintended value > > > > as migratetype. Although I haven't heard of any problem report about > > > > that, it is better to protect properly. > > > > > > > > > > This is deliberate. The migratetypes for the majority of users are advisory > > > and aimed for fragmentation avoidance. It was important that the cost of > > > that be kept as low as possible and the general case is that migration types > > > change very rarely. In many cases, the zone lock is held. 
In other cases, > > > such as splitting free pages, the cost is simply not justified. > > > > > > I doubt there is any amount of data you could add in support that would > > > justify hammering the free fast paths (which call get_pageblock_type). > > > > Hello, Mel. > > > > There is a possibility that we can get unintended value such as 6 as migratetype > > if reader-writer (get/set pageblock_migratetype) race happends. It can be > > possible, because we read the value without any synchronization method. And > > this migratetype, 6, has no place in buddy freelist, so array index overrun can > > be possible and the system can break, although I haven't heard that it occurs. > > > > I think that my solution is too expensive. However, I think that we need > > solution. aren't we? Do you have any better idea? > > > > It's not something I have ever heard or seen of occurring but > if you've identified that it's a real possibility then split > get_pageblock_migratetype into locked and unlocked versions. Ensure > that calls to set_pageblock_migratetype is always under zone->lock and > get_pageblock_migratetype is also under zone->lock which both should be > true in the majority of cases. Use the unlocked version otherwise but > instead of synchronoing, check if it's returning >= MIGRATE_TYPES and > return MIGRATE_MOVABLE in the unlikely event of a race. This will avoid > harming the fast paths for the majority of users and limit the damage if > a MIGRATE_CMA region is accidentally treated as MIGRATe_MOVABLE Okay. I will re-investigate it, and if I identify that it's a real possibility, I will rework this patch according to your advice. Thanks for the comment!
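Mel's suggestion above (a locked getter for callers that hold zone->lock, and an unlocked getter that clamps an impossible value to MIGRATE_MOVABLE) can be sketched as a userspace model. The enum numbering is assumed to match a v3.13 kernel built with CONFIG_CMA and CONFIG_MEMORY_ISOLATION, and get_pageblock_migratetype_unlocked() and nr_free[] are hypothetical stand-ins rather than existing kernel symbols. Only the clamping logic is modelled, not the locking.

```c
#include <assert.h>

/* Assumed numbering: v3.13 with CONFIG_CMA and CONFIG_MEMORY_ISOLATION. */
enum {
	MIGRATE_UNMOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_MOVABLE,
	MIGRATE_PCPTYPES,                   /* number of types on the pcp lists */
	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
	MIGRATE_CMA,
	MIGRATE_ISOLATE,
	MIGRATE_TYPES
};

/* Hypothetical per-migratetype free counters, standing in for free_list[]. */
static unsigned long nr_free[MIGRATE_TYPES];

/*
 * Hypothetical unlocked getter. 'raw' is the value read from the pageblock
 * bitmap without zone->lock, so a lost race may produce 6 or 7.
 */
static int get_pageblock_migratetype_unlocked(unsigned int raw)
{
	int mt = (int)(raw & 0x7u);         /* the field is assumed to be 3 bits */

	if (mt >= MIGRATE_TYPES)            /* impossible value: lost a race */
		return MIGRATE_MOVABLE;
	return mt;
}

/* A free path that indexes per-type state with the unlocked result. */
static void account_free_page(unsigned int raw)
{
	int mt = get_pageblock_migratetype_unlocked(raw);

	assert(mt >= 0 && mt < MIGRATE_TYPES);  /* never overruns nr_free[] */
	nr_free[mt]++;
}
```

The clamp costs one comparison on the fast path instead of a lock. As Mel notes, the price of a lost race is that a MIGRATE_CMA or MIGRATE_ISOLATE pageblock may briefly be treated as MIGRATE_MOVABLE.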
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753265AbaA2Qwu (ORCPT ); Wed, 29 Jan 2014 11:52:50 -0500 Received: from cantor2.suse.de ([195.135.220.15]:55264 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751867AbaA2Qws (ORCPT ); Wed, 29 Jan 2014 11:52:48 -0500 Message-ID: <52E931D9.8050002@suse.cz> Date: Wed, 29 Jan 2014 17:52:41 +0100 From: Vlastimil Babka User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: Joonsoo Kim , Mel Gorman CC: Andrew Morton , "Kirill A. Shutemov" , Rik van Riel , Jiang Liu , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Michal Nazarewicz , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 0/7] improve robustness on handling migratetype References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> <20140109092720.GM27046@suse.de> <20140110084854.GA22058@lge.com> In-Reply-To: <20140110084854.GA22058@lge.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 01/10/2014 09:48 AM, Joonsoo Kim wrote: > On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote: >> On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote: >>> Hello, >>> >>> I found some weaknesses on handling migratetype during code review and >>> testing CMA. >>> >>> First, we don't have any synchronization method on get/set pageblock >>> migratetype. When we change migratetype, we hold the zone lock. So >>> writer-writer race doesn't exist. But while someone changes migratetype, >>> others can get migratetype. This may introduce totally unintended value >>> as migratetype. Although I haven't heard of any problem report about >>> that, it is better to protect properly. 
>>> >> >> This is deliberate. The migratetypes for the majority of users are advisory >> and aimed for fragmentation avoidance. It was important that the cost of >> that be kept as low as possible and the general case is that migration types >> change very rarely. In many cases, the zone lock is held. In other cases, >> such as splitting free pages, the cost is simply not justified. >> >> I doubt there is any amount of data you could add in support that would >> justify hammering the free fast paths (which call get_pageblock_type). > > Hello, Mel. > > There is a possibility that we can get unintended value such as 6 as migratetype > if reader-writer (get/set pageblock_migratetype) race happends. It can be > possible, because we read the value without any synchronization method. And > this migratetype, 6, has no place in buddy freelist, so array index overrun can > be possible and the system can break, although I haven't heard that it occurs. Hello, it seems this can indeed happen. I'm working on memory compaction improvements, and in a prototype patch I'm basically adding calls to start_isolate_page_range() and undo_isolate_page_range() in some functions under compact_zone(). With this I've seen NULL pointers in move_freepages() and free_one_page(), in places where free_list[migratetype] is manipulated by e.g. list_move(). That led me to question the value of migratetype, and I found this thread. After adding some debugging in get_pageblock_migratetype(), voila, I get a value of 6 being read. So is it just my patch adding a dangerous situation, or does it exist in mainline as well? Looking at free_one_page(), it uses zone->lock, but get_pageblock_migratetype() is called by its callers (free_hot_cold_page() or __free_pages_ok()) outside of the lock. The migratetype determined this way is then used in free_one_page() to access a free_list.
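The value 6 that Vlastimil observed can be reproduced with a small model of a torn read. This sketch assumes, as the thread describes, that the three migratetype bits of a pageblock are written and read one bit at a time (set or clear on the writer side, test on the reader side), with each side walking from bit 0 upward. It ignores compiler and CPU reordering. It enumerates every interleaving of the writer's three bit updates with the reader's three bit tests and records which values the reader can assemble. The helper names run_schedule() and observable_values() are hypothetical, and the numbering (MIGRATE_MOVABLE = 2, MIGRATE_ISOLATE = 5, MIGRATE_TYPES = 6) is assumed from v3.13.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_MT_BITS 3   /* assumed width of the per-pageblock migratetype field */

/*
 * Run one interleaving. schedule[i] == true means the writer updates its next
 * bit at step i; false means the reader tests its next bit. Both sides walk
 * the bits from 0 upward, one bit per step, as in a non-atomic bit loop.
 */
static unsigned int run_schedule(const bool *schedule,
				 unsigned int old_mt, unsigned int new_mt)
{
	unsigned int word = old_mt;    /* shared pageblock bits */
	unsigned int seen = 0;         /* value assembled by the reader */
	unsigned int wbit = 0, rbit = 0;

	for (int step = 0; step < 2 * NR_MT_BITS; step++) {
		if (schedule[step]) {
			unsigned int mask = 1u << wbit++;

			if (new_mt & mask)
				word |= mask;  /* writer sets this bit */
			else
				word &= ~mask; /* writer clears this bit */
		} else {
			unsigned int mask = 1u << rbit++;

			if (word & mask)       /* reader tests this bit */
				seen |= mask;
		}
	}
	assert(wbit == NR_MT_BITS && rbit == NR_MT_BITS);
	return seen;
}

/* Bitmask of every value a reader can observe during an old -> new update. */
static unsigned int observable_values(unsigned int old_mt, unsigned int new_mt)
{
	unsigned int result = 0;

	/* Each 6-bit pattern with exactly NR_MT_BITS ones is one interleaving. */
	for (unsigned int m = 0; m < (1u << (2 * NR_MT_BITS)); m++) {
		bool schedule[2 * NR_MT_BITS];
		int writes = 0;

		for (int i = 0; i < 2 * NR_MT_BITS; i++) {
			schedule[i] = (m >> i) & 1u;
			writes += schedule[i];
		}
		if (writes != NR_MT_BITS)
			continue;
		result |= 1u << run_schedule(schedule, old_mt, new_mt);
	}
	return result;
}
```

Changing MIGRATE_MOVABLE (2) to MIGRATE_ISOLATE (5) flips all three bits, so observable_values(2, 5) covers every value from 0 to 7, including 6 and 7, which have no free_list. Transitions among MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE and MIGRATE_MOVABLE only touch the low two bits. For example, observable_values(1, 2) yields values 0 to 3, and 3 (MIGRATE_RESERVE) is still a valid index.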
It seems that this could race with set_pageblock_migratetype() called from try_to_steal_freepages() (despite the latter being properly locked). There are also other callers, but those seem to be limited to either initialization or isolation, which should be rare (?). However, try_to_steal_freepages can occur repeatedly. So I assume that the race happens but never manifests as a fatal error as long as MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE and MIGRATE_MOVABLE values are used. Only MIGRATE_CMA (4) and MIGRATE_ISOLATE (5) have values with the bit of value 4 set, and can thus result in invalid values due to non-atomic access. Does that make sense to you, and should we thus proceed with patching this race? Vlastimil > I think that my solution is too expensive. However, I think that we need > solution. aren't we? Do you have any better idea? > >> >>> Second, (get/set)_freepage_migrate isn't used properly. I guess that it >>> would be introduced for per cpu page(pcp) performance, but, it is also >>> used by memory isolation, now. For that case, the information isn't >>> enough to use, so we need to fix it. >>> >>> Third, there is the problem on buddy allocator. It doesn't consider >>> migratetype when merging buddy, so pages from cma or isolate region can >>> be moved to other migratetype freelist. It makes CMA failed over and over. >>> To prevent it, the buddy allocator should consider migratetype if >>> CMA/ISOLATE is enabled. >> >> Without loioing at the patches, this is likely to add some cost to the >> page free fast path -- heavy cost if it's a pageblock lookup and lighter >> cost if you are using cached page information which is potentially stale. >> Why not force CMA regions to be aligned on MAX_ORDER_NR_PAGES boundary >> instead to avoid any possibility of merging issues? >> > > There was my mistake. CMA region is aligned on MAX_ORDER_NR_PAGES, so it > can't happed. Sorry for noise. > >>> This patchset is aimed at fixing these problems and based on v3.13-rc7.
>>> >>> mm/page_alloc: synchronize get/set pageblock >> >> cost with no justification. >> >>> mm/cma: fix cma free page accounting >> >> sounds like it would be a fix but unrelated to the leader and should be >> seperated out on its own > > Yes, it is not related to this topic and it is wrong patch as Laura > pointed out, so I will drop it. > >>> mm/page_alloc: move set_freepage_migratetype() to better place >> >> Very vague. If this does something useful then it could do with a better >> subject. > > Okay. > >>> mm/isolation: remove invalid check condition >> >> Looks harmless. >> >>> mm/page_alloc: separate interface to set/get migratetype of freepage >>> mm/page_alloc: store freelist migratetype to the page on buddy >>> properly >> >> Potentially sounds useful >> > > I made these two patches for last patch to reduce performance effect of it. > In case of dropping last patch, it is better to remove the last callsite > using freelist migratetype to know the buddy freelist type. I will do respin. > > Thanks. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932665AbaAaPjT (ORCPT ); Fri, 31 Jan 2014 10:39:19 -0500 Received: from cantor2.suse.de ([195.135.220.15]:46809 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754006AbaAaPjR (ORCPT ); Fri, 31 Jan 2014 10:39:17 -0500 Date: Fri, 31 Jan 2014 15:39:08 +0000 From: Mel Gorman To: Vlastimil Babka Cc: Joonsoo Kim , Andrew Morton , "Kirill A.
Shutemov" , Rik van Riel , Jiang Liu , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Michal Nazarewicz , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 0/7] improve robustness on handling migratetype Message-ID: <20140131153908.GA14581@suse.de> References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> <20140109092720.GM27046@suse.de> <20140110084854.GA22058@lge.com> <52E931D9.8050002@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <52E931D9.8050002@suse.cz> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jan 29, 2014 at 05:52:41PM +0100, Vlastimil Babka wrote: > On 01/10/2014 09:48 AM, Joonsoo Kim wrote: > >On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote: > >>On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote: > >>>Hello, > >>> > >>>I found some weaknesses on handling migratetype during code review and > >>>testing CMA. > >>> > >>>First, we don't have any synchronization method on get/set pageblock > >>>migratetype. When we change migratetype, we hold the zone lock. So > >>>writer-writer race doesn't exist. But while someone changes migratetype, > >>>others can get migratetype. This may introduce totally unintended value > >>>as migratetype. Although I haven't heard of any problem report about > >>>that, it is better to protect properly. > >>> > >> > >>This is deliberate. The migratetypes for the majority of users are advisory > >>and aimed for fragmentation avoidance. It was important that the cost of > >>that be kept as low as possible and the general case is that migration types > >>change very rarely. In many cases, the zone lock is held. In other cases, > >>such as splitting free pages, the cost is simply not justified. 
> >> > >>I doubt there is any amount of data you could add in support that would > >>justify hammering the free fast paths (which call get_pageblock_type). > > > >Hello, Mel. > > > >There is a possibility that we can get unintended value such as 6 as migratetype > >if reader-writer (get/set pageblock_migratetype) race happends. It can be > >possible, because we read the value without any synchronization method. And > >this migratetype, 6, has no place in buddy freelist, so array index overrun can > >be possible and the system can break, although I haven't heard that it occurs. > > Hello, > > it seems this can indeed happen. I'm working on memory compaction > improvements and in a prototype patch, I'm basically adding calls of > start_isolate_page_range() undo_isolate_page_range() some functions > under compact_zone(). With this I've seen occurrences of NULL > pointers in move_freepages(), free_one_page() in places where > free_list[migratetype] is manipulated by e.g. list_move(). That lead > me to question the value of migratetype and I found this thread. > Adding some debugging in get_pageblock_migratetype() and voila, I > get a value of 6 being read. > > So is it just my patch adding a dangerous situation, or does it exist in > mainline as well? By looking at free_one_page(), it uses zone->lock, but > get_pageblock_migratetype() is called by its callers > (free_hot_cold_page() or __free_pages_ok()) outside of the lock. > This determined migratetype is then used under free_one_page() to > access a free_list. > > It seems that this could race with set_pageblock_migratetype() > called from try_to_steal_freepages() (despite the latter being > properly locked). There are also other callers but those seem to be > either limited to initialization and isolation, which should be rare > (?). > However, try_to_steal_freepages can occur repeatedly. 
> So I assume that the race happens but never manifests as a fatal
> error as long as MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE and
> MIGRATE_MOVABLE
> values are used. Only MIGRATE_CMA and MIGRATE_ISOLATE have values
> with bit 4 enabled and can thus result in invalid values due to
> non-atomic access.
>
> Does that make sense to you and should we thus proceed with patching
> this race?
>

If you have direct evidence then it is indeed a problem. The key would be
to avoid taking the zone->lock just to stabilise this and instead modify
when get_pageblock_migratetype is called to make it safe. Looking at the
callers of get_pageblock_migratetype, it would appear that:

1. __free_pages_ok's call to get_pageblock_migratetype can move into
   free_one_page() under the zone lock as long as you also move the
   set_freepage_migratetype call. The migratetype will be read twice by
   the free_hot_cold_page->free_one_page call but that's ok because you
   have established that it is necessary.

2. rmqueue_bulk already calls it under zone->lock.

3. free_hot_cold_page cannot take zone->lock to stabilise the migratetype
   read, but if it gets a bad read due to a race, it enters the slow path.
   Force it to call free_one_page() there and take the lock in the event
   of a race, instead of only doing so when is_migrate_isolate() is true.
   Consider adding a debug patch that counts with vmstat how often this
   race occurs and check the value with and without the compaction
   patches you've added.

4. It's not obvious, but __isolate_free_page should already hold the zone
   lock.

5. buffered_rmqueue: move the call to get_pageblock_migratetype under the
   zone lock. It'll just cost a local variable.

6. A race in setup_zone_migrate_reserve is relatively harmless. Check
   system_state == SYSTEM_BOOTING and take the zone->lock if the system
   is live. Release, resched and reacquire if need_resched().

7. has_unmovable_pages is harmless; the range should be isolated and not
   racing against other updates.

--
Mel Gorman
SUSE Labs

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753057AbaBCHpL (ORCPT ); Mon, 3 Feb 2014 02:45:11 -0500
Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:42599 "EHLO lgemrelse7q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752964AbaBCHpJ (ORCPT ); Mon, 3 Feb 2014 02:45:09 -0500
X-Original-SENDERIP: 10.177.222.146
X-Original-MAILFROM: iamjoonsoo.kim@lge.com
Date: Mon, 3 Feb 2014 16:45:07 +0900
From: Joonsoo Kim
To: Vlastimil Babka
Cc: Mel Gorman , Andrew Morton , "Kirill A. Shutemov" , Rik van Riel , Jiang Liu , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Michal Nazarewicz , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/7] improve robustness on handling migratetype
Message-ID: <20140203074507.GB2360@lge.com>
References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> <20140109092720.GM27046@suse.de> <20140110084854.GA22058@lge.com> <52E931D9.8050002@suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <52E931D9.8050002@suse.cz>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jan 29, 2014 at 05:52:41PM +0100, Vlastimil Babka wrote:
> On 01/10/2014 09:48 AM, Joonsoo Kim wrote:
> >On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote:
> >>On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote:
> >>>Hello,
> >>>
> >>>I found some weaknesses on handling migratetype during code review and
> >>>testing CMA.
> >>>
> >>>First, we don't have any synchronization method on get/set pageblock
> >>>migratetype. When we change migratetype, we hold the zone lock. So
> >>>writer-writer race doesn't exist. But while someone changes migratetype,
> >>>others can get migratetype. This may introduce totally unintended value
> >>>as migratetype. Although I haven't heard of any problem report about
> >>>that, it is better to protect properly.
> >>>
> >>
> >>This is deliberate. The migratetypes for the majority of users are advisory
> >>and aimed for fragmentation avoidance. It was important that the cost of
> >>that be kept as low as possible and the general case is that migration types
> >>change very rarely. In many cases, the zone lock is held. In other cases,
> >>such as splitting free pages, the cost is simply not justified.
> >>
> >>I doubt there is any amount of data you could add in support that would
> >>justify hammering the free fast paths (which call get_pageblock_type).
> >
> >Hello, Mel.
> >
> >There is a possibility that we can get unintended value such as 6 as migratetype
> >if reader-writer (get/set pageblock_migratetype) race happends. It can be
> >possible, because we read the value without any synchronization method. And
> >this migratetype, 6, has no place in buddy freelist, so array index overrun can
> >be possible and the system can break, although I haven't heard that it occurs.
>
> Hello,
>
> it seems this can indeed happen. I'm working on memory compaction
> improvements and in a prototype patch, I'm basically adding calls of
> start_isolate_page_range() undo_isolate_page_range() some functions
> under compact_zone(). With this I've seen occurrences of NULL
> pointers in move_freepages(), free_one_page() in places where
> free_list[migratetype] is manipulated by e.g. list_move(). That lead
> me to question the value of migratetype and I found this thread.
> Adding some debugging in get_pageblock_migratetype() and voila, I
> get a value of 6 being read.
>
> So is it just my patch adding a dangerous situation, or does it exist in
> mainline as well? By looking at free_one_page(), it uses zone->lock, but
> get_pageblock_migratetype() is called by its callers
> (free_hot_cold_page() or __free_pages_ok()) outside of the lock.
> This determined migratetype is then used under free_one_page() to
> access a free_list.
>
> It seems that this could race with set_pageblock_migratetype()
> called from try_to_steal_freepages() (despite the latter being
> properly locked). There are also other callers but those seem to be
> either limited to initialization and isolation, which should be rare
> (?).
> However, try_to_steal_freepages can occur repeatedly.
> So I assume that the race happens but never manifests as a fatal
> error as long as MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE and
> MIGRATE_MOVABLE
> values are used. Only MIGRATE_CMA and MIGRATE_ISOLATE have values
> with bit 4 enabled and can thus result in invalid values due to
> non-atomic access.
>
> Does that make sense to you and should we thus proceed with patching
> this race?
>

Hello,

This race is possible even without your prototype patch, although with
very low probability. Some code related to memory failure uses
set_migratetype_isolate(), which could result in this race.

Although it may be a very rare case and not critical, it is better to fix
this race. I prefer that we don't depend on luck. :)

Mel's suggestion looks good to me. Do you have another idea?

Thanks.
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751483AbaBCJRB (ORCPT ); Mon, 3 Feb 2014 04:17:01 -0500
Received: from cantor2.suse.de ([195.135.220.15]:54892 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750894AbaBCJRA (ORCPT ); Mon, 3 Feb 2014 04:17:00 -0500
Message-ID: <52EF5E82.4060003@suse.cz>
Date: Mon, 03 Feb 2014 10:16:50 +0100
From: Vlastimil Babka
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0
MIME-Version: 1.0
To: Joonsoo Kim
CC: Mel Gorman , Andrew Morton , "Kirill A. Shutemov" , Rik van Riel , Jiang Liu , Cody P Schafer , Johannes Weiner , Michal Hocko , Minchan Kim , Michal Nazarewicz , Andi Kleen , Wei Yongjun , Tang Chen , linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/7] improve robustness on handling migratetype
References: <1389251087-10224-1-git-send-email-iamjoonsoo.kim@lge.com> <20140109092720.GM27046@suse.de> <20140110084854.GA22058@lge.com> <52E931D9.8050002@suse.cz> <20140203074507.GB2360@lge.com>
In-Reply-To: <20140203074507.GB2360@lge.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 02/03/2014 08:45 AM, Joonsoo Kim wrote:
> On Wed, Jan 29, 2014 at 05:52:41PM +0100, Vlastimil Babka wrote:
>> On 01/10/2014 09:48 AM, Joonsoo Kim wrote:
>>> On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote:
>>>> On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote:
>>>>> Hello,
>>>>>
>>>>> I found some weaknesses on handling migratetype during code review and
>>>>> testing CMA.
>>>>>
>>>>> First, we don't have any synchronization method on get/set pageblock
>>>>> migratetype. When we change migratetype, we hold the zone lock. So
>>>>> writer-writer race doesn't exist. But while someone changes migratetype,
>>>>> others can get migratetype. This may introduce totally unintended value
>>>>> as migratetype. Although I haven't heard of any problem report about
>>>>> that, it is better to protect properly.
>>>>>
>>>>
>>>> This is deliberate. The migratetypes for the majority of users are advisory
>>>> and aimed for fragmentation avoidance. It was important that the cost of
>>>> that be kept as low as possible and the general case is that migration types
>>>> change very rarely. In many cases, the zone lock is held. In other cases,
>>>> such as splitting free pages, the cost is simply not justified.
>>>>
>>>> I doubt there is any amount of data you could add in support that would
>>>> justify hammering the free fast paths (which call get_pageblock_type).
>>>
>>> Hello, Mel.
>>>
>>> There is a possibility that we can get unintended value such as 6 as migratetype
>>> if reader-writer (get/set pageblock_migratetype) race happends. It can be
>>> possible, because we read the value without any synchronization method. And
>>> this migratetype, 6, has no place in buddy freelist, so array index overrun can
>>> be possible and the system can break, although I haven't heard that it occurs.
>>
>> Hello,
>>
>> it seems this can indeed happen. I'm working on memory compaction
>> improvements and in a prototype patch, I'm basically adding calls of
>> start_isolate_page_range() undo_isolate_page_range() some functions
>> under compact_zone(). With this I've seen occurrences of NULL
>> pointers in move_freepages(), free_one_page() in places where
>> free_list[migratetype] is manipulated by e.g. list_move(). That lead
>> me to question the value of migratetype and I found this thread.
>> Adding some debugging in get_pageblock_migratetype() and voila, I
>> get a value of 6 being read.
>>
>> So is it just my patch adding a dangerous situation, or does it exist in
>> mainline as well? By looking at free_one_page(), it uses zone->lock, but
>> get_pageblock_migratetype() is called by its callers
>> (free_hot_cold_page() or __free_pages_ok()) outside of the lock.
>> This determined migratetype is then used under free_one_page() to
>> access a free_list.
>>
>> It seems that this could race with set_pageblock_migratetype()
>> called from try_to_steal_freepages() (despite the latter being
>> properly locked). There are also other callers but those seem to be
>> either limited to initialization and isolation, which should be rare
>> (?).
>> However, try_to_steal_freepages can occur repeatedly.
>> So I assume that the race happens but never manifests as a fatal
>> error as long as MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE and
>> MIGRATE_MOVABLE
>> values are used. Only MIGRATE_CMA and MIGRATE_ISOLATE have values
>> with bit 4 enabled and can thus result in invalid values due to
>> non-atomic access.
>>
>> Does that make sense to you and should we thus proceed with patching
>> this race?
>>
>
> Hello,
>
> This race is possible even without your prototype patch, although with
> very low probability. Some code related to memory failure uses
> set_migratetype_isolate(), which could result in this race.
>
> Although it may be a very rare case and not critical, it is better to fix
> this race. I prefer that we don't depend on luck. :)

I agree :) I also don't like the possibility that the non-fatal type of
race (where higher-order bits are not involved) occurs and can hurt
anti-fragmentation, or even suddenly become a problem in the future if
e.g. more migratetypes are added. I'll try to quantify that with a debug
patch.

> Mel's suggestion looks good to me. Do you have another idea?

No, it sounds good, so I'm going to work on this as outlined.

> Thanks.
>