* [PATCH 0/3] memory hotplug: updates and bugfix for is_removable
@ 2010-09-06 5:40 KAMEZAWA Hiroyuki
2010-09-06 5:42 ` [BUGFIX][PATCH 1/3] memory hotplug: fix next block calculation in is_removable KAMEZAWA Hiroyuki
` (3 more replies)
0 siblings, 4 replies; 18+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-06 5:40 UTC (permalink / raw)
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Michal Hocko, fengguang.wu,
akpm@linux-foundation.org, Mel Gorman, andi.kleen, Dave Hansen
Problem:
/sys/devices/system/memory/memoryX/removable file shows whether the section
can be offlined or not. Returns "1" if it seems removable.
Now, the file uses a similar logic to one offline_pages() uses.
Problem here is.
- removable detection logics of is_removable() and offline_pages() is
different from each other.
- The logic, which check MIGRATE_TYPE, tend to be incorrect once fragmented.
MIGRATE_TYPE of a pageblock is just a hint, no guarantee.
Then, this patch set does.
- use the same logic between is_removable() and offline_pages().
- don't use MIGRATE_TYPE, check the memmap itself directly rather than hint.
Brief patch description:
1. bugfix for is_removable() check. I think this should be back ported.
2. bugfix for callback at counting immobile pages.
I think the old logic rarely hits this bug..so, not necessary to backport.
3. the unified new logic for is_remobable.
Only patch1 is CCed to stable for now and the patch series itself is onto
mmotm-08-27.
Thanks,
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 18+ messages in thread* [BUGFIX][PATCH 1/3] memory hotplug: fix next block calculation in is_removable 2010-09-06 5:40 [PATCH 0/3] memory hotplug: updates and bugfix for is_removable KAMEZAWA Hiroyuki @ 2010-09-06 5:42 ` KAMEZAWA Hiroyuki 2010-09-06 13:39 ` Mel Gorman 2010-09-06 5:44 ` [PATCH 2/3] memory hotplug: fix set_migratetype_isolate wrong callback result check KAMEZAWA Hiroyuki ` (2 subsequent siblings) 3 siblings, 1 reply; 18+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-09-06 5:42 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko, fengguang.wu, akpm@linux-foundation.org, Mel Gorman, andi.kleen, Dave Hansen, stable From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> next_active_pageblock() is for finding next _used_ freeblock. It skips several blocks when it finds there are a chunk of free pages lager than pageblock. But it has 2 bugs. 1. We have no lock. page_order(page) - pageblock_order can be minus. 2. pageblocks_stride += is wrong. it should skip page_order(p) of pages. CC: stable@kernel.org Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> --- mm/memory_hotplug.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) Index: kametest/mm/memory_hotplug.c =================================================================== --- kametest.orig/mm/memory_hotplug.c +++ kametest/mm/memory_hotplug.c @@ -584,19 +584,19 @@ static inline int pageblock_free(struct /* Return the start of the next active pageblock after a given page */ static struct page *next_active_pageblock(struct page *page) { - int pageblocks_stride; - /* Ensure the starting page is pageblock-aligned */ BUG_ON(page_to_pfn(page) & (pageblock_nr_pages - 1)); - /* Move forward by at least 1 * pageblock_nr_pages */ - pageblocks_stride = 1; - /* If the entire pageblock is free, move to the end of free page */ - if (pageblock_free(page)) - pageblocks_stride += page_order(page) - pageblock_order; + if (pageblock_free(page)) { + int order; + /* be careful. we don't have locks, page_order can be changed.*/ + order = page_order(page); + if (order > pageblock_order) + return page + (1 << order); + } - return page + (pageblocks_stride * pageblock_nr_pages); + return page + pageblock_nr_pages; } /* Checks if this range of memory is likely to be hot-removable. */ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [BUGFIX][PATCH 1/3] memory hotplug: fix next block calculation in is_removable 2010-09-06 5:42 ` [BUGFIX][PATCH 1/3] memory hotplug: fix next block calculation in is_removable KAMEZAWA Hiroyuki @ 2010-09-06 13:39 ` Mel Gorman 2010-09-06 17:15 ` Hiroyuki Kamezawa 0 siblings, 1 reply; 18+ messages in thread From: Mel Gorman @ 2010-09-06 13:39 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko, fengguang.wu, akpm@linux-foundation.org, andi.kleen, Dave Hansen, stable On Mon, Sep 06, 2010 at 02:42:28PM +0900, KAMEZAWA Hiroyuki wrote: > > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > > next_active_pageblock() is for finding next _used_ freeblock. It skips > several blocks when it finds there are a chunk of free pages lager than > pageblock. But it has 2 bugs. > > 1. We have no lock. page_order(page) - pageblock_order can be minus. > 2. pageblocks_stride += is wrong. it should skip page_order(p) of pages. > > CC: stable@kernel.org > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > --- > mm/memory_hotplug.c | 16 ++++++++-------- > 1 file changed, 8 insertions(+), 8 deletions(-) > > Index: kametest/mm/memory_hotplug.c > =================================================================== > --- kametest.orig/mm/memory_hotplug.c > +++ kametest/mm/memory_hotplug.c > @@ -584,19 +584,19 @@ static inline int pageblock_free(struct > /* Return the start of the next active pageblock after a given page */ > static struct page *next_active_pageblock(struct page *page) > { > - int pageblocks_stride; > - > /* Ensure the starting page is pageblock-aligned */ > BUG_ON(page_to_pfn(page) & (pageblock_nr_pages - 1)); > > - /* Move forward by at least 1 * pageblock_nr_pages */ > - pageblocks_stride = 1; > - > /* If the entire pageblock is free, move to the end of free page */ > - if (pageblock_free(page)) > - pageblocks_stride += page_order(page) - pageblock_order; > + if (pageblock_free(page)) { > + int order; > + /* be careful. we don't have locks, page_order can be changed.*/ > + order = page_order(page); > + if (order > pageblock_order) > + return page + (1 << order); > + } As you note in your changelog, page_order() is unsafe because we do not have the zone lock but you don't check if order is somewhere between pageblock_order and MAX_ORDER_NR_PAGES. How is this safer? > > - return page + (pageblocks_stride * pageblock_nr_pages); > + return page + pageblock_nr_pages; > } > > /* Checks if this range of memory is likely to be hot-removable. */ > -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [BUGFIX][PATCH 1/3] memory hotplug: fix next block calculation in is_removable 2010-09-06 13:39 ` Mel Gorman @ 2010-09-06 17:15 ` Hiroyuki Kamezawa 2010-09-07 9:30 ` Mel Gorman 0 siblings, 1 reply; 18+ messages in thread From: Hiroyuki Kamezawa @ 2010-09-06 17:15 UTC (permalink / raw) To: Mel Gorman Cc: KAMEZAWA Hiroyuki, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko, fengguang.wu, akpm@linux-foundation.org, andi.kleen, Dave Hansen, stable 2010/9/6 Mel Gorman <mel@csn.ul.ie>: > On Mon, Sep 06, 2010 at 02:42:28PM +0900, KAMEZAWA Hiroyuki wrote: >> >> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> >> >> next_active_pageblock() is for finding next _used_ freeblock. It skips >> several blocks when it finds there are a chunk of free pages lager than >> pageblock. But it has 2 bugs. >> >> 1. We have no lock. page_order(page) - pageblock_order can be minus. >> 2. pageblocks_stride += is wrong. it should skip page_order(p) of pages. >> >> CC: stable@kernel.org >> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> >> --- >> mm/memory_hotplug.c | 16 ++++++++-------- >> 1 file changed, 8 insertions(+), 8 deletions(-) >> >> Index: kametest/mm/memory_hotplug.c >> =================================================================== >> --- kametest.orig/mm/memory_hotplug.c >> +++ kametest/mm/memory_hotplug.c >> @@ -584,19 +584,19 @@ static inline int pageblock_free(struct >> /* Return the start of the next active pageblock after a given page */ >> static struct page *next_active_pageblock(struct page *page) >> { >> - int pageblocks_stride; >> - >> /* Ensure the starting page is pageblock-aligned */ >> BUG_ON(page_to_pfn(page) & (pageblock_nr_pages - 1)); >> >> - /* Move forward by at least 1 * pageblock_nr_pages */ >> - pageblocks_stride = 1; >> - >> /* If the entire pageblock is free, move to the end of free page */ >> - if (pageblock_free(page)) >> - pageblocks_stride += page_order(page) - pageblock_order; >> + if (pageblock_free(page)) { >> + int order; >> + /* be careful. we don't have locks, page_order can be changed.*/ >> + order = page_order(page); >> + if (order > pageblock_order) >> + return page + (1 << order); >> + } > > As you note in your changelog, page_order() is unsafe because we do not have > the zone lock but you don't check if order is somewhere between pageblock_order > and MAX_ORDER_NR_PAGES. How is this safer? > Ah, I missed that. if ((pageblock_order <= order) && (order < MAX_ORDER)) return page + (1 << order); ok ? Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [BUGFIX][PATCH 1/3] memory hotplug: fix next block calculation in is_removable 2010-09-06 17:15 ` Hiroyuki Kamezawa @ 2010-09-07 9:30 ` Mel Gorman 0 siblings, 0 replies; 18+ messages in thread From: Mel Gorman @ 2010-09-07 9:30 UTC (permalink / raw) To: Hiroyuki Kamezawa Cc: KAMEZAWA Hiroyuki, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko, fengguang.wu, akpm@linux-foundation.org, andi.kleen, Dave Hansen, stable On Tue, Sep 07, 2010 at 02:15:01AM +0900, Hiroyuki Kamezawa wrote: > 2010/9/6 Mel Gorman <mel@csn.ul.ie>: > > On Mon, Sep 06, 2010 at 02:42:28PM +0900, KAMEZAWA Hiroyuki wrote: > >> > >> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > >> > >> next_active_pageblock() is for finding next _used_ freeblock. It skips > >> several blocks when it finds there are a chunk of free pages lager than > >> pageblock. But it has 2 bugs. > >> > >> 1. We have no lock. page_order(page) - pageblock_order can be minus. > >> 2. pageblocks_stride += is wrong. it should skip page_order(p) of pages. > >> > >> CC: stable@kernel.org > >> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > >> --- > >> mm/memory_hotplug.c | 16 ++++++++-------- > >> 1 file changed, 8 insertions(+), 8 deletions(-) > >> > >> Index: kametest/mm/memory_hotplug.c > >> =================================================================== > >> --- kametest.orig/mm/memory_hotplug.c > >> +++ kametest/mm/memory_hotplug.c > >> @@ -584,19 +584,19 @@ static inline int pageblock_free(struct > >> /* Return the start of the next active pageblock after a given page */ > >> static struct page *next_active_pageblock(struct page *page) > >> { > >> - int pageblocks_stride; > >> - > >> /* Ensure the starting page is pageblock-aligned */ > >> BUG_ON(page_to_pfn(page) & (pageblock_nr_pages - 1)); > >> > >> - /* Move forward by at least 1 * pageblock_nr_pages */ > >> - pageblocks_stride = 1; > >> - > >> /* If the entire pageblock is free, move to the end of free page */ > >> - if (pageblock_free(page)) > >> - pageblocks_stride += page_order(page) - pageblock_order; > >> + if (pageblock_free(page)) { > >> + int order; > >> + /* be careful. we don't have locks, page_order can be changed.*/ > >> + order = page_order(page); > >> + if (order > pageblock_order) > >> + return page + (1 << order); > >> + } > > > > As you note in your changelog, page_order() is unsafe because we do not have > > the zone lock but you don't check if order is somewhere between pageblock_order > > and MAX_ORDER_NR_PAGES. How is this safer? > > > Ah, I missed that. > > if ((pageblock_order <= order) && (order < MAX_ORDER)) > return page + (1 << order); > ok ? > Seems ok. There will still be some false usage of order but it should be harmless. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2/3] memory hotplug: fix set_migratetype_isolate wrong callback result check 2010-09-06 5:40 [PATCH 0/3] memory hotplug: updates and bugfix for is_removable KAMEZAWA Hiroyuki 2010-09-06 5:42 ` [BUGFIX][PATCH 1/3] memory hotplug: fix next block calculation in is_removable KAMEZAWA Hiroyuki @ 2010-09-06 5:44 ` KAMEZAWA Hiroyuki 2010-09-06 5:47 ` [PATCH 3/3] memory hotplug: use unified logic for is_removable and offline_pages KAMEZAWA Hiroyuki 2010-09-07 1:28 ` [PATCH 0/3] memory hotplug: updates and bugfix for is_removable v3 KAMEZAWA Hiroyuki 3 siblings, 0 replies; 18+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-09-06 5:44 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko, fengguang.wu, akpm@linux-foundation.org, Mel Gorman, andi.kleen, Dave Hansen From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Even if notifier cannot find any pages, it doesn't mean no pages are available...And, if there are no notifiers registered, this condition will be always true and memory hotplug will show -EBUSY. Clarification:This is a bug but not critical In most case, a pageblock which will be offlined is MIGRATE_MOVABLE This "notifier" is called only when the pageblock is _not_ MIGRATE_MOVABLE. If not MIGRATE_MOVABLE, it's common case that memory hotplug will fail. So, no one notice this bug. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> --- mm/page_alloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: kametest/mm/page_alloc.c =================================================================== --- kametest.orig/mm/page_alloc.c +++ kametest/mm/page_alloc.c @@ -5313,7 +5313,7 @@ int set_migratetype_isolate(struct page */ notifier_ret = memory_isolate_notify(MEM_ISOLATE_COUNT, &arg); notifier_ret = notifier_to_errno(notifier_ret); - if (notifier_ret || !arg.pages_found) + if (notifier_ret) goto out; for (iter = pfn; iter < (pfn + pageblock_nr_pages); iter++) { -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 3/3] memory hotplug: use unified logic for is_removable and offline_pages 2010-09-06 5:40 [PATCH 0/3] memory hotplug: updates and bugfix for is_removable KAMEZAWA Hiroyuki 2010-09-06 5:42 ` [BUGFIX][PATCH 1/3] memory hotplug: fix next block calculation in is_removable KAMEZAWA Hiroyuki 2010-09-06 5:44 ` [PATCH 2/3] memory hotplug: fix set_migratetype_isolate wrong callback result check KAMEZAWA Hiroyuki @ 2010-09-06 5:47 ` KAMEZAWA Hiroyuki 2010-09-06 9:30 ` Michal Hocko 2010-09-06 13:58 ` Mel Gorman 2010-09-07 1:28 ` [PATCH 0/3] memory hotplug: updates and bugfix for is_removable v3 KAMEZAWA Hiroyuki 3 siblings, 2 replies; 18+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-09-06 5:47 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko, fengguang.wu, akpm@linux-foundation.org, Mel Gorman, andi.kleen, Dave Hansen Now, sysfs interface of memory hotplug shows whether the section is removable or not. But it checks only migrateype of pages and doesn't check details of cluster of pages. Next, memory hotplug's set_migratetype_isolate() has the same kind of check, too. But the migrate-type is just a "hint" and the pageblock can contain several types of pages if fragmentation is very heavy. To get precise information, we need to check - the pageblock only contains free pages or LRU pages. This patch adds the function __count_unmovable_pages() and makes above 2 checks to use the same logic. This will improve user experience of memory hotplug because sysfs interface tells accurate information. Note: it may be better to check MIGRATE_UNMOVABLE for making failure case quick. Changelog: 2010/09/06 - added comments. - removed zone->lock. - changed the name of the function to be is_pageblock_removable_async(). because I removed the zone->lock. Reported-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> --- include/linux/memory_hotplug.h | 1 mm/memory_hotplug.c | 15 ------- mm/page_alloc.c | 77 ++++++++++++++++++++++++++++++----------- 3 files changed, 60 insertions(+), 33 deletions(-) Index: kametest/mm/page_alloc.c =================================================================== --- kametest.orig/mm/page_alloc.c +++ kametest/mm/page_alloc.c @@ -5274,11 +5274,61 @@ void set_pageblock_flags_group(struct pa * page allocater never alloc memory from ISOLATE block. */ +static int __count_immobile_pages(struct zone *zone, struct page *page) +{ + unsigned long pfn, iter, found; + /* + * For avoiding noise data, lru_add_drain_all() should be called + * If ZONE_MOVABLE, the zone never contains immobile pages + */ + if (zone_idx(zone) == ZONE_MOVABLE) + return 0; + + pfn = page_to_pfn(page); + for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) { + unsigned long check = pfn + iter; + + if (!pfn_valid_within(check)) { + iter++; + continue; + } + page = pfn_to_page(check); + if (!page_count(page)) { + if (PageBuddy(page)) + iter += (1 << page_order(page)) - 1; + continue; + } + if (!PageLRU(page)) + found++; + /* + * If the page is not RAM, page_count()should be 0. + * we don't need more check. This is an _used_ not-movable page. + * + * The problematic thing here is PG_reserved pages. PG_reserved + * is set to both of a memory hole page and a _used_ kernel + * page at boot. + */ + } + return found; +} + +bool is_pageblock_removable_async(struct page *page) +{ + struct zone *zone = page_zone(page); + unsigned long flags; + int num; + /* Don't take zone->lock interntionally. */ + num = __count_immobile_pages(zone, page); + + if (num) + return false; + return true; +} + int set_migratetype_isolate(struct page *page) { struct zone *zone; - struct page *curr_page; - unsigned long flags, pfn, iter; + unsigned long flags, pfn; unsigned long immobile = 0; struct memory_isolate_notify arg; int notifier_ret; @@ -5289,11 +5339,6 @@ int set_migratetype_isolate(struct page zone_idx = zone_idx(zone); spin_lock_irqsave(&zone->lock, flags); - if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE || - zone_idx == ZONE_MOVABLE) { - ret = 0; - goto out; - } pfn = page_to_pfn(page); arg.start_pfn = pfn; @@ -5315,19 +5360,13 @@ int set_migratetype_isolate(struct page notifier_ret = notifier_to_errno(notifier_ret); if (notifier_ret) goto out; + immobile = __count_immobile_pages(zone ,page); - for (iter = pfn; iter < (pfn + pageblock_nr_pages); iter++) { - if (!pfn_valid_within(pfn)) - continue; - - curr_page = pfn_to_page(iter); - if (!page_count(curr_page) || PageLRU(curr_page)) - continue; - - immobile++; - } - - if (arg.pages_found == immobile) + /* + * immobile means "not-on-lru" paes. If immobile is larger than + * removable-by-driver pages reported by notifier, we'll fail. + */ + if (!immobile || arg.pages_found >= immobile) ret = 0; out: Index: kametest/mm/memory_hotplug.c =================================================================== --- kametest.orig/mm/memory_hotplug.c +++ kametest/mm/memory_hotplug.c @@ -602,27 +602,14 @@ static struct page *next_active_pagebloc /* Checks if this range of memory is likely to be hot-removable. */ int is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages) { - int type; struct page *page = pfn_to_page(start_pfn); struct page *end_page = page + nr_pages; /* Check the starting page of each pageblock within the range */ for (; page < end_page; page = next_active_pageblock(page)) { - type = get_pageblock_migratetype(page); - - /* - * A pageblock containing MOVABLE or free pages is considered - * removable - */ - if (type != MIGRATE_MOVABLE && !pageblock_free(page)) + if (!is_pageblock_removable_async(page)) return 0; - /* - * A pageblock starting with a PageReserved page is not - * considered removable. - */ - if (PageReserved(page)) - return 0; } /* All pageblocks in the memory block are likely to be hot-removable */ Index: kametest/include/linux/memory_hotplug.h =================================================================== --- kametest.orig/include/linux/memory_hotplug.h +++ kametest/include/linux/memory_hotplug.h @@ -69,6 +69,7 @@ extern void online_page(struct page *pag /* VM interface that may be used by firmware interface */ extern int online_pages(unsigned long, unsigned long); extern void __offline_isolated_pages(unsigned long, unsigned long); +extern bool is_pageblock_removable_async(struct page *page); /* reasonably generic interface to expand the physical pages in a zone */ extern int __add_pages(int nid, struct zone *zone, unsigned long start_pfn, -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] memory hotplug: use unified logic for is_removable and offline_pages 2010-09-06 5:47 ` [PATCH 3/3] memory hotplug: use unified logic for is_removable and offline_pages KAMEZAWA Hiroyuki @ 2010-09-06 9:30 ` Michal Hocko 2010-09-06 13:30 ` Hiroyuki Kamezawa 2010-09-06 13:58 ` Mel Gorman 1 sibling, 1 reply; 18+ messages in thread From: Michal Hocko @ 2010-09-06 9:30 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, fengguang.wu, akpm@linux-foundation.org, Mel Gorman, andi.kleen, Dave Hansen On Mon 06-09-10 14:47:16, KAMEZAWA Hiroyuki wrote: > > Now, sysfs interface of memory hotplug shows whether the section is > removable or not. But it checks only migrateype of pages and doesn't > check details of cluster of pages. > > Next, memory hotplug's set_migratetype_isolate() has the same kind > of check, too. But the migrate-type is just a "hint" and the pageblock > can contain several types of pages if fragmentation is very heavy. > > To get precise information, we need to check > - the pageblock only contains free pages or LRU pages. > > This patch adds the function __count_unmovable_pages() and makes > above 2 checks to use the same logic. This will improve user experience > of memory hotplug because sysfs interface tells accurate information. > > Note: > it may be better to check MIGRATE_UNMOVABLE for making failure case quick. > > Changelog: 2010/09/06 > - added comments. > - removed zone->lock. > - changed the name of the function to be is_pageblock_removable_async(). > because I removed the zone->lock. wouldn't be __is_pageblock_removable a better name? _async suffix is usually used for asynchronous operations and this is just a function withtout locks. > > Reported-by: Michal Hocko <mhocko@suse.cz> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > --- > include/linux/memory_hotplug.h | 1 > mm/memory_hotplug.c | 15 ------- > mm/page_alloc.c | 77 ++++++++++++++++++++++++++++++----------- > 3 files changed, 60 insertions(+), 33 deletions(-) > > Index: kametest/mm/page_alloc.c > =================================================================== > --- kametest.orig/mm/page_alloc.c > +++ kametest/mm/page_alloc.c > @@ -5274,11 +5274,61 @@ void set_pageblock_flags_group(struct pa > * page allocater never alloc memory from ISOLATE block. > */ > Can we add a comment on the locking? Something like: Caller should hold zone->lock if he needs consistent results. > +static int __count_immobile_pages(struct zone *zone, struct page *page) > +{ > + unsigned long pfn, iter, found; > + /* > + * For avoiding noise data, lru_add_drain_all() should be called > + * If ZONE_MOVABLE, the zone never contains immobile pages > + */ > + if (zone_idx(zone) == ZONE_MOVABLE) > + return 0; > + > + pfn = page_to_pfn(page); > + for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) { > + unsigned long check = pfn + iter; > + > + if (!pfn_valid_within(check)) { > + iter++; > + continue; > + } > + page = pfn_to_page(check); > + if (!page_count(page)) { > + if (PageBuddy(page)) > + iter += (1 << page_order(page)) - 1; > + continue; > + } > + if (!PageLRU(page)) > + found++; > + /* > + * If the page is not RAM, page_count()should be 0. > + * we don't need more check. This is an _used_ not-movable page. > + * > + * The problematic thing here is PG_reserved pages. PG_reserved > + * is set to both of a memory hole page and a _used_ kernel > + * page at boot. > + */ > + } > + return found; > +} > + > +bool is_pageblock_removable_async(struct page *page) > +{ > + struct zone *zone = page_zone(page); > + unsigned long flags; > + int num; > + /* Don't take zone->lock interntionally. */ Could you add the reason? Don't take zone-> lock intentionally because we are called from the userspace (sysfs interface). [...] > /* All pageblocks in the memory block are likely to be hot-removable */ > Index: kametest/include/linux/memory_hotplug.h > =================================================================== > --- kametest.orig/include/linux/memory_hotplug.h > +++ kametest/include/linux/memory_hotplug.h > @@ -69,6 +69,7 @@ extern void online_page(struct page *pag > /* VM interface that may be used by firmware interface */ > extern int online_pages(unsigned long, unsigned long); > extern void __offline_isolated_pages(unsigned long, unsigned long); #ifdef CONFIG_HOTREMOVE > +extern bool is_pageblock_removable_async(struct page *page); #else #define is_pageblock_removable_async(p) 0 #endif ? Thanks! -- Michal Hocko L3 team SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] memory hotplug: use unified logic for is_removable and offline_pages 2010-09-06 9:30 ` Michal Hocko @ 2010-09-06 13:30 ` Hiroyuki Kamezawa 2010-09-07 13:16 ` Michal Hocko 0 siblings, 1 reply; 18+ messages in thread From: Hiroyuki Kamezawa @ 2010-09-06 13:30 UTC (permalink / raw) To: Michal Hocko Cc: KAMEZAWA Hiroyuki, linux-mm@kvack.org, linux-kernel@vger.kernel.org, fengguang.wu, akpm@linux-foundation.org, Mel Gorman, andi.kleen, Dave Hansen 2010/9/6 Michal Hocko <mhocko@suse.cz>: > On Mon 06-09-10 14:47:16, KAMEZAWA Hiroyuki wrote: >> >> Now, sysfs interface of memory hotplug shows whether the section is >> removable or not. But it checks only migrateype of pages and doesn't >> check details of cluster of pages. >> >> Next, memory hotplug's set_migratetype_isolate() has the same kind >> of check, too. But the migrate-type is just a "hint" and the pageblock >> can contain several types of pages if fragmentation is very heavy. >> >> To get precise information, we need to check >> - the pageblock only contains free pages or LRU pages. >> >> This patch adds the function __count_unmovable_pages() and makes >> above 2 checks to use the same logic. This will improve user experience >> of memory hotplug because sysfs interface tells accurate information. >> >> Note: >> it may be better to check MIGRATE_UNMOVABLE for making failure case quick. >> >> Changelog: 2010/09/06 >> - added comments. >> - removed zone->lock. >> - changed the name of the function to be is_pageblock_removable_async(). >> because I removed the zone->lock. > > wouldn't be __is_pageblock_removable a better name? _async suffix is > usually used for asynchronous operations and this is just a function > withtout locks. > rename as _is_pagebloc_removable_nolock(). >> >> Reported-by: Michal Hocko <mhocko@suse.cz> >> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> >> --- >> include/linux/memory_hotplug.h | 1 >> mm/memory_hotplug.c | 15 ------- >> mm/page_alloc.c | 77 ++++++++++++++++++++++++++++++----------- >> 3 files changed, 60 insertions(+), 33 deletions(-) >> >> Index: kametest/mm/page_alloc.c >> =================================================================== >> --- kametest.orig/mm/page_alloc.c >> +++ kametest/mm/page_alloc.c >> @@ -5274,11 +5274,61 @@ void set_pageblock_flags_group(struct pa >> * page allocater never alloc memory from ISOLATE block. >> */ >> > > Can we add a comment on the locking? Something like: > Caller should hold zone->lock if he needs consistent results. > Hmm. ok. >> +static int __count_immobile_pages(struct zone *zone, struct page *page) >> +{ >> + unsigned long pfn, iter, found; >> + /* >> + * For avoiding noise data, lru_add_drain_all() should be called >> + * If ZONE_MOVABLE, the zone never contains immobile pages >> + */ >> + if (zone_idx(zone) == ZONE_MOVABLE) >> + return 0; >> + >> + pfn = page_to_pfn(page); >> + for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) { >> + unsigned long check = pfn + iter; >> + >> + if (!pfn_valid_within(check)) { >> + iter++; >> + continue; >> + } >> + page = pfn_to_page(check); >> + if (!page_count(page)) { >> + if (PageBuddy(page)) >> + iter += (1 << page_order(page)) - 1; >> + continue; >> + } >> + if (!PageLRU(page)) >> + found++; >> + /* >> + * If the page is not RAM, page_count()should be 0. >> + * we don't need more check. This is an _used_ not-movable page. >> + * >> + * The problematic thing here is PG_reserved pages. PG_reserved >> + * is set to both of a memory hole page and a _used_ kernel >> + * page at boot. >> + */ >> + } >> + return found; >> +} >> + >> +bool is_pageblock_removable_async(struct page *page) >> +{ >> + struct zone *zone = page_zone(page); >> + unsigned long flags; >> + int num; >> + /* Don't take zone->lock interntionally. */ > > Could you add the reason? > Don't take zone-> lock intentionally because we are called from the > userspace (sysfs interface). > I don't like to assume caller context which will limit the callers. /* holding zone->lock or not is caller's job. */ > [...] >> /* All pageblocks in the memory block are likely to be hot-removable */ >> Index: kametest/include/linux/memory_hotplug.h >> =================================================================== >> --- kametest.orig/include/linux/memory_hotplug.h >> +++ kametest/include/linux/memory_hotplug.h >> @@ -69,6 +69,7 @@ extern void online_page(struct page *pag >> /* VM interface that may be used by firmware interface */ >> extern int online_pages(unsigned long, unsigned long); >> extern void __offline_isolated_pages(unsigned long, unsigned long); > > #ifdef CONFIG_HOTREMOVE > >> +extern bool is_pageblock_removable_async(struct page *page); > > #else > #define is_pageblock_removable_async(p) 0 > #endif > ? Is this function is called even if HOTREMOVE is off ? If so, the caller is buggy. I'll check tomorrow. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] memory hotplug: use unified logic for is_removable and offline_pages 2010-09-06 13:30 ` Hiroyuki Kamezawa @ 2010-09-07 13:16 ` Michal Hocko 0 siblings, 0 replies; 18+ messages in thread From: Michal Hocko @ 2010-09-07 13:16 UTC (permalink / raw) To: Hiroyuki Kamezawa Cc: KAMEZAWA Hiroyuki, linux-mm@kvack.org, linux-kernel@vger.kernel.org, fengguang.wu, akpm@linux-foundation.org, Mel Gorman, andi.kleen, Dave Hansen On Mon 06-09-10 22:30:43, Hiroyuki Kamezawa wrote: > 2010/9/6 Michal Hocko <mhocko@suse.cz>: > > On Mon 06-09-10 14:47:16, KAMEZAWA Hiroyuki wrote: [...] > >> Changelog: 2010/09/06 > >> ?- added comments. > >> ?- removed zone->lock. > >> ?- changed the name of the function to be is_pageblock_removable_async(). > >> ? ?because I removed the zone->lock. > > > > wouldn't be __is_pageblock_removable a better name? _async suffix is > > usually used for asynchronous operations and this is just a function > > withtout locks. > > > rename as _is_pagebloc_removable_nolock(). Sounds good as well. [...] > >> +bool is_pageblock_removable_async(struct page *page) > >> +{ > >> + ? ? struct zone *zone = page_zone(page); > >> + ? ? unsigned long flags; > >> + ? ? int num; > >> + ? ? /* Don't take zone->lock interntionally. */ > > > > Could you add the reason? > > Don't take zone-> lock intentionally because we are called from the > > userspace (sysfs interface). > > > I don't like to assume caller context which will limit the callers. > > /* holding zone->lock or not is caller's job. */ Sure, but I think that if you explicitely mention that the lock is not held intentionaly then it would be good to provide some reasonining. > > > > [...] > >> ? ? ? /* All pageblocks in the memory block are likely to be hot-removable */ > >> Index: kametest/include/linux/memory_hotplug.h > >> =================================================================== > >> --- kametest.orig/include/linux/memory_hotplug.h > >> +++ kametest/include/linux/memory_hotplug.h > >> @@ -69,6 +69,7 @@ extern void online_page(struct page *pag > >> ?/* VM interface that may be used by firmware interface */ > >> ?extern int online_pages(unsigned long, unsigned long); > >> ?extern void __offline_isolated_pages(unsigned long, unsigned long); > > > > #ifdef CONFIG_HOTREMOVE > > > >> +extern bool is_pageblock_removable_async(struct page *page); > > > > #else > > #define is_pageblock_removable_async(p) 0 > > #endif > > ? > > Is this function is called even if HOTREMOVE is off ? > If so, the caller is buggy. I'll check tomorrow. It is not, but then it should be defined under CONFIG_HOTREMOVE without #else part, shoudln't it? > > Thanks, > -Kame Thanks! -- Michal Hocko L3 team SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] memory hotplug: use unified logic for is_removable and offline_pages 2010-09-06 5:47 ` [PATCH 3/3] memory hotplug: use unified logic for is_removable and offline_pages KAMEZAWA Hiroyuki 2010-09-06 9:30 ` Michal Hocko @ 2010-09-06 13:58 ` Mel Gorman 2010-09-06 23:58 ` KAMEZAWA Hiroyuki 1 sibling, 1 reply; 18+ messages in thread From: Mel Gorman @ 2010-09-06 13:58 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko, fengguang.wu, akpm@linux-foundation.org, andi.kleen, Dave Hansen On Mon, Sep 06, 2010 at 02:47:16PM +0900, KAMEZAWA Hiroyuki wrote: > > Now, sysfs interface of memory hotplug shows whether the section is > removable or not. But it checks only migrateype of pages and doesn't > check details of cluster of pages. > This was deliberate at the time. The intention was to avoid an expensive linear page scan where possible. > Next, memory hotplug's set_migratetype_isolate() has the same kind > of check, too. But the migrate-type is just a "hint" and the pageblock > can contain several types of pages if fragmentation is very heavy. > If fragmentation is very heavy on a system that requires memory hot-plug, I'd also be checking the value of min_free_kbytes. If it's low, I suggest an init script runs hugeadm --set-recommended-min_free_kbytes because it'll keep fragmentation-related events to a minimum. The mm_page_alloc_extfrag tracepoint can be used to measure fragmentation events if you want to see the effect of altering min_free_kbytes like this. > To get precise information, we need to check > - the pageblock only contains free pages or LRU pages. > > This patch adds the function __count_unmovable_pages() and makes > above 2 checks to use the same logic. This will improve user experience > of memory hotplug because sysfs interface tells accurate information. > > Note: > it may be better to check MIGRATE_UNMOVABLE for making failure case quick. > > Changelog: 2010/09/06 > - added comments. > - removed zone->lock. > - changed the name of the function to be is_pageblock_removable_async(). > because I removed the zone->lock. > > Reported-by: Michal Hocko <mhocko@suse.cz> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > --- > include/linux/memory_hotplug.h | 1 > mm/memory_hotplug.c | 15 ------- > mm/page_alloc.c | 77 ++++++++++++++++++++++++++++++----------- > 3 files changed, 60 insertions(+), 33 deletions(-) > > Index: kametest/mm/page_alloc.c > =================================================================== > --- kametest.orig/mm/page_alloc.c > +++ kametest/mm/page_alloc.c > @@ -5274,11 +5274,61 @@ void set_pageblock_flags_group(struct pa > * page allocater never alloc memory from ISOLATE block. > */ > > +static int __count_immobile_pages(struct zone *zone, struct page *page) > +{ This will also count RECLAIMABLE pages belonging to some slab objects. These are potentially hot-removable if slab is shrunk. Your function gives a more accurate count but not necessarily a better user-experience with respect to finding sections to hot-remove. You might like to detect PageSlab pages that belong to a RECLAIMABLE slab and not count these as immobile. > + unsigned long pfn, iter, found; > + /* > + * For avoiding noise data, lru_add_drain_all() should be called > + * If ZONE_MOVABLE, the zone never contains immobile pages > + */ > + if (zone_idx(zone) == ZONE_MOVABLE) > + return 0; > + > + pfn = page_to_pfn(page); > + for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) { > + unsigned long check = pfn + iter; > + > + if (!pfn_valid_within(check)) { > + iter++; > + continue; > + } > + page = pfn_to_page(check); > + if (!page_count(page)) { > + if (PageBuddy(page)) > + iter += (1 << page_order(page)) - 1; > + continue; > + } > + if (!PageLRU(page)) > + found++; Arguably, you do not care how many pages there are, you just care if there is one truely unmovable page. If you find one of them, then have this function return fail to avoid the rest of the scan. > + /* > + * If the page is not RAM, page_count()should be 0. > + * we don't need more check. This is an _used_ not-movable page. > + * > + * The problematic thing here is PG_reserved pages. PG_reserved > + * is set to both of a memory hole page and a _used_ kernel > + * page at boot. > + */ > + } > + return found; > +} > + > +bool is_pageblock_removable_async(struct page *page) > +{ > + struct zone *zone = page_zone(page); > + unsigned long flags; > + int num; > + /* Don't take zone->lock interntionally. */ intentionally? > + num = __count_immobile_pages(zone, page); > + > + if (num) > + return false; > + return true; > +} > + > int set_migratetype_isolate(struct page *page) > { > struct zone *zone; > - struct page *curr_page; > - unsigned long flags, pfn, iter; > + unsigned long flags, pfn; > unsigned long immobile = 0; > struct memory_isolate_notify arg; > int notifier_ret; > @@ -5289,11 +5339,6 @@ int set_migratetype_isolate(struct page > zone_idx = zone_idx(zone); > > spin_lock_irqsave(&zone->lock, flags); > - if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE || > - zone_idx == ZONE_MOVABLE) { > - ret = 0; > - goto out; > - } > This will result in more scanning and a potentially more expensive memory hot-remove operation. I'm not massively concerned as such because memory hot-remove is not cheap but it's worth mentioning in the changelog that this is a consequence. > pfn = page_to_pfn(page); > arg.start_pfn = pfn; > @@ -5315,19 +5360,13 @@ int set_migratetype_isolate(struct page > notifier_ret = notifier_to_errno(notifier_ret); > if (notifier_ret) > goto out; > + immobile = __count_immobile_pages(zone ,page); > > - for (iter = pfn; iter < (pfn + pageblock_nr_pages); iter++) { > - if (!pfn_valid_within(pfn)) > - continue; > - > - curr_page = pfn_to_page(iter); > - if (!page_count(curr_page) || PageLRU(curr_page)) > - continue; > - > - immobile++; > - } > - > - if (arg.pages_found == immobile) > + /* > + * immobile means "not-on-lru" paes. If immobile is larger than > + * removable-by-driver pages reported by notifier, we'll fail. > + */ > + if (!immobile || arg.pages_found >= immobile) > ret = 0; > Here is where I'd suggest reimplementing __count_immobile_pages as pageblock_any_immobile() that returns true if it detects an immobile page in a given PFN range. > out: > Index: kametest/mm/memory_hotplug.c > =================================================================== > --- kametest.orig/mm/memory_hotplug.c > +++ kametest/mm/memory_hotplug.c > @@ -602,27 +602,14 @@ static struct page *next_active_pagebloc > /* Checks if this range of memory is likely to be hot-removable. */ > int is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages) > { > - int type; > struct page *page = pfn_to_page(start_pfn); > struct page *end_page = page + nr_pages; > > /* Check the starting page of each pageblock within the range */ > for (; page < end_page; page = next_active_pageblock(page)) { > - type = get_pageblock_migratetype(page); > - > - /* > - * A pageblock containing MOVABLE or free pages is considered > - * removable > - */ > - if (type != MIGRATE_MOVABLE && !pageblock_free(page)) > + if (!is_pageblock_removable_async(page)) > return 0; > > - /* > - * A pageblock starting with a PageReserved page is not > - * considered removable. > - */ > - if (PageReserved(page)) > - return 0; > } Bear in mind that a user or bad application constantly reading the sysfs file potentially causes a lot of cache trashing as a result of the linear scan instead of the pageblock type check. > > /* All pageblocks in the memory block are likely to be hot-removable */ > Index: kametest/include/linux/memory_hotplug.h > =================================================================== > --- kametest.orig/include/linux/memory_hotplug.h > +++ kametest/include/linux/memory_hotplug.h > @@ -69,6 +69,7 @@ extern void online_page(struct page *pag > /* VM interface that may be used by firmware interface */ > extern int online_pages(unsigned long, unsigned long); > extern void __offline_isolated_pages(unsigned long, unsigned long); > +extern bool is_pageblock_removable_async(struct page *page); > > /* reasonably generic interface to expand the physical pages in a zone */ > extern int __add_pages(int nid, struct zone *zone, unsigned long start_pfn, > -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] memory hotplug: use unified logic for is_removable and offline_pages 2010-09-06 13:58 ` Mel Gorman @ 2010-09-06 23:58 ` KAMEZAWA Hiroyuki 2010-09-07 13:01 ` Mel Gorman 0 siblings, 1 reply; 18+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-09-06 23:58 UTC (permalink / raw) To: Mel Gorman Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko, fengguang.wu, akpm@linux-foundation.org, andi.kleen, Dave Hansen On Mon, 6 Sep 2010 14:58:23 +0100 Mel Gorman <mel@csn.ul.ie> wrote: > On Mon, Sep 06, 2010 at 02:47:16PM +0900, KAMEZAWA Hiroyuki wrote: > > > > Now, sysfs interface of memory hotplug shows whether the section is > > removable or not. But it checks only migrateype of pages and doesn't > > check details of cluster of pages. > > > > This was deliberate at the time. The intention was to avoid an expensive > linear page scan where possible. > > > Next, memory hotplug's set_migratetype_isolate() has the same kind > > of check, too. But the migrate-type is just a "hint" and the pageblock > > can contain several types of pages if fragmentation is very heavy. > > > > If fragmentation is very heavy on a system that requires memory > hot-plug, I'd also be checking the value of min_free_kbytes. If it's > low, I suggest an init script runs > > hugeadm --set-recommended-min_free_kbytes > > because it'll keep fragmentation-related events to a minimum. The > mm_page_alloc_extfrag tracepoint can be used to measure fragmentation > events if you want to see the effect of altering min_free_kbytes like > this. > Hmm, then what should I do in this patch ? > > To get precise information, we need to check > > - the pageblock only contains free pages or LRU pages. > > > > This patch adds the function __count_unmovable_pages() and makes > > above 2 checks to use the same logic. This will improve user experience > > of memory hotplug because sysfs interface tells accurate information. > > > > Note: > > it may be better to check MIGRATE_UNMOVABLE for making failure case quick. > > > > Changelog: 2010/09/06 > > - added comments. > > - removed zone->lock. > > - changed the name of the function to be is_pageblock_removable_async(). > > because I removed the zone->lock. > > > > Reported-by: Michal Hocko <mhocko@suse.cz> > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > > --- > > include/linux/memory_hotplug.h | 1 > > mm/memory_hotplug.c | 15 ------- > > mm/page_alloc.c | 77 ++++++++++++++++++++++++++++++----------- > > 3 files changed, 60 insertions(+), 33 deletions(-) > > > > Index: kametest/mm/page_alloc.c > > =================================================================== > > --- kametest.orig/mm/page_alloc.c > > +++ kametest/mm/page_alloc.c > > @@ -5274,11 +5274,61 @@ void set_pageblock_flags_group(struct pa > > * page allocater never alloc memory from ISOLATE block. > > */ > > > > +static int __count_immobile_pages(struct zone *zone, struct page *page) > > +{ > > This will also count RECLAIMABLE pages belonging to some slab objects. > These are potentially hot-removable if slab is shrunk. Your function gives a > more accurate count but not necessarily a better user-experience with respect > to finding sections to hot-remove. You might like to detect PageSlab pages > that belong to a RECLAIMABLE slab and not count these as immobile. > RECLAIMABLE object is not _always_ reclaimable. Should we add "maybe reclaimable" return value ? > > > + unsigned long pfn, iter, found; > > + /* > > + * For avoiding noise data, lru_add_drain_all() should be called > > + * If ZONE_MOVABLE, the zone never contains immobile pages > > + */ > > + if (zone_idx(zone) == ZONE_MOVABLE) > > + return 0; > > + > > + pfn = page_to_pfn(page); > > + for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) { > > + unsigned long check = pfn + iter; > > + > > + if (!pfn_valid_within(check)) { > > + iter++; > > + continue; > > + } > > + page = pfn_to_page(check); > > + if (!page_count(page)) { > > + if (PageBuddy(page)) > > + iter += (1 << page_order(page)) - 1; > > + continue; > > + } > > + if (!PageLRU(page)) > > + found++; > > Arguably, you do not care how many pages there are, you just care if > there is one truely unmovable page. If you find one of them, then have > this function return fail to avoid the rest of the scan. > This is for IBM guys who added stupid notifier for memory hotplug scanning. Hmm, ok, I'll add some trick. > > + /* > > + * If the page is not RAM, page_count()should be 0. > > + * we don't need more check. This is an _used_ not-movable page. > > + * > > + * The problematic thing here is PG_reserved pages. PG_reserved > > + * is set to both of a memory hole page and a _used_ kernel > > + * page at boot. > > + */ > > + } > > + return found; > > +} > > + > > +bool is_pageblock_removable_async(struct page *page) > > +{ > > + struct zone *zone = page_zone(page); > > + unsigned long flags; > > + int num; > > + /* Don't take zone->lock interntionally. */ > > intentionally? > yes. typo. > > + num = __count_immobile_pages(zone, page); > > + > > + if (num) > > + return false; > > + return true; > > +} > > + > > int set_migratetype_isolate(struct page *page) > > { > > struct zone *zone; > > - struct page *curr_page; > > - unsigned long flags, pfn, iter; > > + unsigned long flags, pfn; > > unsigned long immobile = 0; > > struct memory_isolate_notify arg; > > int notifier_ret; > > @@ -5289,11 +5339,6 @@ int set_migratetype_isolate(struct page > > zone_idx = zone_idx(zone); > > > > spin_lock_irqsave(&zone->lock, flags); > > - if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE || > > - zone_idx == ZONE_MOVABLE) { > > - ret = 0; > > - goto out; > > - } > > > > This will result in more scanning and a potentially more expensive > memory hot-remove operation. I'm not massively concerned as such because > memory hot-remove is not cheap but it's worth mentioning in the > changelog that this is a consequence. > Okay. > > pfn = page_to_pfn(page); > > arg.start_pfn = pfn; > > @@ -5315,19 +5360,13 @@ int set_migratetype_isolate(struct page > > notifier_ret = notifier_to_errno(notifier_ret); > > if (notifier_ret) > > goto out; > > + immobile = __count_immobile_pages(zone ,page); > > > > - for (iter = pfn; iter < (pfn + pageblock_nr_pages); iter++) { > > - if (!pfn_valid_within(pfn)) > > - continue; > > - > > - curr_page = pfn_to_page(iter); > > - if (!page_count(curr_page) || PageLRU(curr_page)) > > - continue; > > - > > - immobile++; > > - } > > - > > - if (arg.pages_found == immobile) > > + /* > > + * immobile means "not-on-lru" paes. If immobile is larger than > > + * removable-by-driver pages reported by notifier, we'll fail. > > + */ > > + if (!immobile || arg.pages_found >= immobile) > > ret = 0; > > > > Here is where I'd suggest reimplementing __count_immobile_pages as > pageblock_any_immobile() that returns true if it detects an immobile page > in a given PFN range. > How about __count_immobile_pages(page, count); returns true if count >= immobile. Maybe we can add extention whether we allow RECLAIMABLE pages in the range or not. > > out: > > Index: kametest/mm/memory_hotplug.c > > =================================================================== > > --- kametest.orig/mm/memory_hotplug.c > > +++ kametest/mm/memory_hotplug.c > > @@ -602,27 +602,14 @@ static struct page *next_active_pagebloc > > /* Checks if this range of memory is likely to be hot-removable. */ > > int is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages) > > { > > - int type; > > struct page *page = pfn_to_page(start_pfn); > > struct page *end_page = page + nr_pages; > > > > /* Check the starting page of each pageblock within the range */ > > for (; page < end_page; page = next_active_pageblock(page)) { > > - type = get_pageblock_migratetype(page); > > - > > - /* > > - * A pageblock containing MOVABLE or free pages is considered > > - * removable > > - */ > > - if (type != MIGRATE_MOVABLE && !pageblock_free(page)) > > + if (!is_pageblock_removable_async(page)) > > return 0; > > > > - /* > > - * A pageblock starting with a PageReserved page is not > > - * considered removable. > > - */ > > - if (PageReserved(page)) > > - return 0; > > } > > Bear in mind that a user or bad application constantly reading the sysfs > file potentially causes a lot of cache trashing as a result of the > linear scan instead of the pageblock type check. > yes, hmm. I think there are 3 levels of checking. 1. very strict check .... is_removable=true if the range contains only LRU and Free. 2. possibility check .... is_removable=true if the range contains LRU, Free, Slab(reclaimable) 3. fuzzy check .... is_removable=true if type = MIGRATABLE. (but very quick.) I wonder I should add sysctl (or some memory hotplug control file) and make "fuzzy check" as default, which is the same to old behavior. Any idea ? Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] memory hotplug: use unified logic for is_removable and offline_pages 2010-09-06 23:58 ` KAMEZAWA Hiroyuki @ 2010-09-07 13:01 ` Mel Gorman 0 siblings, 0 replies; 18+ messages in thread From: Mel Gorman @ 2010-09-07 13:01 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko, fengguang.wu, akpm@linux-foundation.org, andi.kleen, Dave Hansen On Tue, Sep 07, 2010 at 08:58:36AM +0900, KAMEZAWA Hiroyuki wrote: > On Mon, 6 Sep 2010 14:58:23 +0100 > Mel Gorman <mel@csn.ul.ie> wrote: > > > On Mon, Sep 06, 2010 at 02:47:16PM +0900, KAMEZAWA Hiroyuki wrote: > > > > > > Now, sysfs interface of memory hotplug shows whether the section is > > > removable or not. But it checks only migrateype of pages and doesn't > > > check details of cluster of pages. > > > > > > > This was deliberate at the time. The intention was to avoid an expensive > > linear page scan where possible. > > > > > > Next, memory hotplug's set_migratetype_isolate() has the same kind > > > of check, too. But the migrate-type is just a "hint" and the pageblock > > > can contain several types of pages if fragmentation is very heavy. > > > > > > > If fragmentation is very heavy on a system that requires memory > > hot-plug, I'd also be checking the value of min_free_kbytes. If it's > > low, I suggest an init script runs > > > > hugeadm --set-recommended-min_free_kbytes > > > > because it'll keep fragmentation-related events to a minimum. The > > mm_page_alloc_extfrag tracepoint can be used to measure fragmentation > > events if you want to see the effect of altering min_free_kbytes like > > this. > > > > Hmm, then what should I do in this patch ? > Nothing, it's simply an observation that maybe you would like to pass on to heavy users of memory hot-remove. > > > To get precise information, we need to check > > > - the pageblock only contains free pages or LRU pages. > > > > > > This patch adds the function __count_unmovable_pages() and makes > > > above 2 checks to use the same logic. This will improve user experience > > > of memory hotplug because sysfs interface tells accurate information. > > > > > > Note: > > > it may be better to check MIGRATE_UNMOVABLE for making failure case quick. > > > > > > Changelog: 2010/09/06 > > > - added comments. > > > - removed zone->lock. > > > - changed the name of the function to be is_pageblock_removable_async(). > > > because I removed the zone->lock. > > > > > > Reported-by: Michal Hocko <mhocko@suse.cz> > > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > > > --- > > > include/linux/memory_hotplug.h | 1 > > > mm/memory_hotplug.c | 15 ------- > > > mm/page_alloc.c | 77 ++++++++++++++++++++++++++++++----------- > > > 3 files changed, 60 insertions(+), 33 deletions(-) > > > > > > Index: kametest/mm/page_alloc.c > > > =================================================================== > > > --- kametest.orig/mm/page_alloc.c > > > +++ kametest/mm/page_alloc.c > > > @@ -5274,11 +5274,61 @@ void set_pageblock_flags_group(struct pa > > > * page allocater never alloc memory from ISOLATE block. > > > */ > > > > > > +static int __count_immobile_pages(struct zone *zone, struct page *page) > > > +{ > > > > This will also count RECLAIMABLE pages belonging to some slab objects. > > > These are potentially hot-removable if slab is shrunk. Your function gives a > > more accurate count but not necessarily a better user-experience with respect > > to finding sections to hot-remove. You might like to detect PageSlab pages > > that belong to a RECLAIMABLE slab and not count these as immobile. > > > > RECLAIMABLE object is not _always_ reclaimable. > Should we add "maybe reclaimable" return value ? > Your call but it would seem reasonable. Saying they are certainly not reclaimable just seems a bit aggressive. > > > > > > + unsigned long pfn, iter, found; > > > + /* > > > + * For avoiding noise data, lru_add_drain_all() should be called > > > + * If ZONE_MOVABLE, the zone never contains immobile pages > > > + */ > > > + if (zone_idx(zone) == ZONE_MOVABLE) > > > + return 0; > > > + > > > + pfn = page_to_pfn(page); > > > + for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) { > > > + unsigned long check = pfn + iter; > > > + > > > + if (!pfn_valid_within(check)) { > > > + iter++; > > > + continue; > > > + } > > > + page = pfn_to_page(check); > > > + if (!page_count(page)) { > > > + if (PageBuddy(page)) > > > + iter += (1 << page_order(page)) - 1; > > > + continue; > > > + } > > > + if (!PageLRU(page)) > > > + found++; > > > > Arguably, you do not care how many pages there are, you just care if > > there is one truely unmovable page. If you find one of them, then have > > this function return fail to avoid the rest of the scan. > > > > This is for IBM guys who added stupid notifier for memory hotplug scanning. I wasn't aware of it, my bad. > > > <SNIP> > > > pfn = page_to_pfn(page); > > > arg.start_pfn = pfn; > > > @@ -5315,19 +5360,13 @@ int set_migratetype_isolate(struct page > > > notifier_ret = notifier_to_errno(notifier_ret); > > > if (notifier_ret) > > > goto out; > > > + immobile = __count_immobile_pages(zone ,page); > > > > > > - for (iter = pfn; iter < (pfn + pageblock_nr_pages); iter++) { > > > - if (!pfn_valid_within(pfn)) > > > - continue; > > > - > > > - curr_page = pfn_to_page(iter); > > > - if (!page_count(curr_page) || PageLRU(curr_page)) > > > - continue; > > > - > > > - immobile++; > > > - } > > > - > > > - if (arg.pages_found == immobile) > > > + /* > > > + * immobile means "not-on-lru" paes. If immobile is larger than > > > + * removable-by-driver pages reported by notifier, we'll fail. > > > + */ > > > + if (!immobile || arg.pages_found >= immobile) > > > ret = 0; > > > > > > > Here is where I'd suggest reimplementing __count_immobile_pages as > > pageblock_any_immobile() that returns true if it detects an immobile page > > in a given PFN range. > > > > How about > > __count_immobile_pages(page, count); > returns true if count >= immobile. > If it's returning a boolean, it's no longer a count so the name is misleading. I think it would be clearer as well to deal with PFN ranges instead of page -> page+count but maybe that's just me. > Maybe we can add extention whether we allow RECLAIMABLE pages in the range or not. > Return an enum for each of unmovable, movable_hard, movable_easy maybe where detecting any RECLAIMABLE page make the pageblock "hard to move". > > > > out: > > > Index: kametest/mm/memory_hotplug.c > > > =================================================================== > > > --- kametest.orig/mm/memory_hotplug.c > > > +++ kametest/mm/memory_hotplug.c > > > @@ -602,27 +602,14 @@ static struct page *next_active_pagebloc > > > /* Checks if this range of memory is likely to be hot-removable. */ > > > int is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages) > > > { > > > - int type; > > > struct page *page = pfn_to_page(start_pfn); > > > struct page *end_page = page + nr_pages; > > > > > > /* Check the starting page of each pageblock within the range */ > > > for (; page < end_page; page = next_active_pageblock(page)) { > > > - type = get_pageblock_migratetype(page); > > > - > > > - /* > > > - * A pageblock containing MOVABLE or free pages is considered > > > - * removable > > > - */ > > > - if (type != MIGRATE_MOVABLE && !pageblock_free(page)) > > > + if (!is_pageblock_removable_async(page)) > > > return 0; > > > > > > - /* > > > - * A pageblock starting with a PageReserved page is not > > > - * considered removable. > > > - */ > > > - if (PageReserved(page)) > > > - return 0; > > > } > > > > Bear in mind that a user or bad application constantly reading the sysfs > > file potentially causes a lot of cache trashing as a result of the > > linear scan instead of the pageblock type check. > > > yes, hmm. > > I think there are 3 levels of checking. > > 1. very strict check .... is_removable=true if the range contains only LRU and Free. > 2. possibility check .... is_removable=true if the range contains LRU, Free, Slab(reclaimable) > 3. fuzzy check .... is_removable=true if type = MIGRATABLE. (but very quick.) > > I wonder I should add sysctl (or some memory hotplug control file) and make > "fuzzy check" as default, which is the same to old behavior. > A sysctl would be racy as would be a hotplug control file. Maybe three sysfs files for the different levels of checking but allow ordinary users to only do the "fuzzy" check? The downside is that it'd require application modification to use the three sysfs files. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 0/3] memory hotplug: updates and bugfix for is_removable v3 2010-09-06 5:40 [PATCH 0/3] memory hotplug: updates and bugfix for is_removable KAMEZAWA Hiroyuki ` (2 preceding siblings ...) 2010-09-06 5:47 ` [PATCH 3/3] memory hotplug: use unified logic for is_removable and offline_pages KAMEZAWA Hiroyuki @ 2010-09-07 1:28 ` KAMEZAWA Hiroyuki 2010-09-07 1:32 ` [PATCH 1/3] [BUGFIX] memory hotplug: fix next block calculation in is_removable KAMEZAWA Hiroyuki ` (2 more replies) 3 siblings, 3 replies; 18+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-09-07 1:28 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko, fengguang.wu, akpm@linux-foundation.org, Mel Gorman, andi.kleen, Dave Hansen Thank you for review and comments. totally updated. Problem: /sys/devices/system/memory/memoryX/removable file shows whether the section can be offlined or not. Returns "1" if it seems removable. Now, the file uses a similar logic to one offline_pages() uses. Then, it's better to use unified logic. The biggest change from previous one is this patch just unifies is_removable() and offline code's logic. No functional change in offline() code. (is_removable code is updated to be better.) Brief patch description: 1. bugfix for is_removable() check. I think this should be back ported. (updated) 2. bugfix for callback at counting immobile pages. (no change) 3. the unified new logic for is_remobable. (updated) Because I'm moving house, my response may be delayd. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/3] [BUGFIX] memory hotplug: fix next block calculation in is_removable 2010-09-07 1:28 ` [PATCH 0/3] memory hotplug: updates and bugfix for is_removable v3 KAMEZAWA Hiroyuki @ 2010-09-07 1:32 ` KAMEZAWA Hiroyuki 2010-09-08 11:23 ` Mel Gorman 2010-09-07 1:34 ` [PATCH 2/3][BUGFIX] memory hotplug: fix notifier's return value check KAMEZAWA Hiroyuki 2010-09-07 1:36 ` [PATCH 3/3] memory hotplug: unify is_removable and offline detection code KAMEZAWA Hiroyuki 2 siblings, 1 reply; 18+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-09-07 1:32 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko, fengguang.wu, akpm@linux-foundation.org, Mel Gorman, andi.kleen, Dave Hansen From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> next_active_pageblock() is for finding next _used_ freeblock. It skips several blocks when it finds there are a chunk of free pages lager than pageblock. But it has 2 bugs. 1. We have no lock. page_order(page) - pageblock_order can be minus. 2. pageblocks_stride += is wrong. it should skip page_order(p) of pages. Changelog: 2010/09/07 - fix range check of order returned by page_order(). Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> --- mm/memory_hotplug.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) Index: kametest/mm/memory_hotplug.c =================================================================== --- kametest.orig/mm/memory_hotplug.c +++ kametest/mm/memory_hotplug.c @@ -584,19 +584,19 @@ static inline int pageblock_free(struct /* Return the start of the next active pageblock after a given page */ static struct page *next_active_pageblock(struct page *page) { - int pageblocks_stride; - /* Ensure the starting page is pageblock-aligned */ BUG_ON(page_to_pfn(page) & (pageblock_nr_pages - 1)); - /* Move forward by at least 1 * pageblock_nr_pages */ - pageblocks_stride = 1; - /* If the entire pageblock is free, move to the end of free page */ - if (pageblock_free(page)) - pageblocks_stride += page_order(page) - pageblock_order; + if (pageblock_free(page)) { + int order; + /* be careful. we don't have locks, page_order can be changed.*/ + order = page_order(page); + if ((order < MAX_ORDER) && (order >= pageblock_order)) + return page + (1 << order); + } - return page + (pageblocks_stride * pageblock_nr_pages); + return page + pageblock_nr_pages; } /* Checks if this range of memory is likely to be hot-removable. */ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] [BUGFIX] memory hotplug: fix next block calculation in is_removable 2010-09-07 1:32 ` [PATCH 1/3] [BUGFIX] memory hotplug: fix next block calculation in is_removable KAMEZAWA Hiroyuki @ 2010-09-08 11:23 ` Mel Gorman 0 siblings, 0 replies; 18+ messages in thread From: Mel Gorman @ 2010-09-08 11:23 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko, fengguang.wu, akpm@linux-foundation.org, andi.kleen, Dave Hansen On Tue, Sep 07, 2010 at 10:32:44AM +0900, KAMEZAWA Hiroyuki wrote: > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > > next_active_pageblock() is for finding next _used_ freeblock. It skips > several blocks when it finds there are a chunk of free pages lager than > pageblock. But it has 2 bugs. > > 1. We have no lock. page_order(page) - pageblock_order can be minus. > 2. pageblocks_stride += is wrong. it should skip page_order(p) of pages. > > Changelog: 2010/09/07 > - fix range check of order returned by page_order(). > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Mel Gorman <mel@csn.ul.ie> -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2/3][BUGFIX] memory hotplug: fix notifier's return value check 2010-09-07 1:28 ` [PATCH 0/3] memory hotplug: updates and bugfix for is_removable v3 KAMEZAWA Hiroyuki 2010-09-07 1:32 ` [PATCH 1/3] [BUGFIX] memory hotplug: fix next block calculation in is_removable KAMEZAWA Hiroyuki @ 2010-09-07 1:34 ` KAMEZAWA Hiroyuki 2010-09-07 1:36 ` [PATCH 3/3] memory hotplug: unify is_removable and offline detection code KAMEZAWA Hiroyuki 2 siblings, 0 replies; 18+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-09-07 1:34 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko, fengguang.wu, akpm@linux-foundation.org, Mel Gorman, andi.kleen, Dave Hansen From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Even if notifier cannot find any pages, it doesn't mean no pages are available...And, if there are no notifiers registered, this condition will be always true and memory hotplug will show -EBUSY. Clarification:This is a bug but not critical In most case, a pageblock which will be offlined is MIGRATE_MOVABLE This "notifier" is called only when the pageblock is _not_ MIGRATE_MOVABLE. But if not MIGRATE_MOVABLE, it's common case that memory hotplug will fail. So, no one notice this bug. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> --- mm/page_alloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: kametest/mm/page_alloc.c =================================================================== --- kametest.orig/mm/page_alloc.c +++ kametest/mm/page_alloc.c @@ -5313,7 +5313,7 @@ int set_migratetype_isolate(struct page */ notifier_ret = memory_isolate_notify(MEM_ISOLATE_COUNT, &arg); notifier_ret = notifier_to_errno(notifier_ret); - if (notifier_ret || !arg.pages_found) + if (notifier_ret) goto out; for (iter = pfn; iter < (pfn + pageblock_nr_pages); iter++) { -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 3/3] memory hotplug: unify is_removable and offline detection code 2010-09-07 1:28 ` [PATCH 0/3] memory hotplug: updates and bugfix for is_removable v3 KAMEZAWA Hiroyuki 2010-09-07 1:32 ` [PATCH 1/3] [BUGFIX] memory hotplug: fix next block calculation in is_removable KAMEZAWA Hiroyuki 2010-09-07 1:34 ` [PATCH 2/3][BUGFIX] memory hotplug: fix notifier's return value check KAMEZAWA Hiroyuki @ 2010-09-07 1:36 ` KAMEZAWA Hiroyuki 2 siblings, 0 replies; 18+ messages in thread From: KAMEZAWA Hiroyuki @ 2010-09-07 1:36 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko, fengguang.wu, akpm@linux-foundation.org, Mel Gorman, andi.kleen, Dave Hansen Now, sysfs interface of memory hotplug shows whether the section is removable or not. But it checks only migrateype of pages and doesn't check details of cluster of pages. Next, memory hotplug's set_migratetype_isolate() has the same kind of check, too. This patch adds the function __count_unmovable_pages() and makes above 2 checks to use the same logic. Then, is_removable and hotremove code uses the same logic. No changes in the hotremove logic itself. TODO: need to find a way to check RECLAMABLE. But, considering bit, calling shrink_slab() against a range before starting memory hotremove sounds better. If so, this patch's logic doesn't need to be changed. Changelog 2010.09.07: - modified __count_movable_pages() to get "count" as arguments. - renamed is_pageblock_removable_async as is_pageblock_removable_nolock. - added cond_resched(). - no changes in offline logic itself. - added #ifdef Reported-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> --- include/linux/memory_hotplug.h | 4 + mm/memory_hotplug.c | 17 -------- mm/page_alloc.c | 87 +++++++++++++++++++++++++++++++---------- 3 files changed, 72 insertions(+), 36 deletions(-) Index: kametest/mm/page_alloc.c =================================================================== --- kametest.orig/mm/page_alloc.c +++ kametest/mm/page_alloc.c @@ -5274,12 +5274,65 @@ void set_pageblock_flags_group(struct pa * page allocater never alloc memory from ISOLATE block. */ +static int +__count_immobile_pages(struct zone *zone, struct page *page, int count) +{ + unsigned long pfn, iter, found; + /* + * For avoiding noise data, lru_add_drain_all() should be called + * If ZONE_MOVABLE, the zone never contains immobile pages + */ + if (zone_idx(zone) == ZONE_MOVABLE) + return true; + + if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE) + return true; + + pfn = page_to_pfn(page); + for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) { + unsigned long check = pfn + iter; + + if (!pfn_valid_within(check)) { + iter++; + continue; + } + page = pfn_to_page(check); + if (!page_count(page)) { + if (PageBuddy(page)) + iter += (1 << page_order(page)) - 1; + continue; + } + if (!PageLRU(page)) + found++; + /* + * If there are RECLAIMABLE pages, we need to check it. + * But now, memory offline itself doesn't call shrink_slab() + * and it still to be fixed. + */ + /* + * If the page is not RAM, page_count()should be 0. + * we don't need more check. This is an _used_ not-movable page. + * + * The problematic thing here is PG_reserved pages. PG_reserved + * is set to both of a memory hole page and a _used_ kernel + * page at boot. + */ + if (found > count) + return false; + } + return true; +} + +bool is_pageblock_removable_nolock(struct page *page) +{ + struct zone *zone = page_zone(page); + return __count_immobile_pages(zone, page, 0); +} + int set_migratetype_isolate(struct page *page) { struct zone *zone; - struct page *curr_page; - unsigned long flags, pfn, iter; - unsigned long immobile = 0; + unsigned long flags, pfn; struct memory_isolate_notify arg; int notifier_ret; int ret = -EBUSY; @@ -5289,11 +5342,6 @@ int set_migratetype_isolate(struct page zone_idx = zone_idx(zone); spin_lock_irqsave(&zone->lock, flags); - if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE || - zone_idx == ZONE_MOVABLE) { - ret = 0; - goto out; - } pfn = page_to_pfn(page); arg.start_pfn = pfn; @@ -5315,21 +5363,18 @@ int set_migratetype_isolate(struct page notifier_ret = notifier_to_errno(notifier_ret); if (notifier_ret) goto out; - - for (iter = pfn; iter < (pfn + pageblock_nr_pages); iter++) { - if (!pfn_valid_within(pfn)) - continue; - - curr_page = pfn_to_page(iter); - if (!page_count(curr_page) || PageLRU(curr_page)) - continue; - - immobile++; - } - - if (arg.pages_found == immobile) + /* + * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself. + * We just check MOVABLE pages. + */ + if(__count_immobile_pages(zone ,page, arg.pages_found)) ret = 0; + /* + * immobile means "not-on-lru" paes. If immobile is larger than + * removable-by-driver pages reported by notifier, we'll fail. + */ + out: if (!ret) { set_pageblock_migratetype(page, MIGRATE_ISOLATE); Index: kametest/mm/memory_hotplug.c =================================================================== --- kametest.orig/mm/memory_hotplug.c +++ kametest/mm/memory_hotplug.c @@ -602,27 +602,14 @@ static struct page *next_active_pagebloc /* Checks if this range of memory is likely to be hot-removable. */ int is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages) { - int type; struct page *page = pfn_to_page(start_pfn); struct page *end_page = page + nr_pages; /* Check the starting page of each pageblock within the range */ for (; page < end_page; page = next_active_pageblock(page)) { - type = get_pageblock_migratetype(page); - - /* - * A pageblock containing MOVABLE or free pages is considered - * removable - */ - if (type != MIGRATE_MOVABLE && !pageblock_free(page)) - return 0; - - /* - * A pageblock starting with a PageReserved page is not - * considered removable. - */ - if (PageReserved(page)) + if (!is_pageblock_removable_nolock(page)) return 0; + cond_resched(); } /* All pageblocks in the memory block are likely to be hot-removable */ Index: kametest/include/linux/memory_hotplug.h =================================================================== --- kametest.orig/include/linux/memory_hotplug.h +++ kametest/include/linux/memory_hotplug.h @@ -70,6 +70,10 @@ extern void online_page(struct page *pag extern int online_pages(unsigned long, unsigned long); extern void __offline_isolated_pages(unsigned long, unsigned long); +#ifdef CONFIG_MEMORY_HOTREMOVE +extern bool is_pageblock_removable_nolock(struct page *page); +#endif /* CONFIG_MEMORY_HOTREMOVE */ + /* reasonably generic interface to expand the physical pages in a zone */ extern int __add_pages(int nid, struct zone *zone, unsigned long start_pfn, unsigned long nr_pages); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2010-09-08 11:23 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-09-06 5:40 [PATCH 0/3] memory hotplug: updates and bugfix for is_removable KAMEZAWA Hiroyuki 2010-09-06 5:42 ` [BUGFIX][PATCH 1/3] memory hotplug: fix next block calculation in is_removable KAMEZAWA Hiroyuki 2010-09-06 13:39 ` Mel Gorman 2010-09-06 17:15 ` Hiroyuki Kamezawa 2010-09-07 9:30 ` Mel Gorman 2010-09-06 5:44 ` [PATCH 2/3] memory hotplug: fix set_migratetype_isolate wrong callback result check KAMEZAWA Hiroyuki 2010-09-06 5:47 ` [PATCH 3/3] memory hotplug: use unified logic for is_removable and offline_pages KAMEZAWA Hiroyuki 2010-09-06 9:30 ` Michal Hocko 2010-09-06 13:30 ` Hiroyuki Kamezawa 2010-09-07 13:16 ` Michal Hocko 2010-09-06 13:58 ` Mel Gorman 2010-09-06 23:58 ` KAMEZAWA Hiroyuki 2010-09-07 13:01 ` Mel Gorman 2010-09-07 1:28 ` [PATCH 0/3] memory hotplug: updates and bugfix for is_removable v3 KAMEZAWA Hiroyuki 2010-09-07 1:32 ` [PATCH 1/3] [BUGFIX] memory hotplug: fix next block calculation in is_removable KAMEZAWA Hiroyuki 2010-09-08 11:23 ` Mel Gorman 2010-09-07 1:34 ` [PATCH 2/3][BUGFIX] memory hotplug: fix notifier's return value check KAMEZAWA Hiroyuki 2010-09-07 1:36 ` [PATCH 3/3] memory hotplug: unify is_removable and offline detection code KAMEZAWA Hiroyuki
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).