* [PATCH 0/3] memory hotplug: updates and bugfix for is_removable
@ 2010-09-06 5:40 KAMEZAWA Hiroyuki
2010-09-06 5:42 ` [BUGFIX][PATCH 1/3] memory hotplug: fix next block calculation in is_removable KAMEZAWA Hiroyuki
` (3 more replies)
0 siblings, 4 replies; 18+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-06 5:40 UTC (permalink / raw)
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Michal Hocko, fengguang.wu,
akpm@linux-foundation.org, Mel Gorman, andi.kleen, Dave Hansen
Problem:
/sys/devices/system/memory/memoryX/removable file shows whether the section
can be offlined or not. Returns "1" if it seems removable.
Now, the file uses a similar logic to one offline_pages() uses.
Problem here is.
- removable detection logics of is_removable() and offline_pages() is
different from each other.
- The logic, which check MIGRATE_TYPE, tend to be incorrect once fragmented.
MIGRATE_TYPE of a pageblock is just a hint, no guarantee.
Then, this patch set does.
- use the same logic between is_removable() and offline_pages().
- don't use MIGRATE_TYPE, check the memmap itself directly rather than hint.
Brief patch description:
1. bugfix for is_removable() check. I think this should be back ported.
2. bugfix for callback at counting immobile pages.
I think the old logic rarely hits this bug..so, not necessary to backport.
3. the unified new logic for is_remobable.
Only patch1 is CCed to stable for now and the patch series itself is onto
mmotm-08-27.
Thanks,
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [BUGFIX][PATCH 1/3] memory hotplug: fix next block calculation in is_removable
2010-09-06 5:40 [PATCH 0/3] memory hotplug: updates and bugfix for is_removable KAMEZAWA Hiroyuki
@ 2010-09-06 5:42 ` KAMEZAWA Hiroyuki
2010-09-06 13:39 ` Mel Gorman
2010-09-06 5:44 ` [PATCH 2/3] memory hotplug: fix set_migratetype_isolate wrong callback result check KAMEZAWA Hiroyuki
` (2 subsequent siblings)
3 siblings, 1 reply; 18+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-06 5:42 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko,
fengguang.wu, akpm@linux-foundation.org, Mel Gorman, andi.kleen,
Dave Hansen, stable
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
next_active_pageblock() is for finding next _used_ freeblock. It skips
several blocks when it finds there are a chunk of free pages lager than
pageblock. But it has 2 bugs.
1. We have no lock. page_order(page) - pageblock_order can be minus.
2. pageblocks_stride += is wrong. it should skip page_order(p) of pages.
CC: stable@kernel.org
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/memory_hotplug.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
Index: kametest/mm/memory_hotplug.c
===================================================================
--- kametest.orig/mm/memory_hotplug.c
+++ kametest/mm/memory_hotplug.c
@@ -584,19 +584,19 @@ static inline int pageblock_free(struct
/* Return the start of the next active pageblock after a given page */
static struct page *next_active_pageblock(struct page *page)
{
- int pageblocks_stride;
-
/* Ensure the starting page is pageblock-aligned */
BUG_ON(page_to_pfn(page) & (pageblock_nr_pages - 1));
- /* Move forward by at least 1 * pageblock_nr_pages */
- pageblocks_stride = 1;
-
/* If the entire pageblock is free, move to the end of free page */
- if (pageblock_free(page))
- pageblocks_stride += page_order(page) - pageblock_order;
+ if (pageblock_free(page)) {
+ int order;
+ /* be careful. we don't have locks, page_order can be changed.*/
+ order = page_order(page);
+ if (order > pageblock_order)
+ return page + (1 << order);
+ }
- return page + (pageblocks_stride * pageblock_nr_pages);
+ return page + pageblock_nr_pages;
}
/* Checks if this range of memory is likely to be hot-removable. */
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2/3] memory hotplug: fix set_migratetype_isolate wrong callback result check
2010-09-06 5:40 [PATCH 0/3] memory hotplug: updates and bugfix for is_removable KAMEZAWA Hiroyuki
2010-09-06 5:42 ` [BUGFIX][PATCH 1/3] memory hotplug: fix next block calculation in is_removable KAMEZAWA Hiroyuki
@ 2010-09-06 5:44 ` KAMEZAWA Hiroyuki
2010-09-06 5:47 ` [PATCH 3/3] memory hotplug: use unified logic for is_removable and offline_pages KAMEZAWA Hiroyuki
2010-09-07 1:28 ` [PATCH 0/3] memory hotplug: updates and bugfix for is_removable v3 KAMEZAWA Hiroyuki
3 siblings, 0 replies; 18+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-06 5:44 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko,
fengguang.wu, akpm@linux-foundation.org, Mel Gorman, andi.kleen,
Dave Hansen
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Even if notifier cannot find any pages, it doesn't mean
no pages are available...And, if there are no notifiers registered,
this condition will be always true and memory hotplug will show -EBUSY.
Clarification:This is a bug but not critical
In most case, a pageblock which will be offlined is MIGRATE_MOVABLE
This "notifier" is called only when the pageblock is _not_ MIGRATE_MOVABLE.
If not MIGRATE_MOVABLE, it's common case that memory hotplug will fail.
So, no one notice this bug.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: kametest/mm/page_alloc.c
===================================================================
--- kametest.orig/mm/page_alloc.c
+++ kametest/mm/page_alloc.c
@@ -5313,7 +5313,7 @@ int set_migratetype_isolate(struct page
*/
notifier_ret = memory_isolate_notify(MEM_ISOLATE_COUNT, &arg);
notifier_ret = notifier_to_errno(notifier_ret);
- if (notifier_ret || !arg.pages_found)
+ if (notifier_ret)
goto out;
for (iter = pfn; iter < (pfn + pageblock_nr_pages); iter++) {
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 3/3] memory hotplug: use unified logic for is_removable and offline_pages
2010-09-06 5:40 [PATCH 0/3] memory hotplug: updates and bugfix for is_removable KAMEZAWA Hiroyuki
2010-09-06 5:42 ` [BUGFIX][PATCH 1/3] memory hotplug: fix next block calculation in is_removable KAMEZAWA Hiroyuki
2010-09-06 5:44 ` [PATCH 2/3] memory hotplug: fix set_migratetype_isolate wrong callback result check KAMEZAWA Hiroyuki
@ 2010-09-06 5:47 ` KAMEZAWA Hiroyuki
2010-09-06 9:30 ` Michal Hocko
2010-09-06 13:58 ` Mel Gorman
2010-09-07 1:28 ` [PATCH 0/3] memory hotplug: updates and bugfix for is_removable v3 KAMEZAWA Hiroyuki
3 siblings, 2 replies; 18+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-06 5:47 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko,
fengguang.wu, akpm@linux-foundation.org, Mel Gorman, andi.kleen,
Dave Hansen
Now, sysfs interface of memory hotplug shows whether the section is
removable or not. But it checks only migrateype of pages and doesn't
check details of cluster of pages.
Next, memory hotplug's set_migratetype_isolate() has the same kind
of check, too. But the migrate-type is just a "hint" and the pageblock
can contain several types of pages if fragmentation is very heavy.
To get precise information, we need to check
- the pageblock only contains free pages or LRU pages.
This patch adds the function __count_unmovable_pages() and makes
above 2 checks to use the same logic. This will improve user experience
of memory hotplug because sysfs interface tells accurate information.
Note:
it may be better to check MIGRATE_UNMOVABLE for making failure case quick.
Changelog: 2010/09/06
- added comments.
- removed zone->lock.
- changed the name of the function to be is_pageblock_removable_async().
because I removed the zone->lock.
Reported-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
include/linux/memory_hotplug.h | 1
mm/memory_hotplug.c | 15 -------
mm/page_alloc.c | 77 ++++++++++++++++++++++++++++++-----------
3 files changed, 60 insertions(+), 33 deletions(-)
Index: kametest/mm/page_alloc.c
===================================================================
--- kametest.orig/mm/page_alloc.c
+++ kametest/mm/page_alloc.c
@@ -5274,11 +5274,61 @@ void set_pageblock_flags_group(struct pa
* page allocater never alloc memory from ISOLATE block.
*/
+static int __count_immobile_pages(struct zone *zone, struct page *page)
+{
+ unsigned long pfn, iter, found;
+ /*
+ * For avoiding noise data, lru_add_drain_all() should be called
+ * If ZONE_MOVABLE, the zone never contains immobile pages
+ */
+ if (zone_idx(zone) == ZONE_MOVABLE)
+ return 0;
+
+ pfn = page_to_pfn(page);
+ for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
+ unsigned long check = pfn + iter;
+
+ if (!pfn_valid_within(check)) {
+ iter++;
+ continue;
+ }
+ page = pfn_to_page(check);
+ if (!page_count(page)) {
+ if (PageBuddy(page))
+ iter += (1 << page_order(page)) - 1;
+ continue;
+ }
+ if (!PageLRU(page))
+ found++;
+ /*
+ * If the page is not RAM, page_count()should be 0.
+ * we don't need more check. This is an _used_ not-movable page.
+ *
+ * The problematic thing here is PG_reserved pages. PG_reserved
+ * is set to both of a memory hole page and a _used_ kernel
+ * page at boot.
+ */
+ }
+ return found;
+}
+
+bool is_pageblock_removable_async(struct page *page)
+{
+ struct zone *zone = page_zone(page);
+ unsigned long flags;
+ int num;
+ /* Don't take zone->lock interntionally. */
+ num = __count_immobile_pages(zone, page);
+
+ if (num)
+ return false;
+ return true;
+}
+
int set_migratetype_isolate(struct page *page)
{
struct zone *zone;
- struct page *curr_page;
- unsigned long flags, pfn, iter;
+ unsigned long flags, pfn;
unsigned long immobile = 0;
struct memory_isolate_notify arg;
int notifier_ret;
@@ -5289,11 +5339,6 @@ int set_migratetype_isolate(struct page
zone_idx = zone_idx(zone);
spin_lock_irqsave(&zone->lock, flags);
- if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE ||
- zone_idx == ZONE_MOVABLE) {
- ret = 0;
- goto out;
- }
pfn = page_to_pfn(page);
arg.start_pfn = pfn;
@@ -5315,19 +5360,13 @@ int set_migratetype_isolate(struct page
notifier_ret = notifier_to_errno(notifier_ret);
if (notifier_ret)
goto out;
+ immobile = __count_immobile_pages(zone ,page);
- for (iter = pfn; iter < (pfn + pageblock_nr_pages); iter++) {
- if (!pfn_valid_within(pfn))
- continue;
-
- curr_page = pfn_to_page(iter);
- if (!page_count(curr_page) || PageLRU(curr_page))
- continue;
-
- immobile++;
- }
-
- if (arg.pages_found == immobile)
+ /*
+ * immobile means "not-on-lru" paes. If immobile is larger than
+ * removable-by-driver pages reported by notifier, we'll fail.
+ */
+ if (!immobile || arg.pages_found >= immobile)
ret = 0;
out:
Index: kametest/mm/memory_hotplug.c
===================================================================
--- kametest.orig/mm/memory_hotplug.c
+++ kametest/mm/memory_hotplug.c
@@ -602,27 +602,14 @@ static struct page *next_active_pagebloc
/* Checks if this range of memory is likely to be hot-removable. */
int is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
{
- int type;
struct page *page = pfn_to_page(start_pfn);
struct page *end_page = page + nr_pages;
/* Check the starting page of each pageblock within the range */
for (; page < end_page; page = next_active_pageblock(page)) {
- type = get_pageblock_migratetype(page);
-
- /*
- * A pageblock containing MOVABLE or free pages is considered
- * removable
- */
- if (type != MIGRATE_MOVABLE && !pageblock_free(page))
+ if (!is_pageblock_removable_async(page))
return 0;
- /*
- * A pageblock starting with a PageReserved page is not
- * considered removable.
- */
- if (PageReserved(page))
- return 0;
}
/* All pageblocks in the memory block are likely to be hot-removable */
Index: kametest/include/linux/memory_hotplug.h
===================================================================
--- kametest.orig/include/linux/memory_hotplug.h
+++ kametest/include/linux/memory_hotplug.h
@@ -69,6 +69,7 @@ extern void online_page(struct page *pag
/* VM interface that may be used by firmware interface */
extern int online_pages(unsigned long, unsigned long);
extern void __offline_isolated_pages(unsigned long, unsigned long);
+extern bool is_pageblock_removable_async(struct page *page);
/* reasonably generic interface to expand the physical pages in a zone */
extern int __add_pages(int nid, struct zone *zone, unsigned long start_pfn,
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] memory hotplug: use unified logic for is_removable and offline_pages
2010-09-06 5:47 ` [PATCH 3/3] memory hotplug: use unified logic for is_removable and offline_pages KAMEZAWA Hiroyuki
@ 2010-09-06 9:30 ` Michal Hocko
2010-09-06 13:30 ` Hiroyuki Kamezawa
2010-09-06 13:58 ` Mel Gorman
1 sibling, 1 reply; 18+ messages in thread
From: Michal Hocko @ 2010-09-06 9:30 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, fengguang.wu,
akpm@linux-foundation.org, Mel Gorman, andi.kleen, Dave Hansen
On Mon 06-09-10 14:47:16, KAMEZAWA Hiroyuki wrote:
>
> Now, sysfs interface of memory hotplug shows whether the section is
> removable or not. But it checks only migrateype of pages and doesn't
> check details of cluster of pages.
>
> Next, memory hotplug's set_migratetype_isolate() has the same kind
> of check, too. But the migrate-type is just a "hint" and the pageblock
> can contain several types of pages if fragmentation is very heavy.
>
> To get precise information, we need to check
> - the pageblock only contains free pages or LRU pages.
>
> This patch adds the function __count_unmovable_pages() and makes
> above 2 checks to use the same logic. This will improve user experience
> of memory hotplug because sysfs interface tells accurate information.
>
> Note:
> it may be better to check MIGRATE_UNMOVABLE for making failure case quick.
>
> Changelog: 2010/09/06
> - added comments.
> - removed zone->lock.
> - changed the name of the function to be is_pageblock_removable_async().
> because I removed the zone->lock.
wouldn't be __is_pageblock_removable a better name? _async suffix is
usually used for asynchronous operations and this is just a function
withtout locks.
>
> Reported-by: Michal Hocko <mhocko@suse.cz>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> include/linux/memory_hotplug.h | 1
> mm/memory_hotplug.c | 15 -------
> mm/page_alloc.c | 77 ++++++++++++++++++++++++++++++-----------
> 3 files changed, 60 insertions(+), 33 deletions(-)
>
> Index: kametest/mm/page_alloc.c
> ===================================================================
> --- kametest.orig/mm/page_alloc.c
> +++ kametest/mm/page_alloc.c
> @@ -5274,11 +5274,61 @@ void set_pageblock_flags_group(struct pa
> * page allocater never alloc memory from ISOLATE block.
> */
>
Can we add a comment on the locking? Something like:
Caller should hold zone->lock if he needs consistent results.
> +static int __count_immobile_pages(struct zone *zone, struct page *page)
> +{
> + unsigned long pfn, iter, found;
> + /*
> + * For avoiding noise data, lru_add_drain_all() should be called
> + * If ZONE_MOVABLE, the zone never contains immobile pages
> + */
> + if (zone_idx(zone) == ZONE_MOVABLE)
> + return 0;
> +
> + pfn = page_to_pfn(page);
> + for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
> + unsigned long check = pfn + iter;
> +
> + if (!pfn_valid_within(check)) {
> + iter++;
> + continue;
> + }
> + page = pfn_to_page(check);
> + if (!page_count(page)) {
> + if (PageBuddy(page))
> + iter += (1 << page_order(page)) - 1;
> + continue;
> + }
> + if (!PageLRU(page))
> + found++;
> + /*
> + * If the page is not RAM, page_count()should be 0.
> + * we don't need more check. This is an _used_ not-movable page.
> + *
> + * The problematic thing here is PG_reserved pages. PG_reserved
> + * is set to both of a memory hole page and a _used_ kernel
> + * page at boot.
> + */
> + }
> + return found;
> +}
> +
> +bool is_pageblock_removable_async(struct page *page)
> +{
> + struct zone *zone = page_zone(page);
> + unsigned long flags;
> + int num;
> + /* Don't take zone->lock interntionally. */
Could you add the reason?
Don't take zone-> lock intentionally because we are called from the
userspace (sysfs interface).
[...]
> /* All pageblocks in the memory block are likely to be hot-removable */
> Index: kametest/include/linux/memory_hotplug.h
> ===================================================================
> --- kametest.orig/include/linux/memory_hotplug.h
> +++ kametest/include/linux/memory_hotplug.h
> @@ -69,6 +69,7 @@ extern void online_page(struct page *pag
> /* VM interface that may be used by firmware interface */
> extern int online_pages(unsigned long, unsigned long);
> extern void __offline_isolated_pages(unsigned long, unsigned long);
#ifdef CONFIG_HOTREMOVE
> +extern bool is_pageblock_removable_async(struct page *page);
#else
#define is_pageblock_removable_async(p) 0
#endif
?
Thanks!
--
Michal Hocko
L3 team
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] memory hotplug: use unified logic for is_removable and offline_pages
2010-09-06 9:30 ` Michal Hocko
@ 2010-09-06 13:30 ` Hiroyuki Kamezawa
2010-09-07 13:16 ` Michal Hocko
0 siblings, 1 reply; 18+ messages in thread
From: Hiroyuki Kamezawa @ 2010-09-06 13:30 UTC (permalink / raw)
To: Michal Hocko
Cc: KAMEZAWA Hiroyuki, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, fengguang.wu,
akpm@linux-foundation.org, Mel Gorman, andi.kleen, Dave Hansen
2010/9/6 Michal Hocko <mhocko@suse.cz>:
> On Mon 06-09-10 14:47:16, KAMEZAWA Hiroyuki wrote:
>>
>> Now, sysfs interface of memory hotplug shows whether the section is
>> removable or not. But it checks only migrateype of pages and doesn't
>> check details of cluster of pages.
>>
>> Next, memory hotplug's set_migratetype_isolate() has the same kind
>> of check, too. But the migrate-type is just a "hint" and the pageblock
>> can contain several types of pages if fragmentation is very heavy.
>>
>> To get precise information, we need to check
>> - the pageblock only contains free pages or LRU pages.
>>
>> This patch adds the function __count_unmovable_pages() and makes
>> above 2 checks to use the same logic. This will improve user experience
>> of memory hotplug because sysfs interface tells accurate information.
>>
>> Note:
>> it may be better to check MIGRATE_UNMOVABLE for making failure case quick.
>>
>> Changelog: 2010/09/06
>> - added comments.
>> - removed zone->lock.
>> - changed the name of the function to be is_pageblock_removable_async().
>> because I removed the zone->lock.
>
> wouldn't be __is_pageblock_removable a better name? _async suffix is
> usually used for asynchronous operations and this is just a function
> withtout locks.
>
rename as _is_pagebloc_removable_nolock().
>>
>> Reported-by: Michal Hocko <mhocko@suse.cz>
>> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> ---
>> include/linux/memory_hotplug.h | 1
>> mm/memory_hotplug.c | 15 -------
>> mm/page_alloc.c | 77 ++++++++++++++++++++++++++++++-----------
>> 3 files changed, 60 insertions(+), 33 deletions(-)
>>
>> Index: kametest/mm/page_alloc.c
>> ===================================================================
>> --- kametest.orig/mm/page_alloc.c
>> +++ kametest/mm/page_alloc.c
>> @@ -5274,11 +5274,61 @@ void set_pageblock_flags_group(struct pa
>> * page allocater never alloc memory from ISOLATE block.
>> */
>>
>
> Can we add a comment on the locking? Something like:
> Caller should hold zone->lock if he needs consistent results.
>
Hmm. ok.
>> +static int __count_immobile_pages(struct zone *zone, struct page *page)
>> +{
>> + unsigned long pfn, iter, found;
>> + /*
>> + * For avoiding noise data, lru_add_drain_all() should be called
>> + * If ZONE_MOVABLE, the zone never contains immobile pages
>> + */
>> + if (zone_idx(zone) == ZONE_MOVABLE)
>> + return 0;
>> +
>> + pfn = page_to_pfn(page);
>> + for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
>> + unsigned long check = pfn + iter;
>> +
>> + if (!pfn_valid_within(check)) {
>> + iter++;
>> + continue;
>> + }
>> + page = pfn_to_page(check);
>> + if (!page_count(page)) {
>> + if (PageBuddy(page))
>> + iter += (1 << page_order(page)) - 1;
>> + continue;
>> + }
>> + if (!PageLRU(page))
>> + found++;
>> + /*
>> + * If the page is not RAM, page_count()should be 0.
>> + * we don't need more check. This is an _used_ not-movable page.
>> + *
>> + * The problematic thing here is PG_reserved pages. PG_reserved
>> + * is set to both of a memory hole page and a _used_ kernel
>> + * page at boot.
>> + */
>> + }
>> + return found;
>> +}
>> +
>> +bool is_pageblock_removable_async(struct page *page)
>> +{
>> + struct zone *zone = page_zone(page);
>> + unsigned long flags;
>> + int num;
>> + /* Don't take zone->lock interntionally. */
>
> Could you add the reason?
> Don't take zone-> lock intentionally because we are called from the
> userspace (sysfs interface).
>
I don't like to assume caller context which will limit the callers.
/* holding zone->lock or not is caller's job. */
> [...]
>> /* All pageblocks in the memory block are likely to be hot-removable */
>> Index: kametest/include/linux/memory_hotplug.h
>> ===================================================================
>> --- kametest.orig/include/linux/memory_hotplug.h
>> +++ kametest/include/linux/memory_hotplug.h
>> @@ -69,6 +69,7 @@ extern void online_page(struct page *pag
>> /* VM interface that may be used by firmware interface */
>> extern int online_pages(unsigned long, unsigned long);
>> extern void __offline_isolated_pages(unsigned long, unsigned long);
>
> #ifdef CONFIG_HOTREMOVE
>
>> +extern bool is_pageblock_removable_async(struct page *page);
>
> #else
> #define is_pageblock_removable_async(p) 0
> #endif
> ?
Is this function is called even if HOTREMOVE is off ?
If so, the caller is buggy. I'll check tomorrow.
Thanks,
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [BUGFIX][PATCH 1/3] memory hotplug: fix next block calculation in is_removable
2010-09-06 5:42 ` [BUGFIX][PATCH 1/3] memory hotplug: fix next block calculation in is_removable KAMEZAWA Hiroyuki
@ 2010-09-06 13:39 ` Mel Gorman
2010-09-06 17:15 ` Hiroyuki Kamezawa
0 siblings, 1 reply; 18+ messages in thread
From: Mel Gorman @ 2010-09-06 13:39 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko,
fengguang.wu, akpm@linux-foundation.org, andi.kleen, Dave Hansen,
stable
On Mon, Sep 06, 2010 at 02:42:28PM +0900, KAMEZAWA Hiroyuki wrote:
>
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> next_active_pageblock() is for finding next _used_ freeblock. It skips
> several blocks when it finds there are a chunk of free pages lager than
> pageblock. But it has 2 bugs.
>
> 1. We have no lock. page_order(page) - pageblock_order can be minus.
> 2. pageblocks_stride += is wrong. it should skip page_order(p) of pages.
>
> CC: stable@kernel.org
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> mm/memory_hotplug.c | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> Index: kametest/mm/memory_hotplug.c
> ===================================================================
> --- kametest.orig/mm/memory_hotplug.c
> +++ kametest/mm/memory_hotplug.c
> @@ -584,19 +584,19 @@ static inline int pageblock_free(struct
> /* Return the start of the next active pageblock after a given page */
> static struct page *next_active_pageblock(struct page *page)
> {
> - int pageblocks_stride;
> -
> /* Ensure the starting page is pageblock-aligned */
> BUG_ON(page_to_pfn(page) & (pageblock_nr_pages - 1));
>
> - /* Move forward by at least 1 * pageblock_nr_pages */
> - pageblocks_stride = 1;
> -
> /* If the entire pageblock is free, move to the end of free page */
> - if (pageblock_free(page))
> - pageblocks_stride += page_order(page) - pageblock_order;
> + if (pageblock_free(page)) {
> + int order;
> + /* be careful. we don't have locks, page_order can be changed.*/
> + order = page_order(page);
> + if (order > pageblock_order)
> + return page + (1 << order);
> + }
As you note in your changelog, page_order() is unsafe because we do not have
the zone lock but you don't check if order is somewhere between pageblock_order
and MAX_ORDER_NR_PAGES. How is this safer?
>
> - return page + (pageblocks_stride * pageblock_nr_pages);
> + return page + pageblock_nr_pages;
> }
>
> /* Checks if this range of memory is likely to be hot-removable. */
>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] memory hotplug: use unified logic for is_removable and offline_pages
2010-09-06 5:47 ` [PATCH 3/3] memory hotplug: use unified logic for is_removable and offline_pages KAMEZAWA Hiroyuki
2010-09-06 9:30 ` Michal Hocko
@ 2010-09-06 13:58 ` Mel Gorman
2010-09-06 23:58 ` KAMEZAWA Hiroyuki
1 sibling, 1 reply; 18+ messages in thread
From: Mel Gorman @ 2010-09-06 13:58 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko,
fengguang.wu, akpm@linux-foundation.org, andi.kleen, Dave Hansen
On Mon, Sep 06, 2010 at 02:47:16PM +0900, KAMEZAWA Hiroyuki wrote:
>
> Now, sysfs interface of memory hotplug shows whether the section is
> removable or not. But it checks only migrateype of pages and doesn't
> check details of cluster of pages.
>
This was deliberate at the time. The intention was to avoid an expensive
linear page scan where possible.
> Next, memory hotplug's set_migratetype_isolate() has the same kind
> of check, too. But the migrate-type is just a "hint" and the pageblock
> can contain several types of pages if fragmentation is very heavy.
>
If fragmentation is very heavy on a system that requires memory
hot-plug, I'd also be checking the value of min_free_kbytes. If it's
low, I suggest an init script runs
hugeadm --set-recommended-min_free_kbytes
because it'll keep fragmentation-related events to a minimum. The
mm_page_alloc_extfrag tracepoint can be used to measure fragmentation
events if you want to see the effect of altering min_free_kbytes like
this.
> To get precise information, we need to check
> - the pageblock only contains free pages or LRU pages.
>
> This patch adds the function __count_unmovable_pages() and makes
> above 2 checks to use the same logic. This will improve user experience
> of memory hotplug because sysfs interface tells accurate information.
>
> Note:
> it may be better to check MIGRATE_UNMOVABLE for making failure case quick.
>
> Changelog: 2010/09/06
> - added comments.
> - removed zone->lock.
> - changed the name of the function to be is_pageblock_removable_async().
> because I removed the zone->lock.
>
> Reported-by: Michal Hocko <mhocko@suse.cz>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> include/linux/memory_hotplug.h | 1
> mm/memory_hotplug.c | 15 -------
> mm/page_alloc.c | 77 ++++++++++++++++++++++++++++++-----------
> 3 files changed, 60 insertions(+), 33 deletions(-)
>
> Index: kametest/mm/page_alloc.c
> ===================================================================
> --- kametest.orig/mm/page_alloc.c
> +++ kametest/mm/page_alloc.c
> @@ -5274,11 +5274,61 @@ void set_pageblock_flags_group(struct pa
> * page allocater never alloc memory from ISOLATE block.
> */
>
> +static int __count_immobile_pages(struct zone *zone, struct page *page)
> +{
This will also count RECLAIMABLE pages belonging to some slab objects.
These are potentially hot-removable if slab is shrunk. Your function gives a
more accurate count but not necessarily a better user-experience with respect
to finding sections to hot-remove. You might like to detect PageSlab pages
that belong to a RECLAIMABLE slab and not count these as immobile.
> + unsigned long pfn, iter, found;
> + /*
> + * For avoiding noise data, lru_add_drain_all() should be called
> + * If ZONE_MOVABLE, the zone never contains immobile pages
> + */
> + if (zone_idx(zone) == ZONE_MOVABLE)
> + return 0;
> +
> + pfn = page_to_pfn(page);
> + for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
> + unsigned long check = pfn + iter;
> +
> + if (!pfn_valid_within(check)) {
> + iter++;
> + continue;
> + }
> + page = pfn_to_page(check);
> + if (!page_count(page)) {
> + if (PageBuddy(page))
> + iter += (1 << page_order(page)) - 1;
> + continue;
> + }
> + if (!PageLRU(page))
> + found++;
Arguably, you do not care how many pages there are, you just care if
there is one truely unmovable page. If you find one of them, then have
this function return fail to avoid the rest of the scan.
> + /*
> + * If the page is not RAM, page_count()should be 0.
> + * we don't need more check. This is an _used_ not-movable page.
> + *
> + * The problematic thing here is PG_reserved pages. PG_reserved
> + * is set to both of a memory hole page and a _used_ kernel
> + * page at boot.
> + */
> + }
> + return found;
> +}
> +
> +bool is_pageblock_removable_async(struct page *page)
> +{
> + struct zone *zone = page_zone(page);
> + unsigned long flags;
> + int num;
> + /* Don't take zone->lock interntionally. */
intentionally?
> + num = __count_immobile_pages(zone, page);
> +
> + if (num)
> + return false;
> + return true;
> +}
> +
> int set_migratetype_isolate(struct page *page)
> {
> struct zone *zone;
> - struct page *curr_page;
> - unsigned long flags, pfn, iter;
> + unsigned long flags, pfn;
> unsigned long immobile = 0;
> struct memory_isolate_notify arg;
> int notifier_ret;
> @@ -5289,11 +5339,6 @@ int set_migratetype_isolate(struct page
> zone_idx = zone_idx(zone);
>
> spin_lock_irqsave(&zone->lock, flags);
> - if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE ||
> - zone_idx == ZONE_MOVABLE) {
> - ret = 0;
> - goto out;
> - }
>
This will result in more scanning and a potentially more expensive
memory hot-remove operation. I'm not massively concerned as such because
memory hot-remove is not cheap but it's worth mentioning in the
changelog that this is a consequence.
> pfn = page_to_pfn(page);
> arg.start_pfn = pfn;
> @@ -5315,19 +5360,13 @@ int set_migratetype_isolate(struct page
> notifier_ret = notifier_to_errno(notifier_ret);
> if (notifier_ret)
> goto out;
> + immobile = __count_immobile_pages(zone ,page);
>
> - for (iter = pfn; iter < (pfn + pageblock_nr_pages); iter++) {
> - if (!pfn_valid_within(pfn))
> - continue;
> -
> - curr_page = pfn_to_page(iter);
> - if (!page_count(curr_page) || PageLRU(curr_page))
> - continue;
> -
> - immobile++;
> - }
> -
> - if (arg.pages_found == immobile)
> + /*
> + * immobile means "not-on-lru" paes. If immobile is larger than
> + * removable-by-driver pages reported by notifier, we'll fail.
> + */
> + if (!immobile || arg.pages_found >= immobile)
> ret = 0;
>
Here is where I'd suggest reimplementing __count_immobile_pages as
pageblock_any_immobile() that returns true if it detects an immobile page
in a given PFN range.
> out:
> Index: kametest/mm/memory_hotplug.c
> ===================================================================
> --- kametest.orig/mm/memory_hotplug.c
> +++ kametest/mm/memory_hotplug.c
> @@ -602,27 +602,14 @@ static struct page *next_active_pagebloc
> /* Checks if this range of memory is likely to be hot-removable. */
> int is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
> {
> - int type;
> struct page *page = pfn_to_page(start_pfn);
> struct page *end_page = page + nr_pages;
>
> /* Check the starting page of each pageblock within the range */
> for (; page < end_page; page = next_active_pageblock(page)) {
> - type = get_pageblock_migratetype(page);
> -
> - /*
> - * A pageblock containing MOVABLE or free pages is considered
> - * removable
> - */
> - if (type != MIGRATE_MOVABLE && !pageblock_free(page))
> + if (!is_pageblock_removable_async(page))
> return 0;
>
> - /*
> - * A pageblock starting with a PageReserved page is not
> - * considered removable.
> - */
> - if (PageReserved(page))
> - return 0;
> }
Bear in mind that a user or bad application constantly reading the sysfs
file potentially causes a lot of cache trashing as a result of the
linear scan instead of the pageblock type check.
>
> /* All pageblocks in the memory block are likely to be hot-removable */
> Index: kametest/include/linux/memory_hotplug.h
> ===================================================================
> --- kametest.orig/include/linux/memory_hotplug.h
> +++ kametest/include/linux/memory_hotplug.h
> @@ -69,6 +69,7 @@ extern void online_page(struct page *pag
> /* VM interface that may be used by firmware interface */
> extern int online_pages(unsigned long, unsigned long);
> extern void __offline_isolated_pages(unsigned long, unsigned long);
> +extern bool is_pageblock_removable_async(struct page *page);
>
> /* reasonably generic interface to expand the physical pages in a zone */
> extern int __add_pages(int nid, struct zone *zone, unsigned long start_pfn,
>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [BUGFIX][PATCH 1/3] memory hotplug: fix next block calculation in is_removable
2010-09-06 13:39 ` Mel Gorman
@ 2010-09-06 17:15 ` Hiroyuki Kamezawa
2010-09-07 9:30 ` Mel Gorman
0 siblings, 1 reply; 18+ messages in thread
From: Hiroyuki Kamezawa @ 2010-09-06 17:15 UTC (permalink / raw)
To: Mel Gorman
Cc: KAMEZAWA Hiroyuki, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, Michal Hocko, fengguang.wu,
akpm@linux-foundation.org, andi.kleen, Dave Hansen, stable
2010/9/6 Mel Gorman <mel@csn.ul.ie>:
> On Mon, Sep 06, 2010 at 02:42:28PM +0900, KAMEZAWA Hiroyuki wrote:
>>
>> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>>
>> next_active_pageblock() is for finding next _used_ freeblock. It skips
>> several blocks when it finds there are a chunk of free pages lager than
>> pageblock. But it has 2 bugs.
>>
>> 1. We have no lock. page_order(page) - pageblock_order can be minus.
>> 2. pageblocks_stride += is wrong. it should skip page_order(p) of pages.
>>
>> CC: stable@kernel.org
>> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> ---
>> mm/memory_hotplug.c | 16 ++++++++--------
>> 1 file changed, 8 insertions(+), 8 deletions(-)
>>
>> Index: kametest/mm/memory_hotplug.c
>> ===================================================================
>> --- kametest.orig/mm/memory_hotplug.c
>> +++ kametest/mm/memory_hotplug.c
>> @@ -584,19 +584,19 @@ static inline int pageblock_free(struct
>> /* Return the start of the next active pageblock after a given page */
>> static struct page *next_active_pageblock(struct page *page)
>> {
>> - int pageblocks_stride;
>> -
>> /* Ensure the starting page is pageblock-aligned */
>> BUG_ON(page_to_pfn(page) & (pageblock_nr_pages - 1));
>>
>> - /* Move forward by at least 1 * pageblock_nr_pages */
>> - pageblocks_stride = 1;
>> -
>> /* If the entire pageblock is free, move to the end of free page */
>> - if (pageblock_free(page))
>> - pageblocks_stride += page_order(page) - pageblock_order;
>> + if (pageblock_free(page)) {
>> + int order;
>> + /* be careful. we don't have locks, page_order can be changed.*/
>> + order = page_order(page);
>> + if (order > pageblock_order)
>> + return page + (1 << order);
>> + }
>
> As you note in your changelog, page_order() is unsafe because we do not have
> the zone lock but you don't check if order is somewhere between pageblock_order
> and MAX_ORDER_NR_PAGES. How is this safer?
>
Ah, I missed that.
if ((pageblock_order <= order) && (order < MAX_ORDER))
return page + (1 << order);
ok ?
Thanks,
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] memory hotplug: use unified logic for is_removable and offline_pages
2010-09-06 13:58 ` Mel Gorman
@ 2010-09-06 23:58 ` KAMEZAWA Hiroyuki
2010-09-07 13:01 ` Mel Gorman
0 siblings, 1 reply; 18+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-06 23:58 UTC (permalink / raw)
To: Mel Gorman
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko,
fengguang.wu, akpm@linux-foundation.org, andi.kleen, Dave Hansen
On Mon, 6 Sep 2010 14:58:23 +0100
Mel Gorman <mel@csn.ul.ie> wrote:
> On Mon, Sep 06, 2010 at 02:47:16PM +0900, KAMEZAWA Hiroyuki wrote:
> >
> > Now, sysfs interface of memory hotplug shows whether the section is
> > removable or not. But it checks only migrateype of pages and doesn't
> > check details of cluster of pages.
> >
>
> This was deliberate at the time. The intention was to avoid an expensive
> linear page scan where possible.
>
> > Next, memory hotplug's set_migratetype_isolate() has the same kind
> > of check, too. But the migrate-type is just a "hint" and the pageblock
> > can contain several types of pages if fragmentation is very heavy.
> >
>
> If fragmentation is very heavy on a system that requires memory
> hot-plug, I'd also be checking the value of min_free_kbytes. If it's
> low, I suggest an init script runs
>
> hugeadm --set-recommended-min_free_kbytes
>
> because it'll keep fragmentation-related events to a minimum. The
> mm_page_alloc_extfrag tracepoint can be used to measure fragmentation
> events if you want to see the effect of altering min_free_kbytes like
> this.
>
Hmm, then what should I do in this patch ?
> > To get precise information, we need to check
> > - the pageblock only contains free pages or LRU pages.
> >
> > This patch adds the function __count_unmovable_pages() and makes
> > above 2 checks to use the same logic. This will improve user experience
> > of memory hotplug because sysfs interface tells accurate information.
> >
> > Note:
> > it may be better to check MIGRATE_UNMOVABLE for making failure case quick.
> >
> > Changelog: 2010/09/06
> > - added comments.
> > - removed zone->lock.
> > - changed the name of the function to be is_pageblock_removable_async().
> > because I removed the zone->lock.
> >
> > Reported-by: Michal Hocko <mhocko@suse.cz>
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> > include/linux/memory_hotplug.h | 1
> > mm/memory_hotplug.c | 15 -------
> > mm/page_alloc.c | 77 ++++++++++++++++++++++++++++++-----------
> > 3 files changed, 60 insertions(+), 33 deletions(-)
> >
> > Index: kametest/mm/page_alloc.c
> > ===================================================================
> > --- kametest.orig/mm/page_alloc.c
> > +++ kametest/mm/page_alloc.c
> > @@ -5274,11 +5274,61 @@ void set_pageblock_flags_group(struct pa
> > * page allocater never alloc memory from ISOLATE block.
> > */
> >
> > +static int __count_immobile_pages(struct zone *zone, struct page *page)
> > +{
>
> This will also count RECLAIMABLE pages belonging to some slab objects.
> These are potentially hot-removable if slab is shrunk. Your function gives a
> more accurate count but not necessarily a better user-experience with respect
> to finding sections to hot-remove. You might like to detect PageSlab pages
> that belong to a RECLAIMABLE slab and not count these as immobile.
>
RECLAIMABLE object is not _always_ reclaimable.
Should we add "maybe reclaimable" return value ?
>
> > + unsigned long pfn, iter, found;
> > + /*
> > + * For avoiding noise data, lru_add_drain_all() should be called
> > + * If ZONE_MOVABLE, the zone never contains immobile pages
> > + */
> > + if (zone_idx(zone) == ZONE_MOVABLE)
> > + return 0;
> > +
> > + pfn = page_to_pfn(page);
> > + for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
> > + unsigned long check = pfn + iter;
> > +
> > + if (!pfn_valid_within(check)) {
> > + iter++;
> > + continue;
> > + }
> > + page = pfn_to_page(check);
> > + if (!page_count(page)) {
> > + if (PageBuddy(page))
> > + iter += (1 << page_order(page)) - 1;
> > + continue;
> > + }
> > + if (!PageLRU(page))
> > + found++;
>
> Arguably, you do not care how many pages there are, you just care if
> there is one truely unmovable page. If you find one of them, then have
> this function return fail to avoid the rest of the scan.
>
This is for IBM guys who added stupid notifier for memory hotplug scanning.
Hmm, ok, I'll add some trick.
> > + /*
> > + * If the page is not RAM, page_count()should be 0.
> > + * we don't need more check. This is an _used_ not-movable page.
> > + *
> > + * The problematic thing here is PG_reserved pages. PG_reserved
> > + * is set to both of a memory hole page and a _used_ kernel
> > + * page at boot.
> > + */
> > + }
> > + return found;
> > +}
> > +
> > +bool is_pageblock_removable_async(struct page *page)
> > +{
> > + struct zone *zone = page_zone(page);
> > + unsigned long flags;
> > + int num;
> > + /* Don't take zone->lock interntionally. */
>
> intentionally?
>
yes. typo.
> > + num = __count_immobile_pages(zone, page);
> > +
> > + if (num)
> > + return false;
> > + return true;
> > +}
> > +
> > int set_migratetype_isolate(struct page *page)
> > {
> > struct zone *zone;
> > - struct page *curr_page;
> > - unsigned long flags, pfn, iter;
> > + unsigned long flags, pfn;
> > unsigned long immobile = 0;
> > struct memory_isolate_notify arg;
> > int notifier_ret;
> > @@ -5289,11 +5339,6 @@ int set_migratetype_isolate(struct page
> > zone_idx = zone_idx(zone);
> >
> > spin_lock_irqsave(&zone->lock, flags);
> > - if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE ||
> > - zone_idx == ZONE_MOVABLE) {
> > - ret = 0;
> > - goto out;
> > - }
> >
>
> This will result in more scanning and a potentially more expensive
> memory hot-remove operation. I'm not massively concerned as such because
> memory hot-remove is not cheap but it's worth mentioning in the
> changelog that this is a consequence.
>
Okay.
> > pfn = page_to_pfn(page);
> > arg.start_pfn = pfn;
> > @@ -5315,19 +5360,13 @@ int set_migratetype_isolate(struct page
> > notifier_ret = notifier_to_errno(notifier_ret);
> > if (notifier_ret)
> > goto out;
> > + immobile = __count_immobile_pages(zone ,page);
> >
> > - for (iter = pfn; iter < (pfn + pageblock_nr_pages); iter++) {
> > - if (!pfn_valid_within(pfn))
> > - continue;
> > -
> > - curr_page = pfn_to_page(iter);
> > - if (!page_count(curr_page) || PageLRU(curr_page))
> > - continue;
> > -
> > - immobile++;
> > - }
> > -
> > - if (arg.pages_found == immobile)
> > + /*
> > + * immobile means "not-on-lru" paes. If immobile is larger than
> > + * removable-by-driver pages reported by notifier, we'll fail.
> > + */
> > + if (!immobile || arg.pages_found >= immobile)
> > ret = 0;
> >
>
> Here is where I'd suggest reimplementing __count_immobile_pages as
> pageblock_any_immobile() that returns true if it detects an immobile page
> in a given PFN range.
>
How about
__count_immobile_pages(page, count);
returns true if count >= immobile.
Maybe we can add extention whether we allow RECLAIMABLE pages in the range or not.
> > out:
> > Index: kametest/mm/memory_hotplug.c
> > ===================================================================
> > --- kametest.orig/mm/memory_hotplug.c
> > +++ kametest/mm/memory_hotplug.c
> > @@ -602,27 +602,14 @@ static struct page *next_active_pagebloc
> > /* Checks if this range of memory is likely to be hot-removable. */
> > int is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
> > {
> > - int type;
> > struct page *page = pfn_to_page(start_pfn);
> > struct page *end_page = page + nr_pages;
> >
> > /* Check the starting page of each pageblock within the range */
> > for (; page < end_page; page = next_active_pageblock(page)) {
> > - type = get_pageblock_migratetype(page);
> > -
> > - /*
> > - * A pageblock containing MOVABLE or free pages is considered
> > - * removable
> > - */
> > - if (type != MIGRATE_MOVABLE && !pageblock_free(page))
> > + if (!is_pageblock_removable_async(page))
> > return 0;
> >
> > - /*
> > - * A pageblock starting with a PageReserved page is not
> > - * considered removable.
> > - */
> > - if (PageReserved(page))
> > - return 0;
> > }
>
> Bear in mind that a user or bad application constantly reading the sysfs
> file potentially causes a lot of cache trashing as a result of the
> linear scan instead of the pageblock type check.
>
yes, hmm.
I think there are 3 levels of checking.
1. very strict check .... is_removable=true if the range contains only LRU and Free.
2. possibility check .... is_removable=true if the range contains LRU, Free, Slab(reclaimable)
3. fuzzy check .... is_removable=true if type = MIGRATABLE. (but very quick.)
I wonder I should add sysctl (or some memory hotplug control file) and make
"fuzzy check" as default, which is the same to old behavior.
Any idea ?
Thanks,
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 0/3] memory hotplug: updates and bugfix for is_removable v3
2010-09-06 5:40 [PATCH 0/3] memory hotplug: updates and bugfix for is_removable KAMEZAWA Hiroyuki
` (2 preceding siblings ...)
2010-09-06 5:47 ` [PATCH 3/3] memory hotplug: use unified logic for is_removable and offline_pages KAMEZAWA Hiroyuki
@ 2010-09-07 1:28 ` KAMEZAWA Hiroyuki
2010-09-07 1:32 ` [PATCH 1/3] [BUGFIX] memory hotplug: fix next block calculation in is_removable KAMEZAWA Hiroyuki
` (2 more replies)
3 siblings, 3 replies; 18+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-07 1:28 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko,
fengguang.wu, akpm@linux-foundation.org, Mel Gorman, andi.kleen,
Dave Hansen
Thank you for review and comments. totally updated.
Problem:
/sys/devices/system/memory/memoryX/removable file shows whether the section
can be offlined or not. Returns "1" if it seems removable.
Now, the file uses a similar logic to one offline_pages() uses.
Then, it's better to use unified logic.
The biggest change from previous one is this patch just unifies is_removable()
and offline code's logic. No functional change in offline() code.
(is_removable code is updated to be better.)
Brief patch description:
1. bugfix for is_removable() check. I think this should be back ported. (updated)
2. bugfix for callback at counting immobile pages. (no change)
3. the unified new logic for is_remobable. (updated)
Because I'm moving house, my response may be delayd.
Thanks,
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/3] [BUGFIX] memory hotplug: fix next block calculation in is_removable
2010-09-07 1:28 ` [PATCH 0/3] memory hotplug: updates and bugfix for is_removable v3 KAMEZAWA Hiroyuki
@ 2010-09-07 1:32 ` KAMEZAWA Hiroyuki
2010-09-08 11:23 ` Mel Gorman
2010-09-07 1:34 ` [PATCH 2/3][BUGFIX] memory hotplug: fix notifier's return value check KAMEZAWA Hiroyuki
2010-09-07 1:36 ` [PATCH 3/3] memory hotplug: unify is_removable and offline detection code KAMEZAWA Hiroyuki
2 siblings, 1 reply; 18+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-07 1:32 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko,
fengguang.wu, akpm@linux-foundation.org, Mel Gorman, andi.kleen,
Dave Hansen
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
next_active_pageblock() is for finding next _used_ freeblock. It skips
several blocks when it finds there are a chunk of free pages lager than
pageblock. But it has 2 bugs.
1. We have no lock. page_order(page) - pageblock_order can be minus.
2. pageblocks_stride += is wrong. it should skip page_order(p) of pages.
Changelog: 2010/09/07
- fix range check of order returned by page_order().
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/memory_hotplug.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
Index: kametest/mm/memory_hotplug.c
===================================================================
--- kametest.orig/mm/memory_hotplug.c
+++ kametest/mm/memory_hotplug.c
@@ -584,19 +584,19 @@ static inline int pageblock_free(struct
/* Return the start of the next active pageblock after a given page */
static struct page *next_active_pageblock(struct page *page)
{
- int pageblocks_stride;
-
/* Ensure the starting page is pageblock-aligned */
BUG_ON(page_to_pfn(page) & (pageblock_nr_pages - 1));
- /* Move forward by at least 1 * pageblock_nr_pages */
- pageblocks_stride = 1;
-
/* If the entire pageblock is free, move to the end of free page */
- if (pageblock_free(page))
- pageblocks_stride += page_order(page) - pageblock_order;
+ if (pageblock_free(page)) {
+ int order;
+ /* be careful. we don't have locks, page_order can be changed.*/
+ order = page_order(page);
+ if ((order < MAX_ORDER) && (order >= pageblock_order))
+ return page + (1 << order);
+ }
- return page + (pageblocks_stride * pageblock_nr_pages);
+ return page + pageblock_nr_pages;
}
/* Checks if this range of memory is likely to be hot-removable. */
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2/3][BUGFIX] memory hotplug: fix notifier's return value check
2010-09-07 1:28 ` [PATCH 0/3] memory hotplug: updates and bugfix for is_removable v3 KAMEZAWA Hiroyuki
2010-09-07 1:32 ` [PATCH 1/3] [BUGFIX] memory hotplug: fix next block calculation in is_removable KAMEZAWA Hiroyuki
@ 2010-09-07 1:34 ` KAMEZAWA Hiroyuki
2010-09-07 1:36 ` [PATCH 3/3] memory hotplug: unify is_removable and offline detection code KAMEZAWA Hiroyuki
2 siblings, 0 replies; 18+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-07 1:34 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko,
fengguang.wu, akpm@linux-foundation.org, Mel Gorman, andi.kleen,
Dave Hansen
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Even if notifier cannot find any pages, it doesn't mean
no pages are available...And, if there are no notifiers registered,
this condition will be always true and memory hotplug will show -EBUSY.
Clarification:This is a bug but not critical
In most case, a pageblock which will be offlined is MIGRATE_MOVABLE
This "notifier" is called only when the pageblock is _not_ MIGRATE_MOVABLE.
But if not MIGRATE_MOVABLE, it's common case that memory hotplug will fail.
So, no one notice this bug.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: kametest/mm/page_alloc.c
===================================================================
--- kametest.orig/mm/page_alloc.c
+++ kametest/mm/page_alloc.c
@@ -5313,7 +5313,7 @@ int set_migratetype_isolate(struct page
*/
notifier_ret = memory_isolate_notify(MEM_ISOLATE_COUNT, &arg);
notifier_ret = notifier_to_errno(notifier_ret);
- if (notifier_ret || !arg.pages_found)
+ if (notifier_ret)
goto out;
for (iter = pfn; iter < (pfn + pageblock_nr_pages); iter++) {
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 3/3] memory hotplug: unify is_removable and offline detection code
2010-09-07 1:28 ` [PATCH 0/3] memory hotplug: updates and bugfix for is_removable v3 KAMEZAWA Hiroyuki
2010-09-07 1:32 ` [PATCH 1/3] [BUGFIX] memory hotplug: fix next block calculation in is_removable KAMEZAWA Hiroyuki
2010-09-07 1:34 ` [PATCH 2/3][BUGFIX] memory hotplug: fix notifier's return value check KAMEZAWA Hiroyuki
@ 2010-09-07 1:36 ` KAMEZAWA Hiroyuki
2 siblings, 0 replies; 18+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-09-07 1:36 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko,
fengguang.wu, akpm@linux-foundation.org, Mel Gorman, andi.kleen,
Dave Hansen
Now, sysfs interface of memory hotplug shows whether the section is
removable or not. But it checks only migrateype of pages and doesn't
check details of cluster of pages.
Next, memory hotplug's set_migratetype_isolate() has the same kind
of check, too.
This patch adds the function __count_unmovable_pages() and makes
above 2 checks to use the same logic. Then, is_removable and
hotremove code uses the same logic. No changes in the hotremove
logic itself.
TODO: need to find a way to check RECLAMABLE. But, considering bit,
calling shrink_slab() against a range before starting memory hotremove
sounds better. If so, this patch's logic doesn't need to be changed.
Changelog 2010.09.07:
- modified __count_movable_pages() to get "count" as arguments.
- renamed is_pageblock_removable_async as is_pageblock_removable_nolock.
- added cond_resched().
- no changes in offline logic itself.
- added #ifdef
Reported-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
include/linux/memory_hotplug.h | 4 +
mm/memory_hotplug.c | 17 --------
mm/page_alloc.c | 87 +++++++++++++++++++++++++++++++----------
3 files changed, 72 insertions(+), 36 deletions(-)
Index: kametest/mm/page_alloc.c
===================================================================
--- kametest.orig/mm/page_alloc.c
+++ kametest/mm/page_alloc.c
@@ -5274,12 +5274,65 @@ void set_pageblock_flags_group(struct pa
* page allocater never alloc memory from ISOLATE block.
*/
+static int
+__count_immobile_pages(struct zone *zone, struct page *page, int count)
+{
+ unsigned long pfn, iter, found;
+ /*
+ * For avoiding noise data, lru_add_drain_all() should be called
+ * If ZONE_MOVABLE, the zone never contains immobile pages
+ */
+ if (zone_idx(zone) == ZONE_MOVABLE)
+ return true;
+
+ if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE)
+ return true;
+
+ pfn = page_to_pfn(page);
+ for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
+ unsigned long check = pfn + iter;
+
+ if (!pfn_valid_within(check)) {
+ iter++;
+ continue;
+ }
+ page = pfn_to_page(check);
+ if (!page_count(page)) {
+ if (PageBuddy(page))
+ iter += (1 << page_order(page)) - 1;
+ continue;
+ }
+ if (!PageLRU(page))
+ found++;
+ /*
+ * If there are RECLAIMABLE pages, we need to check it.
+ * But now, memory offline itself doesn't call shrink_slab()
+ * and it still to be fixed.
+ */
+ /*
+ * If the page is not RAM, page_count()should be 0.
+ * we don't need more check. This is an _used_ not-movable page.
+ *
+ * The problematic thing here is PG_reserved pages. PG_reserved
+ * is set to both of a memory hole page and a _used_ kernel
+ * page at boot.
+ */
+ if (found > count)
+ return false;
+ }
+ return true;
+}
+
+bool is_pageblock_removable_nolock(struct page *page)
+{
+ struct zone *zone = page_zone(page);
+ return __count_immobile_pages(zone, page, 0);
+}
+
int set_migratetype_isolate(struct page *page)
{
struct zone *zone;
- struct page *curr_page;
- unsigned long flags, pfn, iter;
- unsigned long immobile = 0;
+ unsigned long flags, pfn;
struct memory_isolate_notify arg;
int notifier_ret;
int ret = -EBUSY;
@@ -5289,11 +5342,6 @@ int set_migratetype_isolate(struct page
zone_idx = zone_idx(zone);
spin_lock_irqsave(&zone->lock, flags);
- if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE ||
- zone_idx == ZONE_MOVABLE) {
- ret = 0;
- goto out;
- }
pfn = page_to_pfn(page);
arg.start_pfn = pfn;
@@ -5315,21 +5363,18 @@ int set_migratetype_isolate(struct page
notifier_ret = notifier_to_errno(notifier_ret);
if (notifier_ret)
goto out;
-
- for (iter = pfn; iter < (pfn + pageblock_nr_pages); iter++) {
- if (!pfn_valid_within(pfn))
- continue;
-
- curr_page = pfn_to_page(iter);
- if (!page_count(curr_page) || PageLRU(curr_page))
- continue;
-
- immobile++;
- }
-
- if (arg.pages_found == immobile)
+ /*
+ * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
+ * We just check MOVABLE pages.
+ */
+ if(__count_immobile_pages(zone ,page, arg.pages_found))
ret = 0;
+ /*
+ * immobile means "not-on-lru" paes. If immobile is larger than
+ * removable-by-driver pages reported by notifier, we'll fail.
+ */
+
out:
if (!ret) {
set_pageblock_migratetype(page, MIGRATE_ISOLATE);
Index: kametest/mm/memory_hotplug.c
===================================================================
--- kametest.orig/mm/memory_hotplug.c
+++ kametest/mm/memory_hotplug.c
@@ -602,27 +602,14 @@ static struct page *next_active_pagebloc
/* Checks if this range of memory is likely to be hot-removable. */
int is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
{
- int type;
struct page *page = pfn_to_page(start_pfn);
struct page *end_page = page + nr_pages;
/* Check the starting page of each pageblock within the range */
for (; page < end_page; page = next_active_pageblock(page)) {
- type = get_pageblock_migratetype(page);
-
- /*
- * A pageblock containing MOVABLE or free pages is considered
- * removable
- */
- if (type != MIGRATE_MOVABLE && !pageblock_free(page))
- return 0;
-
- /*
- * A pageblock starting with a PageReserved page is not
- * considered removable.
- */
- if (PageReserved(page))
+ if (!is_pageblock_removable_nolock(page))
return 0;
+ cond_resched();
}
/* All pageblocks in the memory block are likely to be hot-removable */
Index: kametest/include/linux/memory_hotplug.h
===================================================================
--- kametest.orig/include/linux/memory_hotplug.h
+++ kametest/include/linux/memory_hotplug.h
@@ -70,6 +70,10 @@ extern void online_page(struct page *pag
extern int online_pages(unsigned long, unsigned long);
extern void __offline_isolated_pages(unsigned long, unsigned long);
+#ifdef CONFIG_MEMORY_HOTREMOVE
+extern bool is_pageblock_removable_nolock(struct page *page);
+#endif /* CONFIG_MEMORY_HOTREMOVE */
+
/* reasonably generic interface to expand the physical pages in a zone */
extern int __add_pages(int nid, struct zone *zone, unsigned long start_pfn,
unsigned long nr_pages);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [BUGFIX][PATCH 1/3] memory hotplug: fix next block calculation in is_removable
2010-09-06 17:15 ` Hiroyuki Kamezawa
@ 2010-09-07 9:30 ` Mel Gorman
0 siblings, 0 replies; 18+ messages in thread
From: Mel Gorman @ 2010-09-07 9:30 UTC (permalink / raw)
To: Hiroyuki Kamezawa
Cc: KAMEZAWA Hiroyuki, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, Michal Hocko, fengguang.wu,
akpm@linux-foundation.org, andi.kleen, Dave Hansen, stable
On Tue, Sep 07, 2010 at 02:15:01AM +0900, Hiroyuki Kamezawa wrote:
> 2010/9/6 Mel Gorman <mel@csn.ul.ie>:
> > On Mon, Sep 06, 2010 at 02:42:28PM +0900, KAMEZAWA Hiroyuki wrote:
> >>
> >> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >>
> >> next_active_pageblock() is for finding next _used_ freeblock. It skips
> >> several blocks when it finds there are a chunk of free pages lager than
> >> pageblock. But it has 2 bugs.
> >>
> >> 1. We have no lock. page_order(page) - pageblock_order can be minus.
> >> 2. pageblocks_stride += is wrong. it should skip page_order(p) of pages.
> >>
> >> CC: stable@kernel.org
> >> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >> ---
> >> mm/memory_hotplug.c | 16 ++++++++--------
> >> 1 file changed, 8 insertions(+), 8 deletions(-)
> >>
> >> Index: kametest/mm/memory_hotplug.c
> >> ===================================================================
> >> --- kametest.orig/mm/memory_hotplug.c
> >> +++ kametest/mm/memory_hotplug.c
> >> @@ -584,19 +584,19 @@ static inline int pageblock_free(struct
> >> /* Return the start of the next active pageblock after a given page */
> >> static struct page *next_active_pageblock(struct page *page)
> >> {
> >> - int pageblocks_stride;
> >> -
> >> /* Ensure the starting page is pageblock-aligned */
> >> BUG_ON(page_to_pfn(page) & (pageblock_nr_pages - 1));
> >>
> >> - /* Move forward by at least 1 * pageblock_nr_pages */
> >> - pageblocks_stride = 1;
> >> -
> >> /* If the entire pageblock is free, move to the end of free page */
> >> - if (pageblock_free(page))
> >> - pageblocks_stride += page_order(page) - pageblock_order;
> >> + if (pageblock_free(page)) {
> >> + int order;
> >> + /* be careful. we don't have locks, page_order can be changed.*/
> >> + order = page_order(page);
> >> + if (order > pageblock_order)
> >> + return page + (1 << order);
> >> + }
> >
> > As you note in your changelog, page_order() is unsafe because we do not have
> > the zone lock but you don't check if order is somewhere between pageblock_order
> > and MAX_ORDER_NR_PAGES. How is this safer?
> >
> Ah, I missed that.
>
> if ((pageblock_order <= order) && (order < MAX_ORDER))
> return page + (1 << order);
> ok ?
>
Seems ok. There will still be some false usage of order but it should be
harmless.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] memory hotplug: use unified logic for is_removable and offline_pages
2010-09-06 23:58 ` KAMEZAWA Hiroyuki
@ 2010-09-07 13:01 ` Mel Gorman
0 siblings, 0 replies; 18+ messages in thread
From: Mel Gorman @ 2010-09-07 13:01 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko,
fengguang.wu, akpm@linux-foundation.org, andi.kleen, Dave Hansen
On Tue, Sep 07, 2010 at 08:58:36AM +0900, KAMEZAWA Hiroyuki wrote:
> On Mon, 6 Sep 2010 14:58:23 +0100
> Mel Gorman <mel@csn.ul.ie> wrote:
>
> > On Mon, Sep 06, 2010 at 02:47:16PM +0900, KAMEZAWA Hiroyuki wrote:
> > >
> > > Now, sysfs interface of memory hotplug shows whether the section is
> > > removable or not. But it checks only migrateype of pages and doesn't
> > > check details of cluster of pages.
> > >
> >
> > This was deliberate at the time. The intention was to avoid an expensive
> > linear page scan where possible.
> >
>
> > > Next, memory hotplug's set_migratetype_isolate() has the same kind
> > > of check, too. But the migrate-type is just a "hint" and the pageblock
> > > can contain several types of pages if fragmentation is very heavy.
> > >
> >
> > If fragmentation is very heavy on a system that requires memory
> > hot-plug, I'd also be checking the value of min_free_kbytes. If it's
> > low, I suggest an init script runs
> >
> > hugeadm --set-recommended-min_free_kbytes
> >
> > because it'll keep fragmentation-related events to a minimum. The
> > mm_page_alloc_extfrag tracepoint can be used to measure fragmentation
> > events if you want to see the effect of altering min_free_kbytes like
> > this.
> >
>
> Hmm, then what should I do in this patch ?
>
Nothing, it's simply an observation that maybe you would like to pass on
to heavy users of memory hot-remove.
> > > To get precise information, we need to check
> > > - the pageblock only contains free pages or LRU pages.
> > >
> > > This patch adds the function __count_unmovable_pages() and makes
> > > above 2 checks to use the same logic. This will improve user experience
> > > of memory hotplug because sysfs interface tells accurate information.
> > >
> > > Note:
> > > it may be better to check MIGRATE_UNMOVABLE for making failure case quick.
> > >
> > > Changelog: 2010/09/06
> > > - added comments.
> > > - removed zone->lock.
> > > - changed the name of the function to be is_pageblock_removable_async().
> > > because I removed the zone->lock.
> > >
> > > Reported-by: Michal Hocko <mhocko@suse.cz>
> > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > ---
> > > include/linux/memory_hotplug.h | 1
> > > mm/memory_hotplug.c | 15 -------
> > > mm/page_alloc.c | 77 ++++++++++++++++++++++++++++++-----------
> > > 3 files changed, 60 insertions(+), 33 deletions(-)
> > >
> > > Index: kametest/mm/page_alloc.c
> > > ===================================================================
> > > --- kametest.orig/mm/page_alloc.c
> > > +++ kametest/mm/page_alloc.c
> > > @@ -5274,11 +5274,61 @@ void set_pageblock_flags_group(struct pa
> > > * page allocater never alloc memory from ISOLATE block.
> > > */
> > >
> > > +static int __count_immobile_pages(struct zone *zone, struct page *page)
> > > +{
> >
> > This will also count RECLAIMABLE pages belonging to some slab objects.
>
> > These are potentially hot-removable if slab is shrunk. Your function gives a
> > more accurate count but not necessarily a better user-experience with respect
> > to finding sections to hot-remove. You might like to detect PageSlab pages
> > that belong to a RECLAIMABLE slab and not count these as immobile.
> >
>
> RECLAIMABLE object is not _always_ reclaimable.
> Should we add "maybe reclaimable" return value ?
>
Your call but it would seem reasonable. Saying they are certainly not
reclaimable just seems a bit aggressive.
>
> >
> > > + unsigned long pfn, iter, found;
> > > + /*
> > > + * For avoiding noise data, lru_add_drain_all() should be called
> > > + * If ZONE_MOVABLE, the zone never contains immobile pages
> > > + */
> > > + if (zone_idx(zone) == ZONE_MOVABLE)
> > > + return 0;
> > > +
> > > + pfn = page_to_pfn(page);
> > > + for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
> > > + unsigned long check = pfn + iter;
> > > +
> > > + if (!pfn_valid_within(check)) {
> > > + iter++;
> > > + continue;
> > > + }
> > > + page = pfn_to_page(check);
> > > + if (!page_count(page)) {
> > > + if (PageBuddy(page))
> > > + iter += (1 << page_order(page)) - 1;
> > > + continue;
> > > + }
> > > + if (!PageLRU(page))
> > > + found++;
> >
> > Arguably, you do not care how many pages there are, you just care if
> > there is one truely unmovable page. If you find one of them, then have
> > this function return fail to avoid the rest of the scan.
> >
>
> This is for IBM guys who added stupid notifier for memory hotplug scanning.
I wasn't aware of it, my bad.
> > > <SNIP>
> > > pfn = page_to_pfn(page);
> > > arg.start_pfn = pfn;
> > > @@ -5315,19 +5360,13 @@ int set_migratetype_isolate(struct page
> > > notifier_ret = notifier_to_errno(notifier_ret);
> > > if (notifier_ret)
> > > goto out;
> > > + immobile = __count_immobile_pages(zone ,page);
> > >
> > > - for (iter = pfn; iter < (pfn + pageblock_nr_pages); iter++) {
> > > - if (!pfn_valid_within(pfn))
> > > - continue;
> > > -
> > > - curr_page = pfn_to_page(iter);
> > > - if (!page_count(curr_page) || PageLRU(curr_page))
> > > - continue;
> > > -
> > > - immobile++;
> > > - }
> > > -
> > > - if (arg.pages_found == immobile)
> > > + /*
> > > + * immobile means "not-on-lru" paes. If immobile is larger than
> > > + * removable-by-driver pages reported by notifier, we'll fail.
> > > + */
> > > + if (!immobile || arg.pages_found >= immobile)
> > > ret = 0;
> > >
> >
> > Here is where I'd suggest reimplementing __count_immobile_pages as
> > pageblock_any_immobile() that returns true if it detects an immobile page
> > in a given PFN range.
> >
>
> How about
>
> __count_immobile_pages(page, count);
> returns true if count >= immobile.
>
If it's returning a boolean, it's no longer a count so the name is
misleading. I think it would be clearer as well to deal with PFN ranges
instead of page -> page+count but maybe that's just me.
> Maybe we can add extention whether we allow RECLAIMABLE pages in the range or not.
>
Return an enum for each of unmovable, movable_hard, movable_easy maybe where
detecting any RECLAIMABLE page make the pageblock "hard to move".
>
> > > out:
> > > Index: kametest/mm/memory_hotplug.c
> > > ===================================================================
> > > --- kametest.orig/mm/memory_hotplug.c
> > > +++ kametest/mm/memory_hotplug.c
> > > @@ -602,27 +602,14 @@ static struct page *next_active_pagebloc
> > > /* Checks if this range of memory is likely to be hot-removable. */
> > > int is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
> > > {
> > > - int type;
> > > struct page *page = pfn_to_page(start_pfn);
> > > struct page *end_page = page + nr_pages;
> > >
> > > /* Check the starting page of each pageblock within the range */
> > > for (; page < end_page; page = next_active_pageblock(page)) {
> > > - type = get_pageblock_migratetype(page);
> > > -
> > > - /*
> > > - * A pageblock containing MOVABLE or free pages is considered
> > > - * removable
> > > - */
> > > - if (type != MIGRATE_MOVABLE && !pageblock_free(page))
> > > + if (!is_pageblock_removable_async(page))
> > > return 0;
> > >
> > > - /*
> > > - * A pageblock starting with a PageReserved page is not
> > > - * considered removable.
> > > - */
> > > - if (PageReserved(page))
> > > - return 0;
> > > }
> >
> > Bear in mind that a user or bad application constantly reading the sysfs
> > file potentially causes a lot of cache trashing as a result of the
> > linear scan instead of the pageblock type check.
> >
> yes, hmm.
>
> I think there are 3 levels of checking.
>
> 1. very strict check .... is_removable=true if the range contains only LRU and Free.
> 2. possibility check .... is_removable=true if the range contains LRU, Free, Slab(reclaimable)
> 3. fuzzy check .... is_removable=true if type = MIGRATABLE. (but very quick.)
>
> I wonder I should add sysctl (or some memory hotplug control file) and make
> "fuzzy check" as default, which is the same to old behavior.
>
A sysctl would be racy as would be a hotplug control file. Maybe three
sysfs files for the different levels of checking but allow ordinary users
to only do the "fuzzy" check? The downside is that it'd require application
modification to use the three sysfs files.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] memory hotplug: use unified logic for is_removable and offline_pages
2010-09-06 13:30 ` Hiroyuki Kamezawa
@ 2010-09-07 13:16 ` Michal Hocko
0 siblings, 0 replies; 18+ messages in thread
From: Michal Hocko @ 2010-09-07 13:16 UTC (permalink / raw)
To: Hiroyuki Kamezawa
Cc: KAMEZAWA Hiroyuki, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, fengguang.wu,
akpm@linux-foundation.org, Mel Gorman, andi.kleen, Dave Hansen
On Mon 06-09-10 22:30:43, Hiroyuki Kamezawa wrote:
> 2010/9/6 Michal Hocko <mhocko@suse.cz>:
> > On Mon 06-09-10 14:47:16, KAMEZAWA Hiroyuki wrote:
[...]
> >> Changelog: 2010/09/06
> >> ?- added comments.
> >> ?- removed zone->lock.
> >> ?- changed the name of the function to be is_pageblock_removable_async().
> >> ? ?because I removed the zone->lock.
> >
> > wouldn't be __is_pageblock_removable a better name? _async suffix is
> > usually used for asynchronous operations and this is just a function
> > withtout locks.
> >
> rename as _is_pagebloc_removable_nolock().
Sounds good as well.
[...]
> >> +bool is_pageblock_removable_async(struct page *page)
> >> +{
> >> + ? ? struct zone *zone = page_zone(page);
> >> + ? ? unsigned long flags;
> >> + ? ? int num;
> >> + ? ? /* Don't take zone->lock interntionally. */
> >
> > Could you add the reason?
> > Don't take zone-> lock intentionally because we are called from the
> > userspace (sysfs interface).
> >
> I don't like to assume caller context which will limit the callers.
>
> /* holding zone->lock or not is caller's job. */
Sure, but I think that if you explicitely mention that the lock is not
held intentionaly then it would be good to provide some reasonining.
>
>
> > [...]
> >> ? ? ? /* All pageblocks in the memory block are likely to be hot-removable */
> >> Index: kametest/include/linux/memory_hotplug.h
> >> ===================================================================
> >> --- kametest.orig/include/linux/memory_hotplug.h
> >> +++ kametest/include/linux/memory_hotplug.h
> >> @@ -69,6 +69,7 @@ extern void online_page(struct page *pag
> >> ?/* VM interface that may be used by firmware interface */
> >> ?extern int online_pages(unsigned long, unsigned long);
> >> ?extern void __offline_isolated_pages(unsigned long, unsigned long);
> >
> > #ifdef CONFIG_HOTREMOVE
> >
> >> +extern bool is_pageblock_removable_async(struct page *page);
> >
> > #else
> > #define is_pageblock_removable_async(p) 0
> > #endif
> > ?
>
> Is this function is called even if HOTREMOVE is off ?
> If so, the caller is buggy. I'll check tomorrow.
It is not, but then it should be defined under CONFIG_HOTREMOVE without
#else part, shoudln't it?
>
> Thanks,
> -Kame
Thanks!
--
Michal Hocko
L3 team
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] [BUGFIX] memory hotplug: fix next block calculation in is_removable
2010-09-07 1:32 ` [PATCH 1/3] [BUGFIX] memory hotplug: fix next block calculation in is_removable KAMEZAWA Hiroyuki
@ 2010-09-08 11:23 ` Mel Gorman
0 siblings, 0 replies; 18+ messages in thread
From: Mel Gorman @ 2010-09-08 11:23 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko,
fengguang.wu, akpm@linux-foundation.org, andi.kleen, Dave Hansen
On Tue, Sep 07, 2010 at 10:32:44AM +0900, KAMEZAWA Hiroyuki wrote:
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> next_active_pageblock() is for finding next _used_ freeblock. It skips
> several blocks when it finds there are a chunk of free pages lager than
> pageblock. But it has 2 bugs.
>
> 1. We have no lock. page_order(page) - pageblock_order can be minus.
> 2. pageblocks_stride += is wrong. it should skip page_order(p) of pages.
>
> Changelog: 2010/09/07
> - fix range check of order returned by page_order().
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2010-09-08 11:23 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-09-06 5:40 [PATCH 0/3] memory hotplug: updates and bugfix for is_removable KAMEZAWA Hiroyuki
2010-09-06 5:42 ` [BUGFIX][PATCH 1/3] memory hotplug: fix next block calculation in is_removable KAMEZAWA Hiroyuki
2010-09-06 13:39 ` Mel Gorman
2010-09-06 17:15 ` Hiroyuki Kamezawa
2010-09-07 9:30 ` Mel Gorman
2010-09-06 5:44 ` [PATCH 2/3] memory hotplug: fix set_migratetype_isolate wrong callback result check KAMEZAWA Hiroyuki
2010-09-06 5:47 ` [PATCH 3/3] memory hotplug: use unified logic for is_removable and offline_pages KAMEZAWA Hiroyuki
2010-09-06 9:30 ` Michal Hocko
2010-09-06 13:30 ` Hiroyuki Kamezawa
2010-09-07 13:16 ` Michal Hocko
2010-09-06 13:58 ` Mel Gorman
2010-09-06 23:58 ` KAMEZAWA Hiroyuki
2010-09-07 13:01 ` Mel Gorman
2010-09-07 1:28 ` [PATCH 0/3] memory hotplug: updates and bugfix for is_removable v3 KAMEZAWA Hiroyuki
2010-09-07 1:32 ` [PATCH 1/3] [BUGFIX] memory hotplug: fix next block calculation in is_removable KAMEZAWA Hiroyuki
2010-09-08 11:23 ` Mel Gorman
2010-09-07 1:34 ` [PATCH 2/3][BUGFIX] memory hotplug: fix notifier's return value check KAMEZAWA Hiroyuki
2010-09-07 1:36 ` [PATCH 3/3] memory hotplug: unify is_removable and offline detection code KAMEZAWA Hiroyuki
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).