All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
To: Lin Feng <linfeng@cn.fujitsu.com>
Cc: <akpm@linux-foundation.org>, <mgorman@suse.de>, <bcrl@kvack.org>,
	<viro@zeniv.linux.org.uk>, <khlebnikov@openvz.org>,
	<walken@google.com>, <kamezawa.hiroyu@jp.fujitsu.com>,
	<minchan@kernel.org>, <riel@redhat.com>, <rientjes@google.com>,
	<wency@cn.fujitsu.com>, <laijs@cn.fujitsu.com>,
	<tangchen@cn.fujitsu.com>, <guz.fnst@cn.fujitsu.com>,
	<jiang.liu@huawei.com>, <zab@redhat.com>, <jmoyer@redhat.com>,
	<linux-mm@kvack.org>, <linux-aio@kvack.org>,
	<linux-fsdevel@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<m.szyprowski@samsung.com>
Subject: Re: [PATCH V3 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()
Date: Wed, 6 Mar 2013 16:48:34 +0900	[thread overview]
Message-ID: <5136F4D2.8080205@jp.fujitsu.com> (raw)
In-Reply-To: <512C7C13.9050602@cn.fujitsu.com>

Hi Lin,

IMHO, current implementation depends on luck. So even if system has
many non movable memory, get_user_pages_non_movable() may not allocate
non movable memory.

At following thread, Marek Szyprowski implemented similar feature which
allocates non movable memory by "get_user_pages()".

https://lkml.org/lkml/2013/3/5/44

I think Marek's way is better.

Thanks,
Yasuaki Ishimatsu

2013/02/26 18:10, Lin Feng wrote:
> Hi Andrew, Mel and other guys,
>
> How about this V3 patch, any comments?
>
> thanks,
> linfeng
>
> On 02/21/2013 07:01 PM, Lin Feng wrote:
>> get_user_pages() always tries to allocate pages from movable zone, which is not
>>   reliable to memory hotremove framework in some case.
>>
>> This patch introduces a new library function called get_user_pages_non_movable()
>>   to pin pages only from zone non-movable in memory.
>> It's a wrapper of get_user_pages() but it makes sure that all pages come from
>> non-movable zone via additional page migration. But if migration fails it
>> will at least keep the base functionality of get_user_pages().
>>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Mel Gorman <mgorman@suse.de>
>> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>> Cc: Jeff Moyer <jmoyer@redhat.com>
>> Cc: Minchan Kim <minchan@kernel.org>
>> Cc: Zach Brown <zab@redhat.com>
>> Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
>> Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
>> Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com>
>> ---
>>   include/linux/mm.h     |   14 ++++++
>>   include/linux/mmzone.h |    4 ++
>>   mm/memory.c            |  103 ++++++++++++++++++++++++++++++++++++++++++++++++
>>   mm/page_isolation.c    |    8 ++++
>>   4 files changed, 129 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 5625c1c..737dc39 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -1025,6 +1025,20 @@ long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
>>   		    struct vm_area_struct **vmas);
>>   int get_user_pages_fast(unsigned long start, int nr_pages, int write,
>>   			struct page **pages);
>> +#ifdef CONFIG_MEMORY_HOTREMOVE
>> +int get_user_pages_non_movable(struct task_struct *tsk, struct mm_struct *mm,
>> +		unsigned long start, int nr_pages, int write, int force,
>> +		struct page **pages, struct vm_area_struct **vmas);
>> +#else
>> +static inline
>> +int get_user_pages_non_movable(struct task_struct *tsk, struct mm_struct *mm,
>> +		unsigned long start, int nr_pages, int write, int force,
>> +		struct page **pages, struct vm_area_struct **vmas)
>> +{
>> +	return get_user_pages(tsk, mm, start, nr_pages, write, force, pages,
>> +				vmas);
>> +}
>> +#endif
>>   struct kvec;
>>   int get_kernel_pages(const struct kvec *iov, int nr_pages, int write,
>>   			struct page **pages);
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index ab20a60..c31007e 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -851,6 +851,10 @@ static inline int is_normal_idx(enum zone_type idx)
>>   	return (idx == ZONE_NORMAL);
>>   }
>>
>> +static inline int zone_is_movable(struct zone *zone)
>> +{
>> +	return zone_idx(zone) == ZONE_MOVABLE;
>> +}
>>   /**
>>    * is_highmem - helper function to quickly check if a struct zone is a
>>    *              highmem zone or not.  This is an attempt to keep references
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 16ca5d0..83db7dd 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -58,6 +58,8 @@
>>   #include <linux/elf.h>
>>   #include <linux/gfp.h>
>>   #include <linux/migrate.h>
>> +#include <linux/page-isolation.h>
>> +#include <linux/mm_inline.h>
>>   #include <linux/string.h>
>>
>>   #include <asm/io.h>
>> @@ -2014,6 +2016,107 @@ long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
>>   }
>>   EXPORT_SYMBOL(get_user_pages);
>>
>> +#ifdef CONFIG_MEMORY_HOTREMOVE
>> +/**
>> + * It's a wrapper of get_user_pages() but it makes sure that all pages come from
>> + * non-movable zone via additional page migration. It's designed for memory
>> + * hotremove framework.
>> + *
>> + * Currently get_user_pages() always tries to allocate pages from movable zone,
>> + * in some case users of get_user_pages() is easy to pin user pages for a long
>> + * time(for now we found that pages pinned as aio ring pages is such case),
>> + * which is fatal for memory hotremove framework.
>> + *
>> + * This function first calls get_user_pages() to get the candidate pages, and
>> + * then check to ensure all pages are from non movable zone. Otherwise migrate
>> + * them to non movable zone, then retry. It will at most retry once. If
>> + * migration fails, it will keep the base functionality of get_user_pages()
>> + * and issue WARN message for memory hot-remove people.
>> + *
>> + * Fixme: now we don't support non movable version of GUP for hugepage.
>> + */
>> +int get_user_pages_non_movable(struct task_struct *tsk, struct mm_struct *mm,
>> +		unsigned long start, int nr_pages, int write, int force,
>> +		struct page **pages, struct vm_area_struct **vmas)
>> +{
>> +	int ret, i, tried = 0;
>> +	bool isolate_err, migrate_prepped;
>> +	LIST_HEAD(pagelist);
>> +
>> +retry:
>> +	BUG_ON(tried == 2);
>> +	ret = get_user_pages(tsk, mm, start, nr_pages, write, force, pages,
>> +				vmas);
>> +	/* No ZONE_MOVABLE populated, all pages are from non movable zone */
>> +	if (movable_zone == ZONE_MOVABLE || ret <= 0)
>> +		return ret;
>> +
>> +	isolate_err = false;
>> +	migrate_prepped = false;
>> +
>> +	for (i = 0; i < ret; i++) {
>> +		if (zone_is_movable(page_zone(pages[i]))) {
>> +			/* Fixme: improve for hugepage non movable support */
>> +			if (PageHuge(pages[i])) {
>> +				WARN_ONCE(1, "Non movable GUP for hugepages "
>> +					"haven't been implemented yet, it may "
>> +					"lead to memory hot-remove failure.\n");
>> +				continue;
>> +			}
>> +
>> +			/* Hugepage or THP's head page has covered tail pages */
>> +			if (PageTail(pages[i]) && (page_count(pages[i]) == 1))
>> +				continue;
>> +
>> +			if (!migrate_prepped) {
>> +				BUG_ON(migrate_prep());
>> +				migrate_prepped = true;
>> +			}
>> +
>> +			/* Fixme: isolate_lru_page() takes the LRU lock every
>> +			 * time, batching the lock could avoid potential lock
>> +			 * contention problems. -Mel Gorman
>> +			 */
>> +			if (!isolate_lru_page(pages[i])) {
>> +				inc_zone_page_state(pages[i], NR_ISOLATED_ANON +
>> +						 page_is_file_cache(pages[i]));
>> +				list_add(&pages[i]->lru, &pagelist);
>> +			} else {
>> +				isolate_err = true;
>> +				break;
>> +			}
>> +		}
>> +	}
>> +
>> +	/* All pages are non movable, we are done :) */
>> +	if (i == ret && list_empty(&pagelist))
>> +		return ret;
>> +
>> +	/* Undo the effects of former get_user_pages(), ready for another try */
>> +	release_pages(pages, ret, 1);
>> +
>> +	if (!isolate_err) {
>> +		ret = migrate_pages(&pagelist, alloc_migrate_target, 1,
>> +					MIGRATE_SYNC, MR_SYSCALL);
>> +		/* Steal pages from non-movable zone successfully? */
>> +		if (!ret) {
>> +			tried++;
>> +			goto retry;
>> +		}
>> +	}
>> +
>> +	putback_lru_pages(&pagelist);
>> +	/* Migration failed, in order to keep at least the base functionality of
>> +	 * get_user_pages(), we pin pages again but give WARN info to remind
>> +	 * memory hot-remove people, which is a trade-off.
>> +	 */
>> +	WARN_ONCE(1, "Non movable zone migration failed, "
>> +		"it may lead to memroy hot-remove failure.\n");
>> +	return get_user_pages(tsk, mm, start, nr_pages, write, force, pages,
>> +				vmas);
>> +}
>> +EXPORT_SYMBOL(get_user_pages_non_movable);
>> +#endif
>>   /**
>>    * get_dump_page() - pin user page in memory while writing it to core dump
>>    * @addr: user address
>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>> index 383bdbb..7823ea5 100644
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -247,6 +247,9 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
>>   	return ret ? 0 : -EBUSY;
>>   }
>>
>> +/**
>> + * @private: 0 means page can be alloced from movable zone, otherwise forbidden
>> + */
>>   struct page *alloc_migrate_target(struct page *page, unsigned long private,
>>   				  int **resultp)
>>   {
>> @@ -254,6 +257,11 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private,
>>
>>   	if (PageHighMem(page))
>>   		gfp_mask |= __GFP_HIGHMEM;
>> +#if defined(CONFIG_MEMORY_HOTREMOVE) && defined(CONFIG_HIGHMEM)
>> +	BUILD_BUG_ON(1);
>> +#endif
>> +	if (unlikely(private != 0))
>> +		gfp_mask &= ~__GFP_HIGHMEM;
>>
>>   	return alloc_page(gfp_mask);
>>   }
>>
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
To: Lin Feng <linfeng@cn.fujitsu.com>
Cc: akpm@linux-foundation.org, mgorman@suse.de, bcrl@kvack.org,
	viro@zeniv.linux.org.uk, khlebnikov@openvz.org,
	walken@google.com, kamezawa.hiroyu@jp.fujitsu.com,
	minchan@kernel.org, riel@redhat.com, rientjes@google.com,
	wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
	tangchen@cn.fujitsu.com, guz.fnst@cn.fujitsu.com,
	jiang.liu@huawei.com, zab@redhat.com, jmoyer@redhat.com,
	linux-mm@kvack.org, linux-aio@kvack.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	m.szyprowski@samsung.com
Subject: Re: [PATCH V3 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()
Date: Wed, 6 Mar 2013 16:48:34 +0900	[thread overview]
Message-ID: <5136F4D2.8080205@jp.fujitsu.com> (raw)
In-Reply-To: <512C7C13.9050602@cn.fujitsu.com>

Hi Lin,

IMHO, current implementation depends on luck. So even if system has
many non movable memory, get_user_pages_non_movable() may not allocate
non movable memory.

At following thread, Marek Szyprowski implemented similar feature which
allocates non movable memory by "get_user_pages()".

https://lkml.org/lkml/2013/3/5/44

I think Marek's way is better.

Thanks,
Yasuaki Ishimatsu

2013/02/26 18:10, Lin Feng wrote:
> Hi Andrew, Mel and other guys,
>
> How about this V3 patch, any comments?
>
> thanks,
> linfeng
>
> On 02/21/2013 07:01 PM, Lin Feng wrote:
>> get_user_pages() always tries to allocate pages from movable zone, which is not
>>   reliable to memory hotremove framework in some case.
>>
>> This patch introduces a new library function called get_user_pages_non_movable()
>>   to pin pages only from zone non-movable in memory.
>> It's a wrapper of get_user_pages() but it makes sure that all pages come from
>> non-movable zone via additional page migration. But if migration fails it
>> will at least keep the base functionality of get_user_pages().
>>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Mel Gorman <mgorman@suse.de>
>> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>> Cc: Jeff Moyer <jmoyer@redhat.com>
>> Cc: Minchan Kim <minchan@kernel.org>
>> Cc: Zach Brown <zab@redhat.com>
>> Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
>> Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
>> Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com>
>> ---
>>   include/linux/mm.h     |   14 ++++++
>>   include/linux/mmzone.h |    4 ++
>>   mm/memory.c            |  103 ++++++++++++++++++++++++++++++++++++++++++++++++
>>   mm/page_isolation.c    |    8 ++++
>>   4 files changed, 129 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 5625c1c..737dc39 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -1025,6 +1025,20 @@ long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
>>   		    struct vm_area_struct **vmas);
>>   int get_user_pages_fast(unsigned long start, int nr_pages, int write,
>>   			struct page **pages);
>> +#ifdef CONFIG_MEMORY_HOTREMOVE
>> +int get_user_pages_non_movable(struct task_struct *tsk, struct mm_struct *mm,
>> +		unsigned long start, int nr_pages, int write, int force,
>> +		struct page **pages, struct vm_area_struct **vmas);
>> +#else
>> +static inline
>> +int get_user_pages_non_movable(struct task_struct *tsk, struct mm_struct *mm,
>> +		unsigned long start, int nr_pages, int write, int force,
>> +		struct page **pages, struct vm_area_struct **vmas)
>> +{
>> +	return get_user_pages(tsk, mm, start, nr_pages, write, force, pages,
>> +				vmas);
>> +}
>> +#endif
>>   struct kvec;
>>   int get_kernel_pages(const struct kvec *iov, int nr_pages, int write,
>>   			struct page **pages);
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index ab20a60..c31007e 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -851,6 +851,10 @@ static inline int is_normal_idx(enum zone_type idx)
>>   	return (idx == ZONE_NORMAL);
>>   }
>>
>> +static inline int zone_is_movable(struct zone *zone)
>> +{
>> +	return zone_idx(zone) == ZONE_MOVABLE;
>> +}
>>   /**
>>    * is_highmem - helper function to quickly check if a struct zone is a
>>    *              highmem zone or not.  This is an attempt to keep references
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 16ca5d0..83db7dd 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -58,6 +58,8 @@
>>   #include <linux/elf.h>
>>   #include <linux/gfp.h>
>>   #include <linux/migrate.h>
>> +#include <linux/page-isolation.h>
>> +#include <linux/mm_inline.h>
>>   #include <linux/string.h>
>>
>>   #include <asm/io.h>
>> @@ -2014,6 +2016,107 @@ long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
>>   }
>>   EXPORT_SYMBOL(get_user_pages);
>>
>> +#ifdef CONFIG_MEMORY_HOTREMOVE
>> +/**
>> + * It's a wrapper of get_user_pages() but it makes sure that all pages come from
>> + * non-movable zone via additional page migration. It's designed for memory
>> + * hotremove framework.
>> + *
>> + * Currently get_user_pages() always tries to allocate pages from movable zone,
>> + * in some case users of get_user_pages() is easy to pin user pages for a long
>> + * time(for now we found that pages pinned as aio ring pages is such case),
>> + * which is fatal for memory hotremove framework.
>> + *
>> + * This function first calls get_user_pages() to get the candidate pages, and
>> + * then check to ensure all pages are from non movable zone. Otherwise migrate
>> + * them to non movable zone, then retry. It will at most retry once. If
>> + * migration fails, it will keep the base functionality of get_user_pages()
>> + * and issue WARN message for memory hot-remove people.
>> + *
>> + * Fixme: now we don't support non movable version of GUP for hugepage.
>> + */
>> +int get_user_pages_non_movable(struct task_struct *tsk, struct mm_struct *mm,
>> +		unsigned long start, int nr_pages, int write, int force,
>> +		struct page **pages, struct vm_area_struct **vmas)
>> +{
>> +	int ret, i, tried = 0;
>> +	bool isolate_err, migrate_prepped;
>> +	LIST_HEAD(pagelist);
>> +
>> +retry:
>> +	BUG_ON(tried == 2);
>> +	ret = get_user_pages(tsk, mm, start, nr_pages, write, force, pages,
>> +				vmas);
>> +	/* No ZONE_MOVABLE populated, all pages are from non movable zone */
>> +	if (movable_zone == ZONE_MOVABLE || ret <= 0)
>> +		return ret;
>> +
>> +	isolate_err = false;
>> +	migrate_prepped = false;
>> +
>> +	for (i = 0; i < ret; i++) {
>> +		if (zone_is_movable(page_zone(pages[i]))) {
>> +			/* Fixme: improve for hugepage non movable support */
>> +			if (PageHuge(pages[i])) {
>> +				WARN_ONCE(1, "Non movable GUP for hugepages "
>> +					"haven't been implemented yet, it may "
>> +					"lead to memory hot-remove failure.\n");
>> +				continue;
>> +			}
>> +
>> +			/* Hugepage or THP's head page has covered tail pages */
>> +			if (PageTail(pages[i]) && (page_count(pages[i]) == 1))
>> +				continue;
>> +
>> +			if (!migrate_prepped) {
>> +				BUG_ON(migrate_prep());
>> +				migrate_prepped = true;
>> +			}
>> +
>> +			/* Fixme: isolate_lru_page() takes the LRU lock every
>> +			 * time, batching the lock could avoid potential lock
>> +			 * contention problems. -Mel Gorman
>> +			 */
>> +			if (!isolate_lru_page(pages[i])) {
>> +				inc_zone_page_state(pages[i], NR_ISOLATED_ANON +
>> +						 page_is_file_cache(pages[i]));
>> +				list_add(&pages[i]->lru, &pagelist);
>> +			} else {
>> +				isolate_err = true;
>> +				break;
>> +			}
>> +		}
>> +	}
>> +
>> +	/* All pages are non movable, we are done :) */
>> +	if (i == ret && list_empty(&pagelist))
>> +		return ret;
>> +
>> +	/* Undo the effects of former get_user_pages(), ready for another try */
>> +	release_pages(pages, ret, 1);
>> +
>> +	if (!isolate_err) {
>> +		ret = migrate_pages(&pagelist, alloc_migrate_target, 1,
>> +					MIGRATE_SYNC, MR_SYSCALL);
>> +		/* Steal pages from non-movable zone successfully? */
>> +		if (!ret) {
>> +			tried++;
>> +			goto retry;
>> +		}
>> +	}
>> +
>> +	putback_lru_pages(&pagelist);
>> +	/* Migration failed, in order to keep at least the base functionality of
>> +	 * get_user_pages(), we pin pages again but give WARN info to remind
>> +	 * memory hot-remove people, which is a trade-off.
>> +	 */
>> +	WARN_ONCE(1, "Non movable zone migration failed, "
>> +		"it may lead to memroy hot-remove failure.\n");
>> +	return get_user_pages(tsk, mm, start, nr_pages, write, force, pages,
>> +				vmas);
>> +}
>> +EXPORT_SYMBOL(get_user_pages_non_movable);
>> +#endif
>>   /**
>>    * get_dump_page() - pin user page in memory while writing it to core dump
>>    * @addr: user address
>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>> index 383bdbb..7823ea5 100644
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -247,6 +247,9 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
>>   	return ret ? 0 : -EBUSY;
>>   }
>>
>> +/**
>> + * @private: 0 means page can be alloced from movable zone, otherwise forbidden
>> + */
>>   struct page *alloc_migrate_target(struct page *page, unsigned long private,
>>   				  int **resultp)
>>   {
>> @@ -254,6 +257,11 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private,
>>
>>   	if (PageHighMem(page))
>>   		gfp_mask |= __GFP_HIGHMEM;
>> +#if defined(CONFIG_MEMORY_HOTREMOVE) && defined(CONFIG_HIGHMEM)
>> +	BUILD_BUG_ON(1);
>> +#endif
>> +	if (unlikely(private != 0))
>> +		gfp_mask &= ~__GFP_HIGHMEM;
>>
>>   	return alloc_page(gfp_mask);
>>   }
>>
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
To: Lin Feng <linfeng@cn.fujitsu.com>
Cc: <akpm@linux-foundation.org>, <mgorman@suse.de>, <bcrl@kvack.org>,
	<viro@zeniv.linux.org.uk>, <khlebnikov@openvz.org>,
	<walken@google.com>, <kamezawa.hiroyu@jp.fujitsu.com>,
	<minchan@kernel.org>, <riel@redhat.com>, <rientjes@google.com>,
	<wency@cn.fujitsu.com>, <laijs@cn.fujitsu.com>,
	<tangchen@cn.fujitsu.com>, <guz.fnst@cn.fujitsu.com>,
	<jiang.liu@huawei.com>, <zab@redhat.com>, <jmoyer@redhat.com>,
	<linux-mm@kvack.org>, <linux-aio@kvack.org>,
	<linux-fsdevel@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<m.szyprowski@samsung.com>
Subject: Re: [PATCH V3 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()
Date: Wed, 6 Mar 2013 16:48:34 +0900	[thread overview]
Message-ID: <5136F4D2.8080205@jp.fujitsu.com> (raw)
In-Reply-To: <512C7C13.9050602@cn.fujitsu.com>

Hi Lin,

IMHO, current implementation depends on luck. So even if system has
many non movable memory, get_user_pages_non_movable() may not allocate
non movable memory.

At following thread, Marek Szyprowski implemented similar feature which
allocates non movable memory by "get_user_pages()".

https://lkml.org/lkml/2013/3/5/44

I think Marek's way is better.

Thanks,
Yasuaki Ishimatsu

2013/02/26 18:10, Lin Feng wrote:
> Hi Andrew, Mel and other guys,
>
> How about this V3 patch, any comments?
>
> thanks,
> linfeng
>
> On 02/21/2013 07:01 PM, Lin Feng wrote:
>> get_user_pages() always tries to allocate pages from movable zone, which is not
>>   reliable to memory hotremove framework in some case.
>>
>> This patch introduces a new library function called get_user_pages_non_movable()
>>   to pin pages only from zone non-movable in memory.
>> It's a wrapper of get_user_pages() but it makes sure that all pages come from
>> non-movable zone via additional page migration. But if migration fails it
>> will at least keep the base functionality of get_user_pages().
>>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Mel Gorman <mgorman@suse.de>
>> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>> Cc: Jeff Moyer <jmoyer@redhat.com>
>> Cc: Minchan Kim <minchan@kernel.org>
>> Cc: Zach Brown <zab@redhat.com>
>> Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
>> Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
>> Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com>
>> ---
>>   include/linux/mm.h     |   14 ++++++
>>   include/linux/mmzone.h |    4 ++
>>   mm/memory.c            |  103 ++++++++++++++++++++++++++++++++++++++++++++++++
>>   mm/page_isolation.c    |    8 ++++
>>   4 files changed, 129 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 5625c1c..737dc39 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -1025,6 +1025,20 @@ long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
>>   		    struct vm_area_struct **vmas);
>>   int get_user_pages_fast(unsigned long start, int nr_pages, int write,
>>   			struct page **pages);
>> +#ifdef CONFIG_MEMORY_HOTREMOVE
>> +int get_user_pages_non_movable(struct task_struct *tsk, struct mm_struct *mm,
>> +		unsigned long start, int nr_pages, int write, int force,
>> +		struct page **pages, struct vm_area_struct **vmas);
>> +#else
>> +static inline
>> +int get_user_pages_non_movable(struct task_struct *tsk, struct mm_struct *mm,
>> +		unsigned long start, int nr_pages, int write, int force,
>> +		struct page **pages, struct vm_area_struct **vmas)
>> +{
>> +	return get_user_pages(tsk, mm, start, nr_pages, write, force, pages,
>> +				vmas);
>> +}
>> +#endif
>>   struct kvec;
>>   int get_kernel_pages(const struct kvec *iov, int nr_pages, int write,
>>   			struct page **pages);
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index ab20a60..c31007e 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -851,6 +851,10 @@ static inline int is_normal_idx(enum zone_type idx)
>>   	return (idx == ZONE_NORMAL);
>>   }
>>
>> +static inline int zone_is_movable(struct zone *zone)
>> +{
>> +	return zone_idx(zone) == ZONE_MOVABLE;
>> +}
>>   /**
>>    * is_highmem - helper function to quickly check if a struct zone is a
>>    *              highmem zone or not.  This is an attempt to keep references
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 16ca5d0..83db7dd 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -58,6 +58,8 @@
>>   #include <linux/elf.h>
>>   #include <linux/gfp.h>
>>   #include <linux/migrate.h>
>> +#include <linux/page-isolation.h>
>> +#include <linux/mm_inline.h>
>>   #include <linux/string.h>
>>
>>   #include <asm/io.h>
>> @@ -2014,6 +2016,107 @@ long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
>>   }
>>   EXPORT_SYMBOL(get_user_pages);
>>
>> +#ifdef CONFIG_MEMORY_HOTREMOVE
>> +/**
>> + * It's a wrapper of get_user_pages() but it makes sure that all pages come from
>> + * non-movable zone via additional page migration. It's designed for memory
>> + * hotremove framework.
>> + *
>> + * Currently get_user_pages() always tries to allocate pages from movable zone,
>> + * in some case users of get_user_pages() is easy to pin user pages for a long
>> + * time(for now we found that pages pinned as aio ring pages is such case),
>> + * which is fatal for memory hotremove framework.
>> + *
>> + * This function first calls get_user_pages() to get the candidate pages, and
>> + * then check to ensure all pages are from non movable zone. Otherwise migrate
>> + * them to non movable zone, then retry. It will at most retry once. If
>> + * migration fails, it will keep the base functionality of get_user_pages()
>> + * and issue WARN message for memory hot-remove people.
>> + *
>> + * Fixme: now we don't support non movable version of GUP for hugepage.
>> + */
>> +int get_user_pages_non_movable(struct task_struct *tsk, struct mm_struct *mm,
>> +		unsigned long start, int nr_pages, int write, int force,
>> +		struct page **pages, struct vm_area_struct **vmas)
>> +{
>> +	int ret, i, tried = 0;
>> +	bool isolate_err, migrate_prepped;
>> +	LIST_HEAD(pagelist);
>> +
>> +retry:
>> +	BUG_ON(tried == 2);
>> +	ret = get_user_pages(tsk, mm, start, nr_pages, write, force, pages,
>> +				vmas);
>> +	/* No ZONE_MOVABLE populated, all pages are from non movable zone */
>> +	if (movable_zone == ZONE_MOVABLE || ret <= 0)
>> +		return ret;
>> +
>> +	isolate_err = false;
>> +	migrate_prepped = false;
>> +
>> +	for (i = 0; i < ret; i++) {
>> +		if (zone_is_movable(page_zone(pages[i]))) {
>> +			/* Fixme: improve for hugepage non movable support */
>> +			if (PageHuge(pages[i])) {
>> +				WARN_ONCE(1, "Non movable GUP for hugepages "
>> +					"haven't been implemented yet, it may "
>> +					"lead to memory hot-remove failure.\n");
>> +				continue;
>> +			}
>> +
>> +			/* Hugepage or THP's head page has covered tail pages */
>> +			if (PageTail(pages[i]) && (page_count(pages[i]) == 1))
>> +				continue;
>> +
>> +			if (!migrate_prepped) {
>> +				BUG_ON(migrate_prep());
>> +				migrate_prepped = true;
>> +			}
>> +
>> +			/* Fixme: isolate_lru_page() takes the LRU lock every
>> +			 * time, batching the lock could avoid potential lock
>> +			 * contention problems. -Mel Gorman
>> +			 */
>> +			if (!isolate_lru_page(pages[i])) {
>> +				inc_zone_page_state(pages[i], NR_ISOLATED_ANON +
>> +						 page_is_file_cache(pages[i]));
>> +				list_add(&pages[i]->lru, &pagelist);
>> +			} else {
>> +				isolate_err = true;
>> +				break;
>> +			}
>> +		}
>> +	}
>> +
>> +	/* All pages are non movable, we are done :) */
>> +	if (i == ret && list_empty(&pagelist))
>> +		return ret;
>> +
>> +	/* Undo the effects of former get_user_pages(), ready for another try */
>> +	release_pages(pages, ret, 1);
>> +
>> +	if (!isolate_err) {
>> +		ret = migrate_pages(&pagelist, alloc_migrate_target, 1,
>> +					MIGRATE_SYNC, MR_SYSCALL);
>> +		/* Steal pages from non-movable zone successfully? */
>> +		if (!ret) {
>> +			tried++;
>> +			goto retry;
>> +		}
>> +	}
>> +
>> +	putback_lru_pages(&pagelist);
>> +	/* Migration failed, in order to keep at least the base functionality of
>> +	 * get_user_pages(), we pin pages again but give WARN info to remind
>> +	 * memory hot-remove people, which is a trade-off.
>> +	 */
>> +	WARN_ONCE(1, "Non movable zone migration failed, "
>> +		"it may lead to memroy hot-remove failure.\n");
>> +	return get_user_pages(tsk, mm, start, nr_pages, write, force, pages,
>> +				vmas);
>> +}
>> +EXPORT_SYMBOL(get_user_pages_non_movable);
>> +#endif
>>   /**
>>    * get_dump_page() - pin user page in memory while writing it to core dump
>>    * @addr: user address
>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>> index 383bdbb..7823ea5 100644
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -247,6 +247,9 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
>>   	return ret ? 0 : -EBUSY;
>>   }
>>
>> +/**
>> + * @private: 0 means page can be alloced from movable zone, otherwise forbidden
>> + */
>>   struct page *alloc_migrate_target(struct page *page, unsigned long private,
>>   				  int **resultp)
>>   {
>> @@ -254,6 +257,11 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private,
>>
>>   	if (PageHighMem(page))
>>   		gfp_mask |= __GFP_HIGHMEM;
>> +#if defined(CONFIG_MEMORY_HOTREMOVE) && defined(CONFIG_HIGHMEM)
>> +	BUILD_BUG_ON(1);
>> +#endif
>> +	if (unlikely(private != 0))
>> +		gfp_mask &= ~__GFP_HIGHMEM;
>>
>>   	return alloc_page(gfp_mask);
>>   }
>>
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>


  reply	other threads:[~2013-03-06  7:48 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-02-21 11:01 [PATCH V3 0/2] mm: hotplug: implement non-movable version of get_user_pages() to kill long-time pin pages Lin Feng
2013-02-21 11:01 ` Lin Feng
2013-02-21 11:01 ` Lin Feng
2013-02-21 11:01 ` [PATCH V3 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable() Lin Feng
2013-02-21 11:01   ` Lin Feng
2013-02-21 11:01   ` Lin Feng
2013-02-26  9:10   ` Lin Feng
2013-02-26  9:10     ` Lin Feng
2013-03-06  7:48     ` Yasuaki Ishimatsu [this message]
2013-03-06  7:48       ` Yasuaki Ishimatsu
2013-03-06  7:48       ` Yasuaki Ishimatsu
2013-03-06 11:29       ` Lin Feng
2013-03-06 11:29         ` Lin Feng
2013-03-06 11:29         ` Lin Feng
2013-02-21 11:01 ` [PATCH V3 2/2] fs/aio.c: use get_user_pages_non_movable() to pin ring pages when support memory hotremove Lin Feng
2013-02-21 11:01   ` Lin Feng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5136F4D2.8080205@jp.fujitsu.com \
    --to=isimatu.yasuaki@jp.fujitsu.com \
    --cc=akpm@linux-foundation.org \
    --cc=bcrl@kvack.org \
    --cc=guz.fnst@cn.fujitsu.com \
    --cc=jiang.liu@huawei.com \
    --cc=jmoyer@redhat.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=khlebnikov@openvz.org \
    --cc=laijs@cn.fujitsu.com \
    --cc=linfeng@cn.fujitsu.com \
    --cc=linux-aio@kvack.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=m.szyprowski@samsung.com \
    --cc=mgorman@suse.de \
    --cc=minchan@kernel.org \
    --cc=riel@redhat.com \
    --cc=rientjes@google.com \
    --cc=tangchen@cn.fujitsu.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=walken@google.com \
    --cc=wency@cn.fujitsu.com \
    --cc=zab@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.