From: Andrew Morton <akpm@linux-foundation.org>
To: Lin Feng <linfeng@cn.fujitsu.com>
Cc: mgorman@suse.de, bcrl@kvack.org, viro@zeniv.linux.org.uk,
khlebnikov@openvz.org, walken@google.com,
kamezawa.hiroyu@jp.fujitsu.com, minchan@kernel.org,
riel@redhat.com, rientjes@google.com,
isimatu.yasuaki@jp.fujitsu.com, wency@cn.fujitsu.com,
laijs@cn.fujitsu.com, jiang.liu@huawei.com, linux-mm@kvack.org,
linux-aio@kvack.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()
Date: Mon, 4 Feb 2013 16:06:24 -0800 [thread overview]
Message-ID: <20130204160624.5c20a8a0.akpm@linux-foundation.org> (raw)
In-Reply-To: <1359972248-8722-2-git-send-email-linfeng@cn.fujitsu.com>
melreadthis
On Mon, 4 Feb 2013 18:04:07 +0800
Lin Feng <linfeng@cn.fujitsu.com> wrote:
> get_user_pages() always tries to allocate pages from movable zone, which is not
> reliable to memory hotremove framework in some case.
>
> This patch introduces a new library function called get_user_pages_non_movable()
> to pin pages only from zone non-movable in memory.
> It's a wrapper of get_user_pages() but it makes sure that all pages come from
> non-movable zone via additional page migration.
>
> ...
>
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1049,6 +1049,11 @@ int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
> struct page **pages, struct vm_area_struct **vmas);
> int get_user_pages_fast(unsigned long start, int nr_pages, int write,
> struct page **pages);
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +int get_user_pages_non_movable(struct task_struct *tsk, struct mm_struct *mm,
> + unsigned long start, int nr_pages, int write, int force,
> + struct page **pages, struct vm_area_struct **vmas);
> +#endif
The ifdefs aren't really needed here and I encourage people to omit
them. This keeps the header files looking neater and reduces the
chances of things later breaking because we forgot to update some
CONFIG_foo logic in a header file. The downside is that errors will be
revealed at link time rather than at compile time, but that's a pretty
small cost.
> struct kvec;
> int get_kernel_pages(const struct kvec *iov, int nr_pages, int write,
> struct page **pages);
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 73b64a3..5db811e 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -838,6 +838,10 @@ static inline int is_normal_idx(enum zone_type idx)
> return (idx == ZONE_NORMAL);
> }
>
> +static inline int is_movable(struct zone *zone)
> +{
> + return zone == zone->zone_pgdat->node_zones + ZONE_MOVABLE;
> +}
A better name would be zone_is_movable(). We haven't been very
consistent about this in mmzone.h, but zone_is_foo() is pretty common.
And a neater implementation would be
return zone_idx(zone) == ZONE_MOVABLE;
All of which made me look at mmzone.h, and what I saw wasn't very nice :(
Please review...
From: Andrew Morton <akpm@linux-foundation.org>
Subject: include/linux/mmzone.h: cleanups
- implement zone_idx() in C to fix its references-args-twice macro bug
- use zone_idx() in is_highmem() to remove large amounts of silly fluff.
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/mmzone.h | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff -puN include/linux/mmzone.h~include-linux-mmzoneh-cleanups include/linux/mmzone.h
--- a/include/linux/mmzone.h~include-linux-mmzoneh-cleanups
+++ a/include/linux/mmzone.h
@@ -815,7 +815,10 @@ unsigned long __init node_memmap_size_by
/*
* zone_idx() returns 0 for the ZONE_DMA zone, 1 for the ZONE_NORMAL zone, etc.
*/
-#define zone_idx(zone) ((zone) - (zone)->zone_pgdat->node_zones)
+static inline enum zone_type zone_idx(struct zone *zone)
+{
+ return zone - zone->zone_pgdat->node_zones;
+}
static inline int populated_zone(struct zone *zone)
{
@@ -857,10 +860,10 @@ static inline int is_normal_idx(enum zon
static inline int is_highmem(struct zone *zone)
{
#ifdef CONFIG_HIGHMEM
- int zone_off = (char *)zone - (char *)zone->zone_pgdat->node_zones;
- return zone_off == ZONE_HIGHMEM * sizeof(*zone) ||
- (zone_off == ZONE_MOVABLE * sizeof(*zone) &&
- zone_movable_is_highmem());
+ enum zone_type idx = zone_idx(zone);
+
+ return idx == ZONE_HIGHMEM ||
+ (idx == ZONE_MOVABLE && zone_movable_is_highmem());
#else
return 0;
#endif
_
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -58,6 +58,8 @@
> #include <linux/elf.h>
> #include <linux/gfp.h>
> #include <linux/migrate.h>
> +#include <linux/page-isolation.h>
> +#include <linux/mm_inline.h>
> #include <linux/string.h>
>
> #include <asm/io.h>
> @@ -1995,6 +1997,67 @@ int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
> }
> EXPORT_SYMBOL(get_user_pages);
>
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +/**
> + * It's a wrapper of get_user_pages() but it makes sure that all pages come from
> + * non-movable zone via additional page migration.
> + */
This needs a description of why the function exists - say something
about the requirements of memory hotplug.
Also a few words describing how the function works would be good.
> +int get_user_pages_non_movable(struct task_struct *tsk, struct mm_struct *mm,
> + unsigned long start, int nr_pages, int write, int force,
> + struct page **pages, struct vm_area_struct **vmas)
> +{
> + int ret, i, isolate_err, migrate_pre_flag;
> + LIST_HEAD(pagelist);
> +
> +retry:
> + ret = get_user_pages(tsk, mm, start, nr_pages, write, force, pages,
> + vmas);
We should handle (ret < 0) here. At present the function will silently
convert an error return into "return 0", which is bad. The function
does appear to otherwise do the right thing if get_user_pages() failed,
but only due to good luck.
> + isolate_err = 0;
> + migrate_pre_flag = 0;
> +
> + for (i = 0; i < ret; i++) {
> + if (is_movable(page_zone(pages[i]))) {
> + if (!migrate_pre_flag) {
> + if (migrate_prep())
> + goto put_page;
> + migrate_pre_flag = 1;
> + }
> +
> + if (!isolate_lru_page(pages[i])) {
> + inc_zone_page_state(pages[i], NR_ISOLATED_ANON +
> + page_is_file_cache(pages[i]));
> + list_add_tail(&pages[i]->lru, &pagelist);
> + } else {
> + isolate_err = 1;
> + goto put_page;
> + }
> + }
> + }
> +
> + /* All pages are non movable, we are done :) */
> + if (i == ret && list_empty(&pagelist))
> + return ret;
> +
> +put_page:
> + /* Undo the effects of former get_user_pages(), we won't pin anything */
> + for (i = 0; i < ret; i++)
> + put_page(pages[i]);
> +
> + if (migrate_pre_flag && !isolate_err) {
> + ret = migrate_pages(&pagelist, alloc_migrate_target, 1,
> + false, MIGRATE_SYNC, MR_SYSCALL);
> + /* Steal pages from non-movable zone successfully? */
> + if (!ret)
> + goto retry;
This is buggy. migrate_pages() doesn't empty its `from' argument, so
page_list must be reinitialised here (or, better, at the start of the loop).
Mel, what's up with migrate_pages()? Shouldn't it be removing the
pages from the list when MIGRATEPAGE_SUCCESS? The use of
list_for_each_entry_safe() suggests we used to do that...
> + }
> +
> + putback_lru_pages(&pagelist);
> + return 0;
> +}
> +EXPORT_SYMBOL(get_user_pages_non_movable);
> +#endif
> +
Generally, I'd like Mel to go through (the next version of) this patch
carefully, please.
--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org. For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>
next prev parent reply other threads:[~2013-02-05 0:06 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-02-04 10:04 [PATCH 0/2] mm: hotplug: implement non-movable version of get_user_pages() to kill long-time pin pages Lin Feng
2013-02-04 10:04 ` [PATCH 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable() Lin Feng
2013-02-05 0:06 ` Andrew Morton [this message]
2013-02-05 0:18 ` Andrew Morton
2013-02-05 3:09 ` Lin Feng
2013-02-05 21:13 ` Andrew Morton
2013-02-05 11:57 ` Mel Gorman
2013-02-05 13:32 ` Mel Gorman
2013-02-19 13:37 ` Lin Feng
2013-02-20 2:34 ` Lin Feng
2013-02-20 2:44 ` Wanpeng Li
2013-02-20 2:44 ` Wanpeng Li
[not found] ` <20130220024435.GA30208@hacker.(null)>
2013-02-20 2:59 ` Lin Feng
2013-02-20 9:58 ` Simon Jeons
2013-02-20 10:23 ` Lin Feng
2013-02-20 11:31 ` Simon Jeons
2013-02-20 11:54 ` Lin Feng
2013-02-06 2:26 ` Michel Lespinasse
2013-02-06 10:41 ` Mel Gorman
2013-02-18 10:34 ` Lin Feng
2013-02-18 15:17 ` Mel Gorman
2013-02-19 9:55 ` Lin Feng
2013-02-19 10:34 ` Mel Gorman
2013-02-04 10:04 ` [PATCH 2/2] fs/aio.c: use get_user_pages_non_movable() to pin ring pages when support memory hotremove Lin Feng
2013-02-04 15:18 ` Jeff Moyer
2013-02-04 23:02 ` Zach Brown
2013-02-05 5:35 ` Lin Feng
2013-02-05 5:06 ` Lin Feng
2013-02-05 0:58 ` [PATCH 0/2] mm: hotplug: implement non-movable version of get_user_pages() to kill long-time pin pages Minchan Kim
2013-02-05 4:42 ` Lin Feng
2013-02-05 5:25 ` Minchan Kim
2013-02-05 6:18 ` Lin Feng
2013-02-05 7:45 ` Minchan Kim
2013-02-05 8:27 ` Lin Feng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130204160624.5c20a8a0.akpm@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=bcrl@kvack.org \
--cc=isimatu.yasuaki@jp.fujitsu.com \
--cc=jiang.liu@huawei.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=khlebnikov@openvz.org \
--cc=laijs@cn.fujitsu.com \
--cc=linfeng@cn.fujitsu.com \
--cc=linux-aio@kvack.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=minchan@kernel.org \
--cc=riel@redhat.com \
--cc=rientjes@google.com \
--cc=viro@zeniv.linux.org.uk \
--cc=walken@google.com \
--cc=wency@cn.fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).