From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx151.postini.com [74.125.245.151]) by kanga.kvack.org (Postfix) with SMTP id 04DE96B0032 for ; Tue, 13 Aug 2013 03:05:01 -0400 (EDT) From: Minchan Kim Subject: [RFC 0/3] Pin page control subsystem Date: Tue, 13 Aug 2013 16:04:59 +0900 Message-Id: <1376377502-28207-1-git-send-email-minchan@kernel.org> Sender: owner-linux-mm@kvack.org List-ID: To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, k.kozlowski@samsung.com, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Benjamin LaHaise , Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel , Minchan Kim

!! NOTICE !!
This patchset is totally untested, so please AVOID real testing.
I'd just like to show the concept and discuss it at a very early stage.
(So there isn't much description, but I guess the code is simple enough
that the intention is not hard to understand.)

This patchset is for solving the *kernel* pinned-page migration problem
in a more general way. Now zswap, zram and the rest of the z* family (and
whatever solutions are upcoming) use memory that doesn't live in harmony
with the VM. (I don't remember balloon compaction well, but we might be
able to unify balloon compaction with this.)

The VM sometimes wants to migrate and/or reclaim pages for CMA,
memory-hotplug, THP and so on, but at the moment it can handle only
userspace pages. So if one of the above subsystems has pinned a page in a
range the VM wants to migrate, the migration fails and the above examples
(CMA, memory-hotplug, THP) can't work well.

This patchset provides the basic facility for that role.
Patch 1 introduces a new page flag and patch 2 introduces the pinpage
control subsystem. Subsystems that want to control pinned pages should
implement their own pinpage_xxx functions, because each subsystem has its
own characteristics, so what kind of data structure is used for managing
pinpage information depends on them. Otherwise, they can use the general
functions defined in the pinpage subsystem.
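For illustration only (this is not part of the original posting): the "implement your own functions or use the general ones" idea can be modeled in plain userspace C. Every name below (model_psys, model_general_*, model_range_*) is hypothetical, and a small array stands in for the radix tree that the RFC's general helpers use. It is a sketch of the callback-table shape under those assumptions, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_PAGES 8

/* Hypothetical userspace stand-in for the RFC's struct pinpage_system. */
struct model_psys {
	int (*add_page)(struct model_psys *psys, unsigned long pfn);
	int (*find_page)(struct model_psys *psys, unsigned long pfn);
	unsigned long pfns[MODEL_MAX_PAGES];	/* used by the "general" helpers */
	size_t nr;
	unsigned long range_start, range_end;	/* used by the custom helpers */
};

/* "General" helpers: track pinned pages one by one in a small array. */
static int model_general_add_page(struct model_psys *psys, unsigned long pfn)
{
	if (psys->nr == MODEL_MAX_PAGES)
		return -1;
	psys->pfns[psys->nr++] = pfn;
	return 0;
}

static int model_general_find_page(struct model_psys *psys, unsigned long pfn)
{
	size_t i;

	for (i = 0; i < psys->nr; i++)
		if (psys->pfns[i] == pfn)
			return 1;
	return 0;
}

/* Custom helpers: a subsystem that pins one contiguous pfn range [start, end). */
static int model_range_add_page(struct model_psys *psys, unsigned long pfn)
{
	if (psys->range_start == psys->range_end) {	/* empty range */
		psys->range_start = pfn;
		psys->range_end = pfn + 1;
		return 0;
	}
	if (pfn == psys->range_end) {			/* extend by one page */
		psys->range_end++;
		return 0;
	}
	return -1;	/* not contiguous: this toy tracker refuses it */
}

static int model_range_find_page(struct model_psys *psys, unsigned long pfn)
{
	return pfn >= psys->range_start && pfn < psys->range_end;
}
```

A subsystem with no special needs would fill the table with the model_general_* helpers, while one with its own bookkeeping plugs in model_range_* instead; callers only ever go through the function pointers.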
Patch 3 hacks migrate.c so that migration is now aware of pinned pages
and migrates them with the pinpage subsystem.

It imposes a new rule: users of the pinpage control subsystem shouldn't
use the struct page->flags and struct page->lru fields freely, because the
lru field is used by migrate.c and the flags field is used for lock_page
in the pinpage control subsystem. I think it's not a big problem because a
subsystem can use other fields of the page descriptor instead.

This patchset's limitation is that it can't be applied to userspace pages,
although I'd REALLY REALLY like to unify them. IOW, it can't handle pages
pinned for a long time by get_user_pages. The basic hurdle is how to handle
the nesting cases that arise when several subsystems pin the same page with
GUP but have different migration methods. Handling that could add
considerable complexity and overhead, and I'm not sure it's worth it,
because the only proven culprit so far is the AIO ring pages, and Gu and
Benjamin have approached that in another way, so I'd like to hear their
opinions.

Minchan Kim (3):
  mm: Introduce new page flag
  pinpage control subsystem
  mm: migrate pinned page

 include/linux/page-flags.h |   2 +
 include/linux/pinpage.h    |  39 +++++++++++++
 mm/Makefile                |   2 +-
 mm/compaction.c            |  26 ++++++++-
 mm/migrate.c               |  58 ++++++++++++++++---
 mm/page_alloc.c            |   1 +
 mm/pinpage.c               | 134 ++++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 252 insertions(+), 10 deletions(-)
 create mode 100644 include/linux/pinpage.h
 create mode 100644 mm/pinpage.c

-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: From: Minchan Kim Subject: [RFC 1/3] mm: Introduce new page flag Date: Tue, 13 Aug 2013 16:05:00 +0900 Message-Id: <1376377502-28207-2-git-send-email-minchan@kernel.org> In-Reply-To: <1376377502-28207-1-git-send-email-minchan@kernel.org> References: <1376377502-28207-1-git-send-email-minchan@kernel.org> Sender: owner-linux-mm@kvack.org List-ID: To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, k.kozlowski@samsung.com, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Benjamin LaHaise , Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel , Minchan Kim Signed-off-by: Minchan Kim --- include/linux/page-flags.h | 2 ++ mm/page_alloc.c | 1 + 2 files changed, 3 insertions(+) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 6d53675..75ce843 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -109,6 +109,7 @@ enum pageflags { #ifdef CONFIG_TRANSPARENT_HUGEPAGE PG_compound_lock, #endif + PG_pin, __NR_PAGEFLAGS, /* Filesystems */ @@ -197,6 +198,7 @@ struct page; /* forward declaration */ TESTPAGEFLAG(Locked, locked) PAGEFLAG(Error, error) TESTCLEARFLAG(Error, error) +PAGEFLAG(Pin, pin) TESTCLEARFLAG(Pin, pin) PAGEFLAG(Referenced, referenced) TESTCLEARFLAG(Referenced, referenced) PAGEFLAG(Dirty, dirty) TESTSCFLAG(Dirty, dirty) __CLEARPAGEFLAG(Dirty, dirty) PAGEFLAG(LRU, lru) __CLEARPAGEFLAG(LRU, lru) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index b100255..5dd8b43 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6345,6 +6345,7 @@ static const struct trace_print_flags pageflag_names[] = { #ifdef CONFIG_TRANSPARENT_HUGEPAGE {1UL << PG_compound_lock, "compound_lock" }, #endif + {1UL << PG_pin, "pin" }, }; static void dump_page_flags(unsigned long flags) -- 1.7.9.5 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: From: Minchan Kim Subject: [RFC 2/3] pinpage control subsystem Date: Tue, 13 Aug 2013 16:05:01 +0900 Message-Id: <1376377502-28207-3-git-send-email-minchan@kernel.org> In-Reply-To: <1376377502-28207-1-git-send-email-minchan@kernel.org> References: <1376377502-28207-1-git-send-email-minchan@kernel.org> Sender: owner-linux-mm@kvack.org List-ID: To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, k.kozlowski@samsung.com, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Benjamin LaHaise , Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel , Minchan Kim Signed-off-by: Minchan Kim --- include/linux/pinpage.h | 39 ++++++++++++++ mm/Makefile | 2 +- mm/pinpage.c | 134 +++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 174 insertions(+), 1 deletion(-) create mode 100644 include/linux/pinpage.h create mode 100644 mm/pinpage.c diff --git a/include/linux/pinpage.h b/include/linux/pinpage.h new file mode 100644 index 0000000..42fbdc7 --- /dev/null +++ b/include/linux/pinpage.h @@ -0,0 +1,39 @@ +#ifndef _LINUX_PINPAGE_H +#define _LINUX_PINPAGE_H + +#include + +/* + * NOTE : pinpage_system user shouldn't use page->lru and page->flags + * fields. 
+ */ +struct pinpage_system { + struct radix_tree_root page_tree; + spinlock_t tree_lock; + + int (*create_subsys)(struct pinpage_system *psys); + int (*destroy_subsys)(struct pinpage_system *psys); + int (*migrate)(struct pinpage_system *psys, struct page *page, + struct page *newpage); + int (*add_page)(struct pinpage_system *psys, struct page *page, + void *private); + int (*del_page)(struct pinpage_system *psys, struct page *page); + int (*find_page)(struct pinpage_system *psys, struct page *page); + + struct list_head list; +}; + +extern int general_create_subsys(struct pinpage_system *psys); +extern int general_destroy_subsys(struct pinpage_system *psys); +extern int general_add_page(struct pinpage_system *psys, struct page *page, + void *private); +extern int general_del_page(struct pinpage_system *psys, struct page *page); +extern int general_find_page(struct pinpage_system *psys, struct page *page); + +extern int set_pinpage(struct pinpage_system *psys, struct page *page, + void *private); +extern int register_pinpage(struct pinpage_system *psys); +extern int migrate_pinpage(struct page *page, struct page *newpage); + +#endif + diff --git a/mm/Makefile b/mm/Makefile index f008033..bf4a2d9 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -5,7 +5,7 @@ mmu-y := nommu.o mmu-$(CONFIG_MMU) := fremap.o highmem.o madvise.o memory.o mincore.o \ mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \ - vmalloc.o pagewalk.o pgtable-generic.o + vmalloc.o pagewalk.o pgtable-generic.o pinpage.o ifdef CONFIG_CROSS_MEMORY_ATTACH mmu-$(CONFIG_MMU) += process_vm_access.o diff --git a/mm/pinpage.c b/mm/pinpage.c new file mode 100644 index 0000000..0833204 --- /dev/null +++ b/mm/pinpage.c @@ -0,0 +1,134 @@ +#include +#include +#include +#include + +static DEFINE_SPINLOCK(pinpage_system_lock); +static LIST_HEAD(pinpage_system_list); + +struct pinpage_info { + unsigned long pfn; + void *private; +}; + +int general_create_subsys(struct pinpage_system *psys) +{ + 
INIT_RADIX_TREE(&psys->page_tree, GFP_KERNEL);
+	spin_lock_init(&psys->tree_lock);
+	return 0;
+}
+EXPORT_SYMBOL(general_create_subsys);
+
+int general_destroy_subsys(struct pinpage_system *psys)
+{
+	return 0;
+}
+EXPORT_SYMBOL(general_destroy_subsys);
+
+int general_add_page(struct pinpage_system *psys, struct page *page,
+		void *private)
+{
+	int ret = -ENOMEM;
+	unsigned long pfn = page_to_pfn(page);
+	struct pinpage_info *pinfo = kmalloc(sizeof(*pinfo), GFP_KERNEL);
+	if (!pinfo)
+		return ret;
+
+	pinfo->pfn = pfn;
+	pinfo->private = private;
+
+	spin_lock(&psys->tree_lock);
+	ret = radix_tree_insert(&psys->page_tree, pfn, pinfo);
+	spin_unlock(&psys->tree_lock);
+	return ret;
+}
+EXPORT_SYMBOL(general_add_page);
+
+int general_del_page(struct pinpage_system *psys, struct page *page)
+{
+	struct pinpage_info *pinfo;
+	spin_lock(&psys->tree_lock);
+	pinfo = radix_tree_lookup(&psys->page_tree, page_to_pfn(page));
+	if (!pinfo) {
+		spin_unlock(&psys->tree_lock);
+		return -EINVAL;
+	}
+	radix_tree_delete(&psys->page_tree, page_to_pfn(page));
+	spin_unlock(&psys->tree_lock);
+	return 0;
+}
+EXPORT_SYMBOL(general_del_page);
+
+int general_find_page(struct pinpage_system *psys, struct page *page)
+{
+	struct pinpage_info *pinfo;
+	spin_lock(&psys->tree_lock);
+	pinfo = radix_tree_lookup(&psys->page_tree, page_to_pfn(page));
+	spin_unlock(&psys->tree_lock);
+	return pinfo ? 
1 : 0;
+}
+EXPORT_SYMBOL(general_find_page);
+
+int set_pinpage(struct pinpage_system *psys, struct page *page, void *private)
+{
+	int ret;
+	ret = psys->add_page(psys, page, private);
+	if (!ret) {
+		lock_page(page);
+		/* Doesn't allow nesting */
+		VM_BUG_ON(PagePin(page));
+		SetPagePin(page);
+		unlock_page(page);
+	}
+	return ret;
+}
+EXPORT_SYMBOL(set_pinpage);
+
+int clear_pinpage(struct pinpage_system *psys, struct page *page)
+{
+	int ret;
+	ret = psys->del_page(psys, page);
+	if (!ret) {
+		lock_page(page);
+		ClearPagePin(page);
+		unlock_page(page);
+	}
+	return ret;
+}
+EXPORT_SYMBOL(clear_pinpage);
+
+int register_pinpage(struct pinpage_system *psys)
+{
+	/* register pinpage_subsystem to global list */
+	spin_lock(&pinpage_system_lock);
+	list_add(&psys->list, &pinpage_system_list);
+	spin_unlock(&pinpage_system_lock);
+	return psys->create_subsys(psys);
+}
+EXPORT_SYMBOL(register_pinpage);
+
+int unregister_pinpage(struct pinpage_system *psys)
+{
+	/* unregister pinpage_subsystem from global list */
+	spin_lock(&pinpage_system_lock);
+	list_del(&psys->list);
+	spin_unlock(&pinpage_system_lock);
+	return psys->destroy_subsys(psys);
+}
+EXPORT_SYMBOL(unregister_pinpage);
+
+int migrate_pinpage(struct page *page, struct page *newpage)
+{
+	int err = 0;
+	struct pinpage_system *psys;
+
+	spin_lock(&pinpage_system_lock);
+	list_for_each_entry(psys, &pinpage_system_list, list) {
+		if (psys->find_page(psys, page)) {
+			err = psys->migrate(psys, page, newpage);
+			break;
+		}
+	}
+	spin_unlock(&pinpage_system_lock);
+	return err;
+}
-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: From: Minchan Kim Subject: [RFC 3/3] mm: migrate pinned page Date: Tue, 13 Aug 2013 16:05:02 +0900 Message-Id: <1376377502-28207-4-git-send-email-minchan@kernel.org> In-Reply-To: <1376377502-28207-1-git-send-email-minchan@kernel.org> References: <1376377502-28207-1-git-send-email-minchan@kernel.org> Sender: owner-linux-mm@kvack.org List-ID: To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, k.kozlowski@samsung.com, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Benjamin LaHaise , Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel , Minchan Kim Signed-off-by: Minchan Kim --- mm/compaction.c | 26 +++++++++++++++++++++++-- mm/migrate.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++------- 2 files changed, 75 insertions(+), 9 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 05ccb4c..16b80e6 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -396,8 +396,10 @@ static void acct_isolated(struct zone *zone, bool locked, struct compact_control struct page *page; unsigned int count[2] = { 0, }; - list_for_each_entry(page, &cc->migratepages, lru) - count[!!page_is_file_cache(page)]++; + list_for_each_entry(page, &cc->migratepages, lru) { + if (!PagePin(page)) + count[!!page_is_file_cache(page)]++; + } /* If locked we can use the interrupt unsafe versions */ if (locked) { @@ -535,6 +537,25 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc, } /* + * Pinned kernel page(ex, zswap) could be isolated. + */ + if (PagePin(page)) { + if (!get_page_unless_zero(page)) + continue; + /* + * Subsystem want to use pinpage should not + * use page->lru feild. + */ + VM_BUG_ON(!list_empty(&page->lru)); + if (!trylock_page(page)) { + put_page(page); + continue; + } + + goto isolated; + } + + /* * Check may be lockless but that's ok as we recheck later. 
* It's possible to migrate LRU pages and balloon pages * Skip any other type of page @@ -601,6 +622,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc, /* Successfully isolated */ cc->finished_update_migrate = true; del_page_from_lru_list(page, lruvec, page_lru(page)); +isolated: list_add(&page->lru, migratelist); cc->nr_migratepages++; nr_isolated++; diff --git a/mm/migrate.c b/mm/migrate.c index 6f0c244..4d28049 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -36,6 +36,7 @@ #include #include #include +#include #include @@ -101,12 +102,17 @@ void putback_movable_pages(struct list_head *l) list_for_each_entry_safe(page, page2, l, lru) { list_del(&page->lru); - dec_zone_page_state(page, NR_ISOLATED_ANON + - page_is_file_cache(page)); - if (unlikely(balloon_page_movable(page))) - balloon_page_putback(page); - else - putback_lru_page(page); + if (!PagePin(page)) { + dec_zone_page_state(page, NR_ISOLATED_ANON + + page_is_file_cache(page)); + if (unlikely(balloon_page_movable(page))) + balloon_page_putback(page); + else + putback_lru_page(page); + } else { + unlock_page(page); + put_page(page); + } } } @@ -855,6 +861,39 @@ out: return rc; } +static int unmap_and_move_pinpage(new_page_t get_new_page, + unsigned long private, struct page *page, int force, + enum migrate_mode mode) +{ + int *result = NULL; + int rc = 0; + struct page *newpage = get_new_page(page, private, &result); + if (!newpage) + return -ENOMEM; + + VM_BUG_ON(!PageLocked(page)); + if (page_count(page) == 1) { + /* page was freed from under us. So we are done. */ + goto out; + } + + rc = migrate_pinpage(page, newpage); +out: + if (rc != -EAGAIN) { + list_del(&page->lru); + unlock_page(page); + put_page(page); + } + + if (result) { + if (rc) + *result = rc; + else + *result = page_to_nid(newpage); + } + return rc; + +} /* * Obtain the lock on page, remove all ptes and migrate the page * to the newly allocated page in newpage. 
@@ -1025,8 +1064,13 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page, list_for_each_entry_safe(page, page2, from, lru) { cond_resched(); - rc = unmap_and_move(get_new_page, private, + if (PagePin(page)) { + rc = unmap_and_move_pinpage(get_new_page, private, page, pass > 2, mode); + } else { + rc = unmap_and_move(get_new_page, private, + page, pass > 2, mode); + } switch(rc) { case -ENOMEM: -- 1.7.9.5 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Message-id: <1376387202.31048.2.camel@AMDC1943> Subject: Re: [RFC 0/3] Pin page control subsystem From: Krzysztof Kozlowski Date: Tue, 13 Aug 2013 11:46:42 +0200 In-reply-to: <1376377502-28207-1-git-send-email-minchan@kernel.org> References: <1376377502-28207-1-git-send-email-minchan@kernel.org> Content-type: text/plain; charset=UTF-8 Content-transfer-encoding: 7bit MIME-version: 1.0 Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Benjamin LaHaise , Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel , Tomasz Stanislawski Hi Minchan, On wto, 2013-08-13 at 16:04 +0900, Minchan Kim wrote: > patch 2 introduce pinpage control > subsystem. So, subsystems want to control pinpage should implement own > pinpage_xxx functions because each subsystem would have other character > so what kinds of data structure for managing pinpage information depends > on them. Otherwise, they can use general functions defined in pinpage > subsystem. patch 3 hacks migration.c so that migration is > aware of pinpage now and migrate them with pinpage subsystem. I wonder why don't we use page->mapping and a_ops? Is there any disadvantage of such mapping/a_ops? 
Best regards, Krzysztof -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Tue, 13 Aug 2013 10:23:38 -0400 From: Benjamin LaHaise Subject: Re: [RFC 0/3] Pin page control subsystem Message-ID: <20130813142338.GD13330@kvack.org> References: <1376377502-28207-1-git-send-email-minchan@kernel.org> <1376387202.31048.2.camel@AMDC1943> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1376387202.31048.2.camel@AMDC1943> Sender: owner-linux-mm@kvack.org List-ID: To: Krzysztof Kozlowski Cc: Minchan Kim , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel , Tomasz Stanislawski On Tue, Aug 13, 2013 at 11:46:42AM +0200, Krzysztof Kozlowski wrote: > Hi Minchan, > > On wto, 2013-08-13 at 16:04 +0900, Minchan Kim wrote: > > patch 2 introduce pinpage control > > subsystem. So, subsystems want to control pinpage should implement own > > pinpage_xxx functions because each subsystem would have other character > > so what kinds of data structure for managing pinpage information depends > > on them. Otherwise, they can use general functions defined in pinpage > > subsystem. patch 3 hacks migration.c so that migration is > > aware of pinpage now and migrate them with pinpage subsystem. > > I wonder why don't we use page->mapping and a_ops? Is there any > disadvantage of such mapping/a_ops? That's what the pending aio patches do, and I think this is a better approach for those use-cases that the technique works for. The biggest problem I see with the pinpage approach is that it's based on a single page at a time. I'd venture a guess that many pinned pages are done in groups of pages, not single ones. 
-ben > Best regards, > Krzysztof -- "Thought is the essence of where you are now." -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Tue, 13 Aug 2013 16:21:30 +0000 From: Christoph Lameter Subject: Re: [RFC 0/3] Pin page control subsystem In-Reply-To: <1376377502-28207-1-git-send-email-minchan@kernel.org> Message-ID: <00000140787b6191-ae3f2eb1-515e-48a1-8e64-502772af4700-000000@email.amazonses.com> References: <1376377502-28207-1-git-send-email-minchan@kernel.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, k.kozlowski@samsung.com, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Benjamin LaHaise , Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel On Tue, 13 Aug 2013, Minchan Kim wrote: > VM sometime want to migrate and/or reclaim pages for CMA, memory-hotplug, > THP and so on but at the moment, it could handle only userspace pages > so if above example subsystem have pinned a some page in a range VM want > to migrate, migration is failed so above exmaple couldn't work well. Dont we have the mmu_notifiers that could help in that case? You could get a callback which could prepare the pages for migration? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Wed, 14 Aug 2013 08:54:25 +0900 From: Minchan Kim Subject: Re: [RFC 0/3] Pin page control subsystem Message-ID: <20130813235425.GA2271@bbox> References: <1376377502-28207-1-git-send-email-minchan@kernel.org> <1376387202.31048.2.camel@AMDC1943> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1376387202.31048.2.camel@AMDC1943> Sender: owner-linux-mm@kvack.org List-ID: To: Krzysztof Kozlowski Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Benjamin LaHaise , Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel , Tomasz Stanislawski

Hello Krzysztof,

On Tue, Aug 13, 2013 at 11:46:42AM +0200, Krzysztof Kozlowski wrote:
> Hi Minchan,
> 
> On wto, 2013-08-13 at 16:04 +0900, Minchan Kim wrote:
> > patch 2 introduce pinpage control
> > subsystem. So, subsystems want to control pinpage should implement own
> > pinpage_xxx functions because each subsystem would have other character
> > so what kinds of data structure for managing pinpage information depends
> > on them. Otherwise, they can use general functions defined in pinpage
> > subsystem. patch 3 hacks migration.c so that migration is
> > aware of pinpage now and migrate them with pinpage subsystem.
> 
> I wonder why don't we use page->mapping and a_ops? Is there any
> disadvantage of such mapping/a_ops?

My biggest concern with that approach is how to handle the nested pin
case. For example, driver A and driver B coincidentally pin the same
file-backed page with get_user_pages. For the migration, we need the
following operations:

1. [buffer]'s migrate_page for the file-backed page
2. [driver A]'s migrate_page
3. [driver B]'s migrate_page

But the page has only one mapping. How can we handle that?
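For illustration only (not part of the original message): the nesting problem described above can be modeled in userspace C. The types model_handler and model_page and both functions are hypothetical. The single `mapping` pointer stands in for page->mapping, and the handler array stands in for whatever list a nesting-aware design would have to keep.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one party that needs a migrate callback. */
struct model_handler {
	const char *name;
	int calls;	/* how many times its migrate hook has run */
};

/* Hypothetical stand-in for struct page: only one mapping slot. */
struct model_page {
	struct model_handler *mapping;
};

/* Dispatch through the single slot: at most one handler is ever called. */
static int model_migrate_via_mapping(struct model_page *page)
{
	if (!page->mapping)
		return 0;
	page->mapping->calls++;
	return 1;
}

/* What the nested case needs: every party that pinned the page is called. */
static int model_migrate_via_list(struct model_handler **handlers, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		handlers[i]->calls++;
	return (int)n;
}
```

With the buffer handler installed as the mapping, model_migrate_via_mapping() reaches only that one handler, and drivers A and B never hear about the migration. Reaching all three requires per-page state beyond the one pointer, which is the extra complexity the message is worried about.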
If we give up on having the pinpage subsystem unify userspace pages (e.g.,
GUP) and kernel-space pages (e.g., zswap, zram and zcache), we can go with
address_space's migratepage, but we might lose the abstraction, so every
user would have to implement its own pinpage manager. That's not hard, I
guess, but it's more error-prone and less maintainable in the future.

> 
> Best regards,
> Krzysztof

-- 
Kind regards,
Minchan Kim

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Wed, 14 Aug 2013 09:08:50 +0900 From: Minchan Kim Subject: Re: [RFC 0/3] Pin page control subsystem Message-ID: <20130814000850.GB2271@bbox> References: <1376377502-28207-1-git-send-email-minchan@kernel.org> <1376387202.31048.2.camel@AMDC1943> <20130813142338.GD13330@kvack.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130813142338.GD13330@kvack.org> Sender: owner-linux-mm@kvack.org List-ID: To: Benjamin LaHaise Cc: Krzysztof Kozlowski , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel , Tomasz Stanislawski

Hello Benjamin,

On Tue, Aug 13, 2013 at 10:23:38AM -0400, Benjamin LaHaise wrote:
> On Tue, Aug 13, 2013 at 11:46:42AM +0200, Krzysztof Kozlowski wrote:
> > Hi Minchan,
> > 
> > On wto, 2013-08-13 at 16:04 +0900, Minchan Kim wrote:
> > > patch 2 introduce pinpage control
> > > subsystem. 
So, subsystems want to control pinpage should implement own > > > pinpage_xxx functions because each subsystem would have other character > > > so what kinds of data structure for managing pinpage information depends > > > on them. Otherwise, they can use general functions defined in pinpage > > > subsystem. patch 3 hacks migration.c so that migration is > > > aware of pinpage now and migrate them with pinpage subsystem. > > > > I wonder why don't we use page->mapping and a_ops? Is there any > > disadvantage of such mapping/a_ops? > > That's what the pending aio patches do, and I think this is a better > approach for those use-cases that the technique works for. I saw your implementation roughly and I think it's not a generic solution. How could it handle the example mentioned in reply of Krzysztof? > > The biggest problem I see with the pinpage approach is that it's based on a > single page at a time. I'd venture a guess that many pinned pages are done > in groups of pages, not single ones. In case of z* family, most of allocation is single but I agree many GUP users would allocate groups of pages. Then, we can cover it by expanding the API like this. int set_pinpage(struct pinpage_system *psys, struct page **pages, unsigned long nr_pages, void **privates); so we can handle it by batch and the subsystem can manage pinpage_info with interval tree rather than radix tree which is default. That's why pinpage control subsystem has room for subsystem specific metadata handling. > > -ben > > > Best regards, > > Krzysztof > > -- > "Thought is the essence of where you are now." > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Wed, 14 Aug 2013 09:12:37 +0900 From: Minchan Kim Subject: Re: [RFC 0/3] Pin page control subsystem Message-ID: <20130814001236.GC2271@bbox> References: <1376377502-28207-1-git-send-email-minchan@kernel.org> <00000140787b6191-ae3f2eb1-515e-48a1-8e64-502772af4700-000000@email.amazonses.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <00000140787b6191-ae3f2eb1-515e-48a1-8e64-502772af4700-000000@email.amazonses.com> Sender: owner-linux-mm@kvack.org List-ID: To: Christoph Lameter Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, k.kozlowski@samsung.com, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Benjamin LaHaise , Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel Hello Christoph, On Tue, Aug 13, 2013 at 04:21:30PM +0000, Christoph Lameter wrote: > On Tue, 13 Aug 2013, Minchan Kim wrote: > > > VM sometime want to migrate and/or reclaim pages for CMA, memory-hotplug, > > THP and so on but at the moment, it could handle only userspace pages > > so if above example subsystem have pinned a some page in a range VM want > > to migrate, migration is failed so above exmaple couldn't work well. > > Dont we have the mmu_notifiers that could help in that case? You could get > a callback which could prepare the pages for migration? Now I'm not familiar with mmu_notifier so please could you elaborate it a bit for me to dive into that? Thanks! > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Wed, 14 Aug 2013 16:36:44 +0000 From: Christoph Lameter Subject: Re: [RFC 0/3] Pin page control subsystem In-Reply-To: <20130814001236.GC2271@bbox> Message-ID: <000001407dafbe92-7b2b4006-2225-4f0b-b23b-d66101a995aa-000000@email.amazonses.com> References: <1376377502-28207-1-git-send-email-minchan@kernel.org> <00000140787b6191-ae3f2eb1-515e-48a1-8e64-502772af4700-000000@email.amazonses.com> <20130814001236.GC2271@bbox> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, k.kozlowski@samsung.com, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Benjamin LaHaise , Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel On Wed, 14 Aug 2013, Minchan Kim wrote: > On Tue, Aug 13, 2013 at 04:21:30PM +0000, Christoph Lameter wrote: > > On Tue, 13 Aug 2013, Minchan Kim wrote: > > > > > VM sometime want to migrate and/or reclaim pages for CMA, memory-hotplug, > > > THP and so on but at the moment, it could handle only userspace pages > > > so if above example subsystem have pinned a some page in a range VM want > > > to migrate, migration is failed so above exmaple couldn't work well. > > > > Dont we have the mmu_notifiers that could help in that case? You could get > > a callback which could prepare the pages for migration? > > Now I'm not familiar with mmu_notifier so please could you elaborate it > a bit for me to dive into that? Add a notifier callback for unpinning pages to the mmu notifier subsystem and then your drivers could register with the subsystem to get notifications when migration needs to occur etc. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Thu, 15 Aug 2013 01:47:05 +0900 From: Minchan Kim Subject: Re: [RFC 0/3] Pin page control subsystem Message-ID: <20130814164705.GD2706@gmail.com> References: <1376377502-28207-1-git-send-email-minchan@kernel.org> <00000140787b6191-ae3f2eb1-515e-48a1-8e64-502772af4700-000000@email.amazonses.com> <20130814001236.GC2271@bbox> <000001407dafbe92-7b2b4006-2225-4f0b-b23b-d66101a995aa-000000@email.amazonses.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <000001407dafbe92-7b2b4006-2225-4f0b-b23b-d66101a995aa-000000@email.amazonses.com> Sender: owner-linux-mm@kvack.org List-ID: To: Christoph Lameter Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, k.kozlowski@samsung.com, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Benjamin LaHaise , Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel Hi Christoph, On Wed, Aug 14, 2013 at 04:36:44PM +0000, Christoph Lameter wrote: > On Wed, 14 Aug 2013, Minchan Kim wrote: > > > On Tue, Aug 13, 2013 at 04:21:30PM +0000, Christoph Lameter wrote: > > > On Tue, 13 Aug 2013, Minchan Kim wrote: > > > > > > > VM sometime want to migrate and/or reclaim pages for CMA, memory-hotplug, > > > > THP and so on but at the moment, it could handle only userspace pages > > > > so if above example subsystem have pinned a some page in a range VM want > > > > to migrate, migration is failed so above exmaple couldn't work well. > > > > > > Dont we have the mmu_notifiers that could help in that case? You could get > > > a callback which could prepare the pages for migration? > > > > Now I'm not familiar with mmu_notifier so please could you elaborate it > > a bit for me to dive into that? > > Add a notifier callback for unpinning pages to the mmu notifier subsystem > and then your drivers could register with the subsystem to get > notifications when migration needs to occur etc. 
>

When I look at the mmu_notifier API, it takes an mm_struct, so I guess it
works only for user processes. Right?
If so, I need to register it without a user context, because zram, zswap
and zcache work only on the kernel side.

-- 
Kind regards,
Minchan Kim

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Date: Wed, 14 Aug 2013 16:58:36 +0000
From: Christoph Lameter
Subject: Re: [RFC 0/3] Pin page control subsystem
In-Reply-To: <20130814164705.GD2706@gmail.com>
Message-ID: <000001407dc3c33b-4139d615-aecc-4745-a9b4-c84949f6a8f4-000000@email.amazonses.com>
References: <1376377502-28207-1-git-send-email-minchan@kernel.org>
 <00000140787b6191-ae3f2eb1-515e-48a1-8e64-502772af4700-000000@email.amazonses.com>
 <20130814001236.GC2271@bbox>
 <000001407dafbe92-7b2b4006-2225-4f0b-b23b-d66101a995aa-000000@email.amazonses.com>
 <20130814164705.GD2706@gmail.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-linux-mm@kvack.org
List-ID:
To: Minchan Kim
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, k.kozlowski@samsung.com, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Benjamin LaHaise , Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel

On Thu, 15 Aug 2013, Minchan Kim wrote:

> When I look at the mmu_notifier API, it takes an mm_struct, so I guess it
> works only for user processes. Right?

Correct. A process must have mapped the pages. If you can get a
kernel "process" to work then that process could map the pages.

> If so, I need to register it without a user context, because zram, zswap
> and zcache work only on the kernel side.

Hmmm... Ok but that now gets the complexity of page pinning up to a very
weird level. Is there some way we can have a common way to deal with the
various ways that pinning is needed? Just off the top of my head (I may
miss some use cases) we have

1. mlock from user space
2. page pinning for reclaim
3. Page pinning for I/O from device drivers (e.g. the RDMA subsystem)
4. Page pinning for low latency operations
5. Page pinning for migration
6. Page pinning for the perf buffers.
7. Page pinning for cross system access (XPMEM, GRU SGI)

Now we have another subsystem wanting different semantics of pinning. Is
there any way we can come up with a pinning mechanism that fits all use
cases, that is easily understandable and maintainable?

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Date: Thu, 15 Aug 2013 13:48:34 +0900
From: Minchan Kim
Subject: Re: [RFC 0/3] Pin page control subsystem
Message-ID: <20130815044834.GB3139@gmail.com>
References: <1376377502-28207-1-git-send-email-minchan@kernel.org>
 <00000140787b6191-ae3f2eb1-515e-48a1-8e64-502772af4700-000000@email.amazonses.com>
 <20130814001236.GC2271@bbox>
 <000001407dafbe92-7b2b4006-2225-4f0b-b23b-d66101a995aa-000000@email.amazonses.com>
 <20130814164705.GD2706@gmail.com>
 <000001407dc3c33b-4139d615-aecc-4745-a9b4-c84949f6a8f4-000000@email.amazonses.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <000001407dc3c33b-4139d615-aecc-4745-a9b4-c84949f6a8f4-000000@email.amazonses.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Christoph Lameter
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, k.kozlowski@samsung.com, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Benjamin LaHaise , Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel

Hey Christoph,

On Wed, Aug 14, 2013 at 04:58:36PM +0000, Christoph Lameter wrote:
> On Thu, 15 Aug 2013, Minchan Kim wrote:
>
> > When I look at the mmu_notifier API, it takes an mm_struct, so I guess it
> > works only for user processes. Right?
>
> Correct. A process must have mapped the pages. If you can get a
> kernel "process" to work then that process could map the pages.
>
> > If so, I need to register it without a user context, because zram, zswap
> > and zcache work only on the kernel side.
>
> Hmmm... Ok but that now gets the complexity of page pinning up to a very
> weird level. Is there some way we can have a common way to deal with the
> various ways that pinning is needed? Just off the top of my head (I may
> miss some use cases) we have
>
> 1. mlock from user space

mlocked pages can already be migrated in the CMA case, so I don't think
migrating them in other cases is a big problem.
I remember you and Peter argued about what mlock means from a pinning
point of view and, if I remember correctly, Peter said mlock doesn't mean
pin, so we could migrate such pages, but you didn't agree. Right?
Anyway, that's off-topic; technically, it's not a problem.

> 2. page pinning for reclaim

Reclaim pins a page for a while. Of course, "for a while" is rather
vague: it could be really long for some and really short for others.
But at least a reclaim pin should be short, and we should fix it if
that's not true.

> 3. Page pinning for I/O from device drivers (e.g. the RDMA subsystem)

This is one of my big concerns. Several drivers might even pin the same
page at the same time. But normally a driver knows whether it will pin a
page for a long or a short time, so if it wants to pin a page for a long
time, like aio or some GPU driver doing zero-copy, it should use the
pinpage control subsystem so that it releases pinned pages when the VM
asks.

> 4. Page pinning for low latency operations

I have no idea, but I guess most of them pin a page only for a short time?
Otherwise, they should use the pinpage control subsystem, too.

> 5. Page pinning for migration

It's like 2: a migration pin should be short.

> 6. Page pinning for the perf buffers.

I'm not familiar with that, but my gut feeling is that it pins pages for
a long time, so it should use the pinpage control subsystem.

> 7. Page pinning for cross system access (XPMEM, GRU SGI)

If it's a really long pin, it should use the pinpage control subsystem.

>
> Now we have another subsystem wanting different semantics of pinning. Is
> there any way we can come up with a pinning mechanism that fits all use
> cases, that is easily understandable and maintainable?

I agree it's not easy, but we should go that way rather than adding
ad-hoc, subsystem-specific implementations. If we allow subsystem-specific
ways, everybody will probably want to touch migrate.c, so it would become
very complicated and bloated, even unmaintainable in the future.
If we go another way, like a_ops->migratepage, it can't handle the complex
nested pin case, so it can't guarantee pinpage migration.

The hardest part is what "for a while" means. It depends on the system
workload: on some systems it means 3ms while on others it means 3s. :(
Sigh, for now I have no idea how to handle it in a general way.

Thanks for the comment, Christoph!

>

-- 
Kind regards,
Minchan Kim
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Thu, 15 Aug 2013 15:18:40 +0000 From: Christoph Lameter Subject: Re: [RFC 0/3] Pin page control subsystem In-Reply-To: <20130815044834.GB3139@gmail.com> Message-ID: <00000140828ea17e-d69af79a-1d8e-4df2-9513-492df5e00afc-000000@email.amazonses.com> References: <1376377502-28207-1-git-send-email-minchan@kernel.org> <00000140787b6191-ae3f2eb1-515e-48a1-8e64-502772af4700-000000@email.amazonses.com> <20130814001236.GC2271@bbox> <000001407dafbe92-7b2b4006-2225-4f0b-b23b-d66101a995aa-000000@email.amazonses.com> <20130814164705.GD2706@gmail.com> <000001407dc3c33b-4139d615-aecc-4745-a9b4-c84949f6a8f4-000000@email.amazonses.com> <20130815044834.GB3139@gmail.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, k.kozlowski@samsung.com, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Benjamin LaHaise , Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel On Thu, 15 Aug 2013, Minchan Kim wrote: > Now mlock pages could be migrated in case of CMA so I think it's not a > big problem to migrate it for other cases. > I remember You and Peter argued what's the mlock semainc of pin POV > and as I remember correctly, Peter said mlock doesn't mean pin so > we could migrate it but you didn't agree. Right? mlock means it can be migrated. Pinning is currently done by increasing the page count. Migration will be attempted but it will fail since the references cannot be all removed. Peter proposed that mlock would work like pinning so that a migration of the page would not be attempted. My concern is not only about migration but about a general way of pinning pages. Having mlock and pinning with different semantics is already an issue as the conversation with Peter brought out. Now we are adding yet another way that pinning is used. 
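The distinction drawn above (mlock leaves a page migratable, while a pin taken by raising the page count makes migration fail) can be shown with a small userspace model. This is only a sketch under simplifying assumptions: `fake_page`, `pin_page`, `unpin_page` and `try_migrate` are made-up names, and the real kernel check involves more state than a single counter.

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for struct page: a reference count plus an mlock marker. */
struct fake_page {
	int count;   /* references currently held on the page */
	int mlocked; /* set by mlock; does not add a reference */
};

/* Pinning in this model is nothing more than taking an extra reference. */
static void pin_page(struct fake_page *p)
{
	p->count++;
}

static void unpin_page(struct fake_page *p)
{
	p->count--;
}

/*
 * Migration is attempted regardless of mlocked, but it can only proceed
 * if every reference is one the migration code accounts for
 * (expected_count). Any extra reference, such as a pin, makes it fail.
 */
static int try_migrate(const struct fake_page *p, int expected_count)
{
	if (p->count != expected_count)
		return -EAGAIN;
	return 0;
}
```

With one expected reference, an mlocked page migrates, the same page fails with `-EAGAIN` after `pin_page()`, and succeeds again once `unpin_page()` drops the extra reference.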
-- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754794Ab3HMHFF (ORCPT ); Tue, 13 Aug 2013 03:05:05 -0400 Received: from lgeamrelo02.lge.com ([156.147.1.126]:44209 "EHLO LGEAMRELO02.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750998Ab3HMHFC (ORCPT ); Tue, 13 Aug 2013 03:05:02 -0400 X-AuditID: 9c93017e-b7c76ae000003897-c9-5209da9b66bb From: Minchan Kim To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, k.kozlowski@samsung.com, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Benjamin LaHaise , Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel , Minchan Kim Subject: [RFC 0/3] Pin page control subsystem Date: Tue, 13 Aug 2013 16:04:59 +0900 Message-Id: <1376377502-28207-1-git-send-email-minchan@kernel.org> X-Mailer: git-send-email 1.7.9.5 X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org !! NOTICE !! It's totally untested patchset so please AVOID real testing. I'd like to show just concept and want to discuss it on very early stage. (so there isn't enough description but I guess code is very simple so not a big problem to understand the intention). This patchset is for solving *kernel* pinpage migration problem more general. Now, zswap, zram and z* family, not sure upcoming what solution are using memory don't live in harmony with VM. (I don't remember ballon compaction but we might be able to unify ballon compaction with this.) 
VM sometime want to migrate and/or reclaim pages for CMA, memory-hotplug, THP and so on but at the moment, it could handle only userspace pages so if above example subsystem have pinned a some page in a range VM want to migrate, migration is failed so above exmaple couldn't work well. This patchset is for basic facility for the role. patch 1 introduces a new page flags and patch 2 introduce pinpage control subsystem. So, subsystems want to control pinpage should implement own pinpage_xxx functions because each subsystem would have other character so what kinds of data structure for managing pinpage information depends on them. Otherwise, they can use general functions defined in pinpage subsystem. patch 3 hacks migration.c so that migration is aware of pinpage now and migrate them with pinpage subsystem. It exposes new rule that users of pinpage control subsystem shouldn't use struct page->flags and struct page->lru field freely because lru field is used for migration.c and flags field is used for lock_page in pinpage control subsystem. I think it's not a big problem because subsystem can use other fields of the page descriptor, instead. This patch's limitation is that it couldn't apply user space pages although I'd REALLY REALLY like to unify them. IOW, it couldn't handle long pin page by get_user_pages. Basic hurdle is that how to handle nesting cases caused by that several subsystem pin on same page with GUP but they could have different migrate methods. It could add rather complexity and overhead but I'm not sure it's worth because proved culprit until now is AIO ring pages and Gu and Benjamin have approached it with another way so I'd like to hear their opinions. 
Minchan Kim (3): mm: Introduce new page flag pinpage control subsystem mm: migrate pinned page include/linux/page-flags.h | 2 + include/linux/pinpage.h | 39 +++++++++++++ mm/Makefile | 2 +- mm/compaction.c | 26 ++++++++- mm/migrate.c | 58 ++++++++++++++++--- mm/page_alloc.c | 1 + mm/pinpage.c | 134 ++++++++++++++++++++++++++++++++++++++++++++ 7 files changed, 252 insertions(+), 10 deletions(-) create mode 100644 include/linux/pinpage.h create mode 100644 mm/pinpage.c -- 1.7.9.5 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754267Ab3HMHFE (ORCPT ); Tue, 13 Aug 2013 03:05:04 -0400 Received: from lgeamrelo02.lge.com ([156.147.1.126]:61132 "EHLO LGEAMRELO02.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750961Ab3HMHFC (ORCPT ); Tue, 13 Aug 2013 03:05:02 -0400 X-AuditID: 9c93017e-b7c76ae000003897-cc-5209da9c27b6 From: Minchan Kim To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, k.kozlowski@samsung.com, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Benjamin LaHaise , Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel , Minchan Kim Subject: [RFC 1/3] mm: Introduce new page flag Date: Tue, 13 Aug 2013 16:05:00 +0900 Message-Id: <1376377502-28207-2-git-send-email-minchan@kernel.org> X-Mailer: git-send-email 1.7.9.5 In-Reply-To: <1376377502-28207-1-git-send-email-minchan@kernel.org> References: <1376377502-28207-1-git-send-email-minchan@kernel.org> X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Signed-off-by: Minchan Kim --- include/linux/page-flags.h | 2 ++ mm/page_alloc.c | 1 + 2 files changed, 3 insertions(+) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 6d53675..75ce843 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -109,6 +109,7 @@ enum pageflags { #ifdef CONFIG_TRANSPARENT_HUGEPAGE PG_compound_lock, 
#endif + PG_pin, __NR_PAGEFLAGS, /* Filesystems */ @@ -197,6 +198,7 @@ struct page; /* forward declaration */ TESTPAGEFLAG(Locked, locked) PAGEFLAG(Error, error) TESTCLEARFLAG(Error, error) +PAGEFLAG(Pin, pin) TESTCLEARFLAG(Pin, pin) PAGEFLAG(Referenced, referenced) TESTCLEARFLAG(Referenced, referenced) PAGEFLAG(Dirty, dirty) TESTSCFLAG(Dirty, dirty) __CLEARPAGEFLAG(Dirty, dirty) PAGEFLAG(LRU, lru) __CLEARPAGEFLAG(LRU, lru) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index b100255..5dd8b43 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6345,6 +6345,7 @@ static const struct trace_print_flags pageflag_names[] = { #ifdef CONFIG_TRANSPARENT_HUGEPAGE {1UL << PG_compound_lock, "compound_lock" }, #endif + {1UL << PG_pin, "pin" }, }; static void dump_page_flags(unsigned long flags) -- 1.7.9.5 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755299Ab3HMHFh (ORCPT ); Tue, 13 Aug 2013 03:05:37 -0400 Received: from lgeamrelo02.lge.com ([156.147.1.126]:44209 "EHLO LGEAMRELO02.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751547Ab3HMHFD (ORCPT ); Tue, 13 Aug 2013 03:05:03 -0400 X-AuditID: 9c93017e-b7c76ae000003897-d6-5209da9c5b25 From: Minchan Kim To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, k.kozlowski@samsung.com, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Benjamin LaHaise , Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel , Minchan Kim Subject: [RFC 3/3] mm: migrate pinned page Date: Tue, 13 Aug 2013 16:05:02 +0900 Message-Id: <1376377502-28207-4-git-send-email-minchan@kernel.org> X-Mailer: git-send-email 1.7.9.5 In-Reply-To: <1376377502-28207-1-git-send-email-minchan@kernel.org> References: <1376377502-28207-1-git-send-email-minchan@kernel.org> X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Signed-off-by: Minchan Kim --- mm/compaction.c | 26 
+++++++++++++++++++++++-- mm/migrate.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++------- 2 files changed, 75 insertions(+), 9 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 05ccb4c..16b80e6 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -396,8 +396,10 @@ static void acct_isolated(struct zone *zone, bool locked, struct compact_control struct page *page; unsigned int count[2] = { 0, }; - list_for_each_entry(page, &cc->migratepages, lru) - count[!!page_is_file_cache(page)]++; + list_for_each_entry(page, &cc->migratepages, lru) { + if (!PagePin(page)) + count[!!page_is_file_cache(page)]++; + } /* If locked we can use the interrupt unsafe versions */ if (locked) { @@ -535,6 +537,25 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc, } /* + * Pinned kernel page(ex, zswap) could be isolated. + */ + if (PagePin(page)) { + if (!get_page_unless_zero(page)) + continue; + /* + * Subsystem want to use pinpage should not + * use page->lru feild. + */ + VM_BUG_ON(!list_empty(&page->lru)); + if (!trylock_page(page)) { + put_page(page); + continue; + } + + goto isolated; + } + + /* * Check may be lockless but that's ok as we recheck later. 
* It's possible to migrate LRU pages and balloon pages * Skip any other type of page @@ -601,6 +622,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc, /* Successfully isolated */ cc->finished_update_migrate = true; del_page_from_lru_list(page, lruvec, page_lru(page)); +isolated: list_add(&page->lru, migratelist); cc->nr_migratepages++; nr_isolated++; diff --git a/mm/migrate.c b/mm/migrate.c index 6f0c244..4d28049 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -36,6 +36,7 @@ #include #include #include +#include #include @@ -101,12 +102,17 @@ void putback_movable_pages(struct list_head *l) list_for_each_entry_safe(page, page2, l, lru) { list_del(&page->lru); - dec_zone_page_state(page, NR_ISOLATED_ANON + - page_is_file_cache(page)); - if (unlikely(balloon_page_movable(page))) - balloon_page_putback(page); - else - putback_lru_page(page); + if (!PagePin(page)) { + dec_zone_page_state(page, NR_ISOLATED_ANON + + page_is_file_cache(page)); + if (unlikely(balloon_page_movable(page))) + balloon_page_putback(page); + else + putback_lru_page(page); + } else { + unlock_page(page); + put_page(page); + } } } @@ -855,6 +861,39 @@ out: return rc; } +static int unmap_and_move_pinpage(new_page_t get_new_page, + unsigned long private, struct page *page, int force, + enum migrate_mode mode) +{ + int *result = NULL; + int rc = 0; + struct page *newpage = get_new_page(page, private, &result); + if (!newpage) + return -ENOMEM; + + VM_BUG_ON(!PageLocked(page)); + if (page_count(page) == 1) { + /* page was freed from under us. So we are done. */ + goto out; + } + + rc = migrate_pinpage(page, newpage); +out: + if (rc != -EAGAIN) { + list_del(&page->lru); + unlock_page(page); + put_page(page); + } + + if (result) { + if (rc) + *result = rc; + else + *result = page_to_nid(newpage); + } + return rc; + +} /* * Obtain the lock on page, remove all ptes and migrate the page * to the newly allocated page in newpage. 
@@ -1025,8 +1064,13 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page, list_for_each_entry_safe(page, page2, from, lru) { cond_resched(); - rc = unmap_and_move(get_new_page, private, + if (PagePin(page)) { + rc = unmap_and_move_pinpage(get_new_page, private, page, pass > 2, mode); + } else { + rc = unmap_and_move(get_new_page, private, + page, pass > 2, mode); + } switch(rc) { case -ENOMEM: -- 1.7.9.5 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755657Ab3HMHFw (ORCPT ); Tue, 13 Aug 2013 03:05:52 -0400 Received: from lgeamrelo02.lge.com ([156.147.1.126]:61132 "EHLO LGEAMRELO02.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751116Ab3HMHFD (ORCPT ); Tue, 13 Aug 2013 03:05:03 -0400 X-AuditID: 9c93017e-b7c76ae000003897-d2-5209da9cde47 From: Minchan Kim To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, k.kozlowski@samsung.com, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Benjamin LaHaise , Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel , Minchan Kim Subject: [RFC 2/3] pinpage control subsystem Date: Tue, 13 Aug 2013 16:05:01 +0900 Message-Id: <1376377502-28207-3-git-send-email-minchan@kernel.org> X-Mailer: git-send-email 1.7.9.5 In-Reply-To: <1376377502-28207-1-git-send-email-minchan@kernel.org> References: <1376377502-28207-1-git-send-email-minchan@kernel.org> X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Signed-off-by: Minchan Kim --- include/linux/pinpage.h | 39 ++++++++++++++ mm/Makefile | 2 +- mm/pinpage.c | 134 +++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 174 insertions(+), 1 deletion(-) create mode 100644 include/linux/pinpage.h create mode 100644 mm/pinpage.c diff --git a/include/linux/pinpage.h b/include/linux/pinpage.h new file mode 100644 index 0000000..42fbdc7 --- /dev/null +++ b/include/linux/pinpage.h 
@@ -0,0 +1,39 @@ +#ifndef _LINUX_PINPAGE_H +#define _LINUX_PINPAGE_H + +#include + +/* + * NOTE : pinpage_system user shouldn't use page->lru and page->flags + * fields. + */ +struct pinpage_system { + struct radix_tree_root page_tree; + spinlock_t tree_lock; + + int (*create_subsys)(struct pinpage_system *psys); + int (*destroy_subsys)(struct pinpage_system *psys); + int (*migrate)(struct pinpage_system *psys, struct page *page, + struct page *newpage); + int (*add_page)(struct pinpage_system *psys, struct page *page, + void *private); + int (*del_page)(struct pinpage_system *psys, struct page *page); + int (*find_page)(struct pinpage_system *psys, struct page *page); + + struct list_head list; +}; + +extern int general_create_subsys(struct pinpage_system *psys); +extern int general_destroy_subsys(struct pinpage_system *psys); +extern int general_add_page(struct pinpage_system *psys, struct page *page, + void *private); +extern int general_del_page(struct pinpage_system *psys, struct page *page); +extern int general_find_page(struct pinpage_system *psys, struct page *page); + +extern int set_pinpage(struct pinpage_system *psys, struct page *page, + void *private); +extern int register_pinpage(struct pinpage_system *psys); +extern int migrate_pinpage(struct page *page, struct page *newpage); + +#endif + diff --git a/mm/Makefile b/mm/Makefile index f008033..bf4a2d9 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -5,7 +5,7 @@ mmu-y := nommu.o mmu-$(CONFIG_MMU) := fremap.o highmem.o madvise.o memory.o mincore.o \ mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \ - vmalloc.o pagewalk.o pgtable-generic.o + vmalloc.o pagewalk.o pgtable-generic.o pinpage.o ifdef CONFIG_CROSS_MEMORY_ATTACH mmu-$(CONFIG_MMU) += process_vm_access.o diff --git a/mm/pinpage.c b/mm/pinpage.c new file mode 100644 index 0000000..0833204 --- /dev/null +++ b/mm/pinpage.c @@ -0,0 +1,134 @@ +#include +#include +#include +#include + +static DEFINE_SPINLOCK(pinpage_system_lock); +static 
LIST_HEAD(pinpage_system_list);
+
+struct pinpage_info {
+	unsigned long pfn;
+	void *private;
+};
+
+int general_create_subsys(struct pinpage_system *psys)
+{
+	INIT_RADIX_TREE(&psys->page_tree, GFP_KERNEL);
+	spin_lock_init(&psys->tree_lock);
+	return 0;
+}
+EXPORT_SYMBOL(general_create_subsys);
+
+int general_destroy_subsys(struct pinpage_system *psys)
+{
+	return 0;
+}
+EXPORT_SYMBOL(general_destroy_subsys);
+
+int general_add_page(struct pinpage_system *psys, struct page *page,
+				void *private)
+{
+	int ret = -ENOMEM;
+	unsigned long pfn = page_to_pfn(page);
+	struct pinpage_info *pinfo = kmalloc(sizeof(*pinfo), GFP_KERNEL);
+	if (!pinfo)
+		return ret;
+
+	pinfo->pfn = pfn;
+	pinfo->private = private;
+
+	spin_lock(&psys->tree_lock);
+	ret = radix_tree_insert(&psys->page_tree, pfn, pinfo);
+	spin_unlock(&psys->tree_lock);
+	return ret;
+}
+EXPORT_SYMBOL(general_add_page);
+
+int general_del_page(struct pinpage_system *psys, struct page *page)
+{
+	struct pinpage_info *pinfo;
+	spin_lock(&psys->tree_lock);
+	pinfo = radix_tree_lookup(&psys->page_tree, page_to_pfn(page));
+	if (!pinfo) {
+		spin_unlock(&psys->tree_lock);
+		return -EINVAL;
+	}
+	radix_tree_delete(&psys->page_tree, page_to_pfn(page));
+	spin_unlock(&psys->tree_lock);
+	return 0;
+}
+EXPORT_SYMBOL(general_del_page);
+
+int general_find_page(struct pinpage_system *psys, struct page *page)
+{
+	struct pinpage_info *pinfo;
+	spin_lock(&psys->tree_lock);
+	pinfo = radix_tree_lookup(&psys->page_tree, page_to_pfn(page));
+	spin_unlock(&psys->tree_lock);
+	return pinfo ?
1 : 0; +} +EXPORT_SYMBOL(general_find_page); + +int set_pinpage(struct pinpage_system *psys, struct page *page, void *private) +{ + int ret; + ret = psys->add_page(psys, page, private); + if (!ret) { + lock_page(page); + /* Doesn't allow nesting */ + VM_BUG_ON(PagePin(page)); + SetPagePin(page); + unlock_page(page); + } + return ret; +} +EXPORT_SYMBOL(set_pinpage); + +int clear_pinpage(struct pinpage_system *psys, struct page *page) +{ + int ret; + ret = psys->del_page(psys, page); + if (!ret) { + lock_page(page); + ClearPagePin(page); + unlock_page(page); + } + return ret; +} +EXPORT_SYMBOL(clear_pinpage); + +int register_pinpage(struct pinpage_system *psys) +{ + /* register pinpage_subsystem to global list */ + spin_lock(&pinpage_system_lock); + list_add(&psys->list, &pinpage_system_list); + spin_unlock(&pinpage_system_lock); + return psys->create_subsys(psys); +} +EXPORT_SYMBOL(register_pinpage); + +int unregister_pinpage(struct pinpage_system *psys) +{ + /* register pinpage_subsystem to global list */ + spin_lock(&pinpage_system_lock); + list_del(&psys->list); + spin_unlock(&pinpage_system_lock); + return psys->destroy_subsys(psys); +} +EXPORT_SYMBOL(unregister_pinpage); + +int migrate_pinpage(struct page *page, struct page *newpage) +{ + int err = 0; + struct pinpage_system *psys; + + spin_lock(&pinpage_system_lock); + list_for_each_entry(psys, &pinpage_system_list, list) { + if (psys->find_page(psys, page)) { + err = psys->migrate(psys, page, newpage); + break; + } + } + spin_unlock(&pinpage_system_lock); + return err; +} -- 1.7.9.5 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756739Ab3HMJq5 (ORCPT ); Tue, 13 Aug 2013 05:46:57 -0400 Received: from mailout1.w1.samsung.com ([210.118.77.11]:26119 "EHLO mailout1.w1.samsung.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753615Ab3HMJq4 (ORCPT ); Tue, 13 Aug 2013 05:46:56 -0400 X-AuditID: 
cbfec7f5-b7f5f6d00000105f-7f-520a00849517 Message-id: <1376387202.31048.2.camel@AMDC1943> Subject: Re: [RFC 0/3] Pin page control subsystem From: Krzysztof Kozlowski To: Minchan Kim Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Benjamin LaHaise , Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel , Tomasz Stanislawski Date: Tue, 13 Aug 2013 11:46:42 +0200 In-reply-to: <1376377502-28207-1-git-send-email-minchan@kernel.org> References: <1376377502-28207-1-git-send-email-minchan@kernel.org> Content-type: text/plain; charset=UTF-8 X-Mailer: Evolution 3.2.3-0ubuntu6 Content-transfer-encoding: 7bit MIME-version: 1.0 X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFnrLLMWRmVeSWpSXmKPExsVy+t/xK7otDFxBBr9v8ljMePiLxaLr3xYW i08vHzBaPG8/wGxxedccNot7a/6zWvyfsZzZYvK7Z4wWy76+Z7f4e2U9i8WhfavYLea1v2R1 4PH4f3ASs8fOWXfZPRbvecnksWlVJ5vHpk+T2D0eHNrM4vF+31U2j74tqxg9Np+u9vi8SS6A K4rLJiU1J7MstUjfLoEr4/rFSSwFM1krOo6wNDDOY+li5OCQEDCROLxKtYuRE8gUk7hwbz1b FyMXh5DAUkaJA/t/sYEkhAQ+M0pcOOgAYvMK6EucW9PPBGILCxhK7D35iRXEZhMwlti8fAlY vYiAisSfp/8YQQYxC9xjkjg0YSJYA4uAqsTzI7vBijgFnCW+dPxmgljgJDFh2VF2EJtZQF1i 0rxFzBAXKUnsbu+EistLbF7zlhniCEGJH5PvsUxgFJiFpGUWkrJZSMoWMDKvYhRNLU0uKE5K zzXSK07MLS7NS9dLzs/dxAiJoq87GJceszrEKMDBqMTDm/GRI0iINbGsuDL3EKMEB7OSCK/S H84gId6UxMqq1KL8+KLSnNTiQ4xMHJxSDYz1T0t90h3nMjyxLonfobv7cnX2u5+cOXd4We4u OfR0YkJqJ7u4y+VbDHq9SQuf+1uEsEbNYP7b0bHgyLsP7lHyD65wyKr/a/udIW31WyxzGrP3 /imLFP3nqoTrO6pP+7/19O69Xms1nYSV7k7aErvTo3Humc+TivYFm378MOuqtgqHQOWilrNK LMUZiYZazEXFiQAqsw20gAIAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Minchan, On wto, 2013-08-13 at 16:04 +0900, Minchan Kim wrote: > patch 2 introduce pinpage control > subsystem. So, subsystems want to control pinpage should implement own > pinpage_xxx functions because each subsystem would have other character > so what kinds of data structure for managing pinpage information depends > on them. 
Otherwise, they can use general functions defined in pinpage > subsystem. patch 3 hacks migration.c so that migration is > aware of pinpage now and migrate them with pinpage subsystem. I wonder why don't we use page->mapping and a_ops? Is there any disadvantage of such mapping/a_ops? Best regards, Krzysztof From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757657Ab3HMOXm (ORCPT ); Tue, 13 Aug 2013 10:23:42 -0400 Received: from kanga.kvack.org ([205.233.56.17]:36476 "EHLO kanga.kvack.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757092Ab3HMOXl (ORCPT ); Tue, 13 Aug 2013 10:23:41 -0400 Date: Tue, 13 Aug 2013 10:23:38 -0400 From: Benjamin LaHaise To: Krzysztof Kozlowski Cc: Minchan Kim , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel , Tomasz Stanislawski Subject: Re: [RFC 0/3] Pin page control subsystem Message-ID: <20130813142338.GD13330@kvack.org> References: <1376377502-28207-1-git-send-email-minchan@kernel.org> <1376387202.31048.2.camel@AMDC1943> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1376387202.31048.2.camel@AMDC1943> User-Agent: Mutt/1.4.2.2i Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Aug 13, 2013 at 11:46:42AM +0200, Krzysztof Kozlowski wrote: > Hi Minchan, > > On wto, 2013-08-13 at 16:04 +0900, Minchan Kim wrote: > > patch 2 introduce pinpage control > > subsystem. So, subsystems want to control pinpage should implement own > > pinpage_xxx functions because each subsystem would have other character > > so what kinds of data structure for managing pinpage information depends > > on them. Otherwise, they can use general functions defined in pinpage > > subsystem. 
patch 3 hacks migration.c so that migration is > > aware of pinpage now and migrate them with pinpage subsystem. > > I wonder why don't we use page->mapping and a_ops? Is there any > disadvantage of such mapping/a_ops? That's what the pending aio patches do, and I think this is a better approach for those use-cases that the technique works for. The biggest problem I see with the pinpage approach is that it's based on a single page at a time. I'd venture a guess that many pinned pages are done in groups of pages, not single ones. -ben > Best regards, > Krzysztof -- "Thought is the essence of where you are now." From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758721Ab3HMQbo (ORCPT ); Tue, 13 Aug 2013 12:31:44 -0400 Received: from a9-92.smtp-out.amazonses.com ([54.240.9.92]:57391 "EHLO a9-92.smtp-out.amazonses.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757965Ab3HMQbl (ORCPT ); Tue, 13 Aug 2013 12:31:41 -0400 X-Greylist: delayed 610 seconds by postgrey-1.27 at vger.kernel.org; Tue, 13 Aug 2013 12:31:41 EDT Date: Tue, 13 Aug 2013 16:21:30 +0000 From: Christoph Lameter To: Minchan Kim cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, k.kozlowski@samsung.com, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Benjamin LaHaise , Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel Subject: Re: [RFC 0/3] Pin page control subsystem In-Reply-To: <1376377502-28207-1-git-send-email-minchan@kernel.org> Message-ID: <00000140787b6191-ae3f2eb1-515e-48a1-8e64-502772af4700-000000@email.amazonses.com> References: <1376377502-28207-1-git-send-email-minchan@kernel.org> User-Agent: Alpine 2.02 (DEB 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-SES-Outgoing: 2013.08.13-54.240.9.92 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 13 Aug 2013, Minchan Kim wrote: > VM sometime want to 
migrate and/or reclaim pages for CMA, memory-hotplug, > THP and so on but at the moment, it could handle only userspace pages > so if above example subsystem have pinned a some page in a range VM want > to migrate, migration is failed so above exmaple couldn't work well. Dont we have the mmu_notifiers that could help in that case? You could get a callback which could prepare the pages for migration? From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758489Ab3HMXyZ (ORCPT ); Tue, 13 Aug 2013 19:54:25 -0400 Received: from LGEMRELSE6Q.lge.com ([156.147.1.121]:54360 "EHLO LGEMRELSE6Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754221Ab3HMXyY (ORCPT ); Tue, 13 Aug 2013 19:54:24 -0400 X-AuditID: 9c930179-b7c0bae0000040ac-64-520ac72b4709 Date: Wed, 14 Aug 2013 08:54:25 +0900 From: Minchan Kim To: Krzysztof Kozlowski Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Benjamin LaHaise , Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel , Tomasz Stanislawski Subject: Re: [RFC 0/3] Pin page control subsystem Message-ID: <20130813235425.GA2271@bbox> References: <1376377502-28207-1-git-send-email-minchan@kernel.org> <1376387202.31048.2.camel@AMDC1943> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1376387202.31048.2.camel@AMDC1943> User-Agent: Mutt/1.5.21 (2010-09-15) X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello Krzysztof, On Tue, Aug 13, 2013 at 11:46:42AM +0200, Krzysztof Kozlowski wrote: > Hi Minchan, > > On wto, 2013-08-13 at 16:04 +0900, Minchan Kim wrote: > > patch 2 introduce pinpage control > > subsystem. 
So, subsystems want to control pinpage should implement own
> > pinpage_xxx functions because each subsystem would have other character
> > so what kinds of data structure for managing pinpage information depends
> > on them. Otherwise, they can use general functions defined in pinpage
> > subsystem. patch 3 hacks migration.c so that migration is
> > aware of pinpage now and migrate them with pinpage subsystem.
>
> I wonder why don't we use page->mapping and a_ops? Is there any
> disadvantage of such mapping/a_ops?

My main concern with that approach is how to handle the nested pin case.
For example, driver A and driver B pin the same file-backed page at the
same time via get_user_pages. For the migration, we need the following
operations:

1. [buffer]'s migrate_page for the file-backed page
2. [driver A]'s migrate_page
3. [driver B]'s migrate_page

But the page has only one mapping. How can we handle it?

If we give up on the pinpage subsystem unifying userspace pages (e.g. GUP)
and kernel space pages (e.g. zswap, zram and zcache), we can go with
address_space's migratepage, but we might lose the abstraction, so that
every user would have to implement its own pinpage manager.
It's not hard, I guess, but it's more error-prone and not maintainable
in the future.

>
> Best regards,
> Krzysztof
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: email@kvack.org -- Kind regards, Minchan Kim From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759123Ab3HNAIu (ORCPT ); Tue, 13 Aug 2013 20:08:50 -0400 Received: from lgeamrelo01.lge.com ([156.147.1.125]:47185 "EHLO LGEAMRELO01.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758627Ab3HNAIp (ORCPT ); Tue, 13 Aug 2013 20:08:45 -0400 X-AuditID: 9c93017d-b7cdfae0000026c0-90-520aca8b6054 Date: Wed, 14 Aug 2013 09:08:50 +0900 From: Minchan Kim To: Benjamin LaHaise Cc: Krzysztof Kozlowski , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel , Tomasz Stanislawski Subject: Re: [RFC 0/3] Pin page control subsystem Message-ID: <20130814000850.GB2271@bbox> References: <1376377502-28207-1-git-send-email-minchan@kernel.org> <1376387202.31048.2.camel@AMDC1943> <20130813142338.GD13330@kvack.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130813142338.GD13330@kvack.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello Benjamin, On Tue, Aug 13, 2013 at 10:23:38AM -0400, Benjamin LaHaise wrote: > On Tue, Aug 13, 2013 at 11:46:42AM +0200, Krzysztof Kozlowski wrote: > > Hi Minchan, > > > > On wto, 2013-08-13 at 16:04 +0900, Minchan Kim wrote: > > > patch 2 introduce pinpage control > > > subsystem. So, subsystems want to control pinpage should implement own > > > pinpage_xxx functions because each subsystem would have other character > > > so what kinds of data structure for managing pinpage information depends > > > on them. Otherwise, they can use general functions defined in pinpage > > > subsystem. 
patch 3 hacks migration.c so that migration is
> > > aware of pinpage now and migrate them with pinpage subsystem.
> >
> > I wonder why don't we use page->mapping and a_ops? Is there any
> > disadvantage of such mapping/a_ops?
>
> That's what the pending aio patches do, and I think this is a better
> approach for those use-cases that the technique works for.

I skimmed your implementation and I don't think it's a generic solution.
How would it handle the example I mentioned in my reply to Krzysztof?

>
> The biggest problem I see with the pinpage approach is that it's based on a
> single page at a time. I'd venture a guess that many pinned pages are done
> in groups of pages, not single ones.

In the z* family, most allocations are single pages, but I agree many GUP
users would allocate groups of pages. We can cover that by expanding
the API like this:

    int set_pinpage(struct pinpage_system *psys, struct page **pages,
                    unsigned long nr_pages, void **privates);

so we can handle it in batches, and the subsystem can manage pinpage_info
with an interval tree rather than the radix tree, which is the default.

That's why the pinpage control subsystem has room for subsystem-specific
metadata handling.

>
> -ben
>
> > Best regards,
> > Krzysztof
>
> --
> "Thought is the essence of where you are now."
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: email@kvack.org -- Kind regards, Minchan Kim From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758962Ab3HNAMd (ORCPT ); Tue, 13 Aug 2013 20:12:33 -0400 Received: from LGEMRELSE6Q.lge.com ([156.147.1.121]:46480 "EHLO LGEMRELSE6Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758101Ab3HNAMb (ORCPT ); Tue, 13 Aug 2013 20:12:31 -0400 X-AuditID: 9c930179-b7c0bae0000040ac-d8-520acb6e252a Date: Wed, 14 Aug 2013 09:12:37 +0900 From: Minchan Kim To: Christoph Lameter Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, k.kozlowski@samsung.com, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Benjamin LaHaise , Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel Subject: Re: [RFC 0/3] Pin page control subsystem Message-ID: <20130814001236.GC2271@bbox> References: <1376377502-28207-1-git-send-email-minchan@kernel.org> <00000140787b6191-ae3f2eb1-515e-48a1-8e64-502772af4700-000000@email.amazonses.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <00000140787b6191-ae3f2eb1-515e-48a1-8e64-502772af4700-000000@email.amazonses.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello Christoph, On Tue, Aug 13, 2013 at 04:21:30PM +0000, Christoph Lameter wrote: > On Tue, 13 Aug 2013, Minchan Kim wrote: > > > VM sometime want to migrate and/or reclaim pages for CMA, memory-hotplug, > > THP and so on but at the moment, it could handle only userspace pages > > so if above example subsystem have pinned a some page in a range VM want > > to migrate, migration is failed so above exmaple couldn't work well. > > Dont we have the mmu_notifiers that could help in that case? You could get > a callback which could prepare the pages for migration? 
Now I'm not familiar with mmu_notifier so please could you elaborate it a bit for me to dive into that? Thanks! > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- Kind regards, Minchan Kim From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932894Ab3HNQrQ (ORCPT ); Wed, 14 Aug 2013 12:47:16 -0400 Received: from mail-oa0-f43.google.com ([209.85.219.43]:56794 "EHLO mail-oa0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932716Ab3HNQrO (ORCPT ); Wed, 14 Aug 2013 12:47:14 -0400 Date: Thu, 15 Aug 2013 01:47:05 +0900 From: Minchan Kim To: Christoph Lameter Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, k.kozlowski@samsung.com, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Benjamin LaHaise , Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel Subject: Re: [RFC 0/3] Pin page control subsystem Message-ID: <20130814164705.GD2706@gmail.com> References: <1376377502-28207-1-git-send-email-minchan@kernel.org> <00000140787b6191-ae3f2eb1-515e-48a1-8e64-502772af4700-000000@email.amazonses.com> <20130814001236.GC2271@bbox> <000001407dafbe92-7b2b4006-2225-4f0b-b23b-d66101a995aa-000000@email.amazonses.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <000001407dafbe92-7b2b4006-2225-4f0b-b23b-d66101a995aa-000000@email.amazonses.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Christoph, On Wed, Aug 14, 2013 at 04:36:44PM +0000, Christoph Lameter wrote: > On Wed, 14 Aug 2013, Minchan Kim wrote: > > > On Tue, Aug 13, 2013 at 04:21:30PM +0000, Christoph Lameter wrote: > > > On Tue, 13 Aug 2013, Minchan Kim wrote: > > > > > > > VM sometime want to migrate and/or 
reclaim pages for CMA, memory-hotplug,
> > > > THP and so on but at the moment, it could handle only userspace pages
> > > > so if above example subsystem have pinned a some page in a range VM want
> > > > to migrate, migration is failed so above example couldn't work well.
> > >
> > > Don't we have the mmu_notifiers that could help in that case? You could get
> > > a callback which could prepare the pages for migration?
> >
> > Now I'm not familiar with mmu_notifier so please could you elaborate it
> > a bit for me to dive into that?
>
> Add a notifier callback for unpinning pages to the mmu notifier subsystem
> and then your drivers could register with the subsystem to get
> notifications when migration needs to occur etc.
>

When I look at the API of mmu_notifier, it takes an mm_struct, so I guess
it works only for user processes. Right?
If so, I would need to register it without a user context, because zram,
zswap and zcache work only on the kernel side.

--
Kind regards,
Minchan Kim

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932915Ab3HNQsq (ORCPT ); Wed, 14 Aug 2013 12:48:46 -0400
Received: from a9-58.smtp-out.amazonses.com ([54.240.9.58]:49022 "EHLO a9-58.smtp-out.amazonses.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932831Ab3HNQsp (ORCPT ); Wed, 14 Aug 2013 12:48:45 -0400
X-Greylist: delayed 719 seconds by postgrey-1.27 at vger.kernel.org; Wed, 14 Aug 2013 12:48:45 EDT
Date: Wed, 14 Aug 2013 16:36:44 +0000
From: Christoph Lameter
To: Minchan Kim
cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, k.kozlowski@samsung.com, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Benjamin LaHaise , Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel
Subject: Re: [RFC 0/3] Pin page control subsystem
In-Reply-To: <20130814001236.GC2271@bbox>
Message-ID: <000001407dafbe92-7b2b4006-2225-4f0b-b23b-d66101a995aa-000000@email.amazonses.com>
References:
<1376377502-28207-1-git-send-email-minchan@kernel.org> <00000140787b6191-ae3f2eb1-515e-48a1-8e64-502772af4700-000000@email.amazonses.com> <20130814001236.GC2271@bbox> User-Agent: Alpine 2.02 (DEB 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-SES-Outgoing: 2013.08.14-54.240.9.58 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 14 Aug 2013, Minchan Kim wrote: > On Tue, Aug 13, 2013 at 04:21:30PM +0000, Christoph Lameter wrote: > > On Tue, 13 Aug 2013, Minchan Kim wrote: > > > > > VM sometime want to migrate and/or reclaim pages for CMA, memory-hotplug, > > > THP and so on but at the moment, it could handle only userspace pages > > > so if above example subsystem have pinned a some page in a range VM want > > > to migrate, migration is failed so above exmaple couldn't work well. > > > > Dont we have the mmu_notifiers that could help in that case? You could get > > a callback which could prepare the pages for migration? > > Now I'm not familiar with mmu_notifier so please could you elaborate it > a bit for me to dive into that? Add a notifier callback for unpinning pages to the mmu notifier subsystem and then your drivers could register with the subsystem to get notifications when migration needs to occur etc. 
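
A minimal user-space sketch of the callback idea described above, under
loud assumptions: the names unpin_notifier, unpin_notifier_register,
unpin_notify and toy_driver are hypothetical and are NOT part of the
kernel's mmu_notifier API. The sketch only models drivers registering a
callback that the migration path would invoke before moving a page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical notifier: NOT the real mmu_notifier interface. */
struct unpin_notifier {
	/* Return 0 if the driver no longer pins pfn, nonzero otherwise. */
	int (*unpin)(struct unpin_notifier *nb, unsigned long pfn);
	struct unpin_notifier *next;
};

/* Head of the singly linked chain of registered notifiers. */
static struct unpin_notifier *unpin_chain;

/* A driver registers once; the newest registration is called first. */
static void unpin_notifier_register(struct unpin_notifier *nb)
{
	nb->next = unpin_chain;
	unpin_chain = nb;
}

/*
 * Called by the (modelled) migration path before it moves pfn.
 * Returns 0 only if every registered driver reported success.
 */
static int unpin_notify(unsigned long pfn)
{
	struct unpin_notifier *nb;
	int ret = 0;

	for (nb = unpin_chain; nb; nb = nb->next)
		if (nb->unpin(nb, pfn))
			ret = -1;
	return ret;
}

/* Example driver that holds a pin on exactly one page frame. */
struct toy_driver {
	struct unpin_notifier nb;	/* must stay the first member */
	unsigned long pinned_pfn;
	int pinned;
};

static int toy_unpin(struct unpin_notifier *nb, unsigned long pfn)
{
	/* Valid because nb is the first member of struct toy_driver. */
	struct toy_driver *drv = (struct toy_driver *)nb;

	if (drv->pinned && drv->pinned_pfn == pfn)
		drv->pinned = 0;	/* drop the pin so migration can proceed */
	return 0;
}
```

A driver would embed the notifier in its own state, register it once, and
drop its pin when called for a page it holds. Whether such a hook can be
attached without an mm_struct is exactly the open question in this thread.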
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932951Ab3HNREJ (ORCPT ); Wed, 14 Aug 2013 13:04:09 -0400
Received: from a9-58.smtp-out.amazonses.com ([54.240.9.58]:49811 "EHLO a9-58.smtp-out.amazonses.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932612Ab3HNREH (ORCPT ); Wed, 14 Aug 2013 13:04:07 -0400
X-Greylist: delayed 329 seconds by postgrey-1.27 at vger.kernel.org; Wed, 14 Aug 2013 13:04:07 EDT
Date: Wed, 14 Aug 2013 16:58:36 +0000
From: Christoph Lameter
To: Minchan Kim
cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, k.kozlowski@samsung.com, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Benjamin LaHaise , Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel
Subject: Re: [RFC 0/3] Pin page control subsystem
In-Reply-To: <20130814164705.GD2706@gmail.com>
Message-ID: <000001407dc3c33b-4139d615-aecc-4745-a9b4-c84949f6a8f4-000000@email.amazonses.com>
References: <1376377502-28207-1-git-send-email-minchan@kernel.org> <00000140787b6191-ae3f2eb1-515e-48a1-8e64-502772af4700-000000@email.amazonses.com> <20130814001236.GC2271@bbox> <000001407dafbe92-7b2b4006-2225-4f0b-b23b-d66101a995aa-000000@email.amazonses.com> <20130814164705.GD2706@gmail.com>
User-Agent: Alpine 2.02 (DEB 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-SES-Outgoing: 2013.08.14-54.240.9.58
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 15 Aug 2013, Minchan Kim wrote:

> When I look API of mmu_notifier, it has mm_struct so I guess it works
> for only user process. Right?

Correct. A process must have mapped the pages. If you can get a
kernel "process" to work then that process could map the pages.

> If so, I need to register it without user context because zram, zswap
> and zcache works for only kernel side.

Hmmm... Ok but that now gets the complexity of page pinning up to a very
weird level.
Is there some way we can have a common way to deal with the
various ways that pinning is needed? Just off the top of my head (I may
miss some use cases) we have

1. mlock from user space
2. page pinning for reclaim
3. Page pinning for I/O from device drivers (like f.e. the RDMA subsystem)
4. Page pinning for low latency operations
5. Page pinning for migration
6. Page pinning for the perf buffers.
7. Page pinning for cross system access (XPMEM, GRU SGI)

Now we have another subsystem wanting different semantics of pinning. Is
there any way we can come up with a pinning mechanism that fits all use
cases, that is easily understandable and maintainable?

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752811Ab3HOEsp (ORCPT ); Thu, 15 Aug 2013 00:48:45 -0400
Received: from mail-pb0-f54.google.com ([209.85.160.54]:58408 "EHLO mail-pb0-f54.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751799Ab3HOEsn (ORCPT ); Thu, 15 Aug 2013 00:48:43 -0400
Date: Thu, 15 Aug 2013 13:48:34 +0900
From: Minchan Kim
To: Christoph Lameter
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, k.kozlowski@samsung.com, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Benjamin LaHaise , Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel
Subject: Re: [RFC 0/3] Pin page control subsystem
Message-ID: <20130815044834.GB3139@gmail.com>
References: <1376377502-28207-1-git-send-email-minchan@kernel.org> <00000140787b6191-ae3f2eb1-515e-48a1-8e64-502772af4700-000000@email.amazonses.com> <20130814001236.GC2271@bbox> <000001407dafbe92-7b2b4006-2225-4f0b-b23b-d66101a995aa-000000@email.amazonses.com> <20130814164705.GD2706@gmail.com> <000001407dc3c33b-4139d615-aecc-4745-a9b4-c84949f6a8f4-000000@email.amazonses.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
<000001407dc3c33b-4139d615-aecc-4745-a9b4-c84949f6a8f4-000000@email.amazonses.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hey Christoph,

On Wed, Aug 14, 2013 at 04:58:36PM +0000, Christoph Lameter wrote:
> On Thu, 15 Aug 2013, Minchan Kim wrote:
>
> > When I look API of mmu_notifier, it has mm_struct so I guess it works
> > for only user process. Right?
>
> Correct. A process must have mapped the pages. If you can get a
> kernel "process" to work then that process could map the pages.
>
> > If so, I need to register it without user context because zram, zswap
> > and zcache works for only kernel side.
>
> Hmmm... Ok but that now gets the complexity of page pinning up to a very
> weird level. Is there some way we can have a common way to deal with the
> various ways that pinning is needed? Just off the top of my head (I may
> miss some use cases) we have
>
> 1. mlock from user space

mlocked pages can already be migrated in the CMA case, so I think
migrating them in other cases is not a big problem.
I remember you and Peter argued about what the mlock semantics are from a
pinning point of view and, if I remember correctly, Peter said mlock
doesn't mean pin, so we could migrate such pages, but you didn't agree.
Right? Anyway, it's off-topic, and technically it's not a problem.

> 2. page pinning for reclaim

Reclaim pins a page for a while. Of course, "for a while" is rather
vague: it could be really long for some and really short for others.
But at least the reclaim pin should be short, and we should try to make
it so if that's not true.

> 3. Page pinning for I/O from device drivers (like f.e. the RDMA subsystem)

It's one of my big concerns. Several drivers might even pin a page at
the same time.
But normally a driver knows whether it will pin a page for a long or a
short time, so if it wants to pin a page for a long time, like aio or
some GPU drivers doing zero-copy, it should use the pinpage control
subsystem so that it can release pinned pages when the VM asks.

> 4. Page pinning for low latency operations

I'm not sure, but I guess most of them pin a page only for a short time?
Otherwise, they should use the pinpage control subsystem, too.

> 5. Page pinning for migration

It's like 2: the migration pin should be short.

> 6. Page pinning for the perf buffers.

I'm not familiar with that, but my gut feeling is that it will pin pages
for a long time, so it should use the pinpage control subsystem.

> 7. Page pinning for cross system access (XPMEM, GRU SGI)

If it's a really long pin, it should use the pinpage control subsystem.

>
> Now we have another subsystem wanting different semantics of pinning. Is
> there any way we can come up with a pinning mechanism that fits all use
> cases, that is easily understandable and maintainable?

I agree it's not easy, but we should go that way rather than adding
ad-hoc, subsystem-specific implementations. If we allow subsystem-specific
ways, everybody will probably want to touch migrate.c, so it would become
very complicated and bloated, even unmaintainable in the future.
If we go another way, like a_ops->migratepage, it couldn't handle the
complex nested pin case, so it couldn't guarantee pinpage migration.

The hardest part is defining "for a while". It depends on the system
workload: on some systems it means 3ms while on others it means 3s. :(
Sigh, for now I have no idea how to handle it in a general way.

Thanks for the comment, Christoph!
> -- Kind regards, Minchan Kim From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757560Ab3HOPgG (ORCPT ); Thu, 15 Aug 2013 11:36:06 -0400 Received: from a9-99.smtp-out.amazonses.com ([54.240.9.99]:44328 "EHLO a9-99.smtp-out.amazonses.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755854Ab3HOPgE (ORCPT ); Thu, 15 Aug 2013 11:36:04 -0400 X-Greylist: delayed 1040 seconds by postgrey-1.27 at vger.kernel.org; Thu, 15 Aug 2013 11:36:04 EDT Date: Thu, 15 Aug 2013 15:18:40 +0000 From: Christoph Lameter To: Minchan Kim cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, k.kozlowski@samsung.com, Seth Jennings , Mel Gorman , guz.fnst@cn.fujitsu.com, Benjamin LaHaise , Dave Hansen , lliubbo@gmail.com, aquini@redhat.com, Rik van Riel Subject: Re: [RFC 0/3] Pin page control subsystem In-Reply-To: <20130815044834.GB3139@gmail.com> Message-ID: <00000140828ea17e-d69af79a-1d8e-4df2-9513-492df5e00afc-000000@email.amazonses.com> References: <1376377502-28207-1-git-send-email-minchan@kernel.org> <00000140787b6191-ae3f2eb1-515e-48a1-8e64-502772af4700-000000@email.amazonses.com> <20130814001236.GC2271@bbox> <000001407dafbe92-7b2b4006-2225-4f0b-b23b-d66101a995aa-000000@email.amazonses.com> <20130814164705.GD2706@gmail.com> <000001407dc3c33b-4139d615-aecc-4745-a9b4-c84949f6a8f4-000000@email.amazonses.com> <20130815044834.GB3139@gmail.com> User-Agent: Alpine 2.02 (DEB 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-SES-Outgoing: 2013.08.15-54.240.9.99 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 15 Aug 2013, Minchan Kim wrote: > Now mlock pages could be migrated in case of CMA so I think it's not a > big problem to migrate it for other cases. 
> I remember You and Peter argued what's the mlock semantics of pin POV
> and as I remember correctly, Peter said mlock doesn't mean pin so
> we could migrate it but you didn't agree. Right?

An mlocked page can still be migrated. Pinning is currently done by
increasing the page count. Migration will be attempted but it will fail
since the references cannot all be removed.

Peter proposed that mlock would work like pinning so that a migration of
the page would not be attempted.

My concern is not only about migration but about a general way of pinning
pages. Having mlock and pinning with different semantics is already an
issue as the conversation with Peter brought out.

Now we are adding yet another way that pinning is used.
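
A toy user-space model of the failure mode described above, offered only
as an illustration. struct toy_page, toy_get_page, toy_put_page and
toy_migrate are made-up names, not kernel functions. The sketch assumes
that migration succeeds only when the page's reference count equals the
number of references the migration code itself can account for.

```c
#include <assert.h>

/* Toy stand-in for struct page: it carries only a reference count. */
struct toy_page {
	int count;	/* references currently held on the page */
};

/* In this model, taking a pin is just taking an extra reference. */
static void toy_get_page(struct toy_page *page)
{
	page->count++;
}

/* Dropping the pin releases that reference again. */
static void toy_put_page(struct toy_page *page)
{
	page->count--;
}

/*
 * Modelled migration attempt. "expected" is the number of references
 * the migration path knows how to remove. Any reference beyond that
 * belongs to someone it cannot reach (a pinner), so the attempt fails.
 * Returns 0 on success, -1 if unexpected references remain.
 */
static int toy_migrate(const struct toy_page *page, int expected)
{
	if (page->count != expected)
		return -1;
	return 0;
}
```

With one expected reference, the attempt succeeds before toy_get_page(),
fails while the extra reference is held, and succeeds again after
toy_put_page(). The pinner never learns that migration was attempted,
which is the gap the pinpage proposal tries to close.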