* [PATCH 1/5] Swapless V2: try_to_unmap() - Rename ignrefs to "migration"
From: Christoph Lameter @ 2006-04-13 23:54 UTC
To: akpm
Cc: Hugh Dickins, linux-kernel, Lee Schermerhorn, linux-mm,
Christoph Lameter, Hirokazu Takahashi, Marcelo Tosatti,
KAMEZAWA Hiroyuki
"migration" is a better name, since we will implement special handling for
page migration later.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.17-rc1-mm2/mm/rmap.c
===================================================================
--- linux-2.6.17-rc1-mm2.orig/mm/rmap.c 2006-04-02 20:22:10.000000000 -0700
+++ linux-2.6.17-rc1-mm2/mm/rmap.c 2006-04-13 12:56:10.000000000 -0700
@@ -578,7 +578,7 @@ void page_remove_rmap(struct page *page)
* repeatedly from either try_to_unmap_anon or try_to_unmap_file.
*/
static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
- int ignore_refs)
+ int migration)
{
struct mm_struct *mm = vma->vm_mm;
unsigned long address;
@@ -602,7 +602,7 @@ static int try_to_unmap_one(struct page
*/
if ((vma->vm_flags & VM_LOCKED) ||
(ptep_clear_flush_young(vma, address, pte)
- && !ignore_refs)) {
+ && !migration)) {
ret = SWAP_FAIL;
goto out_unmap;
}
@@ -736,7 +736,7 @@ static void try_to_unmap_cluster(unsigne
pte_unmap_unlock(pte - 1, ptl);
}
-static int try_to_unmap_anon(struct page *page, int ignore_refs)
+static int try_to_unmap_anon(struct page *page, int migration)
{
struct anon_vma *anon_vma;
struct vm_area_struct *vma;
@@ -747,7 +747,7 @@ static int try_to_unmap_anon(struct page
return ret;
list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
- ret = try_to_unmap_one(page, vma, ignore_refs);
+ ret = try_to_unmap_one(page, vma, migration);
if (ret == SWAP_FAIL || !page_mapped(page))
break;
}
@@ -764,7 +764,7 @@ static int try_to_unmap_anon(struct page
*
* This function is only called from try_to_unmap for object-based pages.
*/
-static int try_to_unmap_file(struct page *page, int ignore_refs)
+static int try_to_unmap_file(struct page *page, int migration)
{
struct address_space *mapping = page->mapping;
pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
@@ -778,7 +778,7 @@ static int try_to_unmap_file(struct page
spin_lock(&mapping->i_mmap_lock);
vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
- ret = try_to_unmap_one(page, vma, ignore_refs);
+ ret = try_to_unmap_one(page, vma, migration);
if (ret == SWAP_FAIL || !page_mapped(page))
goto out;
}
@@ -863,16 +863,16 @@ out:
* SWAP_AGAIN - we missed a mapping, try again later
* SWAP_FAIL - the page is unswappable
*/
-int try_to_unmap(struct page *page, int ignore_refs)
+int try_to_unmap(struct page *page, int migration)
{
int ret;
BUG_ON(!PageLocked(page));
if (PageAnon(page))
- ret = try_to_unmap_anon(page, ignore_refs);
+ ret = try_to_unmap_anon(page, migration);
else
- ret = try_to_unmap_file(page, ignore_refs);
+ ret = try_to_unmap_file(page, migration);
if (!page_mapped(page))
ret = SWAP_SUCCESS;
* [PATCH 2/5] Swapless V2: Add migration swap entries
From: Christoph Lameter @ 2006-04-13 23:54 UTC
To: akpm
Cc: Hugh Dickins, linux-kernel, Lee Schermerhorn, linux-mm,
Christoph Lameter, Hirokazu Takahashi, Marcelo Tosatti,
KAMEZAWA Hiroyuki
Add a migration swap type and functions to handle migration entries.

SWP_TYPE_MIGRATION is a special swap type that encodes the pfn of the
page in the swp_offset. SWP_TYPE_MIGRATION swap entries are only set
in a pte while the corresponding page is locked, and migration entries
are removed while the page is still locked. Therefore the processing
for this special type of swap entry can be simple.

Only the freeing and duplication operations needed by copy_page_range
and zap_page_range are supported. Freeing this type of entry is a no-op,
and duplication also does nothing, relying on the reverse maps to track
replications of the pte.

If do_swap_page encounters a migration entry, it simply redoes the
fault until the migration entry has gone away. We used to take a page
count on the old page, which frequently caused the page migration code
to retry. Redoing the fault immediately avoids those migration retries.

Migration entry related operations work even if CONFIG_SWAP is not
enabled.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
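[To illustrate, the round trip a migration entry makes through a pte,
using the helpers added below, looks roughly like this -- a sketch for
orientation, not part of the patch itself:]

	/* The page is locked; hide it behind a migration entry. */
	swp_entry_t entry = make_migration_entry(page);
	set_pte_at(mm, address, ptep, swp_entry_to_pte(entry));

	/* Later, a fault finds the special swap pte. */
	entry = pte_to_swp_entry(*ptep);
	if (is_migration_entry(entry))
		/* Migration still in progress; recover the page by pfn. */
		page = migration_entry_to_page(entry);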
Index: linux-2.6.17-rc1-mm2/mm/swapfile.c
===================================================================
--- linux-2.6.17-rc1-mm2.orig/mm/swapfile.c 2006-04-02 20:22:10.000000000 -0700
+++ linux-2.6.17-rc1-mm2/mm/swapfile.c 2006-04-13 16:43:10.000000000 -0700
@@ -395,6 +395,9 @@ void free_swap_and_cache(swp_entry_t ent
struct swap_info_struct * p;
struct page *page = NULL;
+ if (is_migration_entry(entry))
+ return;
+
p = swap_info_get(entry);
if (p) {
if (swap_entry_free(p, swp_offset(entry)) == 1) {
@@ -1709,6 +1712,9 @@ int swap_duplicate(swp_entry_t entry)
unsigned long offset, type;
int result = 0;
+ if (is_migration_entry(entry))
+ return 1;
+
type = swp_type(entry);
if (type >= nr_swapfiles)
goto bad_file;
Index: linux-2.6.17-rc1-mm2/include/linux/swap.h
===================================================================
--- linux-2.6.17-rc1-mm2.orig/include/linux/swap.h 2006-04-11 12:14:34.000000000 -0700
+++ linux-2.6.17-rc1-mm2/include/linux/swap.h 2006-04-13 16:43:21.000000000 -0700
@@ -29,7 +29,13 @@ static inline int current_is_kswapd(void
* the type/offset into the pte as 5/27 as well.
*/
#define MAX_SWAPFILES_SHIFT 5
+#ifndef CONFIG_MIGRATION
#define MAX_SWAPFILES (1 << MAX_SWAPFILES_SHIFT)
+#else
+/* Use last entry for page migration swap entries */
+#define MAX_SWAPFILES ((1 << MAX_SWAPFILES_SHIFT)-1)
+#define SWP_TYPE_MIGRATION MAX_SWAPFILES
+#endif
/*
* Magic header for a swap area. The first part of the union is
Index: linux-2.6.17-rc1-mm2/include/linux/swapops.h
===================================================================
--- linux-2.6.17-rc1-mm2.orig/include/linux/swapops.h 2006-04-02 20:22:10.000000000 -0700
+++ linux-2.6.17-rc1-mm2/include/linux/swapops.h 2006-04-13 16:43:10.000000000 -0700
@@ -67,3 +67,35 @@ static inline pte_t swp_entry_to_pte(swp
BUG_ON(pte_file(__swp_entry_to_pte(arch_entry)));
return __swp_entry_to_pte(arch_entry);
}
+
+#ifdef CONFIG_MIGRATION
+static inline swp_entry_t make_migration_entry(struct page *page)
+{
+ BUG_ON(!PageLocked(page));
+ return swp_entry(SWP_TYPE_MIGRATION, page_to_pfn(page));
+}
+
+static inline int is_migration_entry(swp_entry_t entry)
+{
+ return swp_type(entry) == SWP_TYPE_MIGRATION;
+}
+
+static inline struct page *migration_entry_to_page(swp_entry_t entry)
+{
+ struct page *p = pfn_to_page(swp_offset(entry));
+ /*
+ * Any use of migration entries may only occur while the
+ * corresponding page is locked
+ */
+ BUG_ON(!PageLocked(p));
+ BUG_ON(!is_migration_entry(entry));
+ return p;
+}
+#else
+
+#define make_migration_entry(page) swp_entry(0, 0)
+#define is_migration_entry(swp) 0
+#define migration_entry_to_page(swp) NULL
+
+#endif
+
Index: linux-2.6.17-rc1-mm2/mm/memory.c
===================================================================
--- linux-2.6.17-rc1-mm2.orig/mm/memory.c 2006-04-11 12:14:34.000000000 -0700
+++ linux-2.6.17-rc1-mm2/mm/memory.c 2006-04-13 16:43:10.000000000 -0700
@@ -1879,6 +1879,12 @@ static int do_swap_page(struct mm_struct
goto out;
entry = pte_to_swp_entry(orig_pte);
+
+ if (unlikely(is_migration_entry(entry))) {
+ yield();
+ goto out;
+ }
+
page = lookup_swap_cache(entry);
if (!page) {
swapin_readahead(entry, address, vma);
* Re: [PATCH 2/5] Swapless V2: Add migration swap entries
From: Andrew Morton @ 2006-04-14 0:13 UTC
To: Christoph Lameter
Cc: hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti, kamezawa.hiroyu
Christoph Lameter <clameter@sgi.com> wrote:
>
> +
> + if (unlikely(is_migration_entry(entry))) {
Perhaps put the unlikely() in is_migration_entry()?
> + yield();
Please, no yielding.
_especially_ no unchangelogged, uncommented yielding.
> + goto out;
> + }
> +
* Re: [PATCH 2/5] Swapless V2: Add migration swap entries
From: Christoph Lameter @ 2006-04-14 0:29 UTC
To: Andrew Morton
Cc: hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti, kamezawa.hiroyu
On Thu, 13 Apr 2006, Andrew Morton wrote:
> Christoph Lameter <clameter@sgi.com> wrote:
> >
> > +
> > + if (unlikely(is_migration_entry(entry))) {
>
> Perhaps put the unlikely() in is_migration_entry()?
>
> > + yield();
>
> Please, no yielding.
>
> _especially_ no unchangelogged, uncommented yielding.
Page migration is ongoing, so it's best to do something else first.
Add a comment?
* Re: [PATCH 2/5] Swapless V2: Add migration swap entries
From: Andrew Morton @ 2006-04-14 0:42 UTC
To: Christoph Lameter
Cc: hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti, kamezawa.hiroyu
Christoph Lameter <clameter@sgi.com> wrote:
>
> On Thu, 13 Apr 2006, Andrew Morton wrote:
>
> > Christoph Lameter <clameter@sgi.com> wrote:
> > >
> > > +
> > > + if (unlikely(is_migration_entry(entry))) {
> >
> > Perhaps put the unlikely() in is_migration_entry()?
> >
> > > + yield();
> >
> > Please, no yielding.
> >
> > _especially_ no unchangelogged, uncommented yielding.
>
> Page migration is ongoing, so it's best to do something else first.
That doesn't help a lot. What is "something else"? What are the dynamics
in there, and why do you feel that some sort of delay is needed?
> Add a comment?
I don't think we're up to that stage yet.
* Re: [PATCH 2/5] Swapless V2: Add migration swap entries
From: Christoph Lameter @ 2006-04-14 0:46 UTC
To: Andrew Morton
Cc: hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti, kamezawa.hiroyu
On Thu, 13 Apr 2006, Andrew Morton wrote:
> Christoph Lameter <clameter@sgi.com> wrote:
> >
> > On Thu, 13 Apr 2006, Andrew Morton wrote:
> >
> > > Christoph Lameter <clameter@sgi.com> wrote:
> > > >
> > > > +
> > > > + if (unlikely(is_migration_entry(entry))) {
> > >
> > > Perhaps put the unlikely() in is_migration_entry()?
> > >
> > > > + yield();
> > >
> > > Please, no yielding.
> > >
> > > _especially_ no unchangelogged, uncommented yielding.
> >
> > Page migration is ongoing, so it's best to do something else first.
>
> That doesn't help a lot. What is "something else"? What are the dynamics
> in there, and why do you feel that some sort of delay is needed?
Page migration is ongoing for the page that was faulted. This means
the migration thread has torn down the ptes and replaced them with
migration entries in order to prevent access to this page. The migration
thread is continuing the process of tearing down ptes, copying the page
and then rebuilding the ptes. When the ptes are back, the fault
handler will either no longer be invoked or will merely fix up some of
the bits in the ptes. This takes a short time; the more ptes point to a
page, the longer it will take to replace them.
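[In rough pseudo-C, the sequence the migration thread runs through is
the following -- a simplified sketch of the flow described above; the
names follow the migration code but signatures are approximate:]

	lock_page(page);			/* block faulting tasks */
	try_to_unmap(page, 1);			/* ptes -> migration entries */
	copy_highpage(newpage, page);		/* transfer the data */
	remove_migration_ptes(page, newpage);	/* rebuild real ptes */
	unlock_page(page);			/* faults can complete now */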
* Re: [PATCH 2/5] Swapless V2: Add migration swap entries
From: Andrew Morton @ 2006-04-14 1:01 UTC
To: Christoph Lameter
Cc: hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti, kamezawa.hiroyu
Christoph Lameter <clameter@sgi.com> wrote:
>
> On Thu, 13 Apr 2006, Andrew Morton wrote:
>
> > Christoph Lameter <clameter@sgi.com> wrote:
> > >
> > > On Thu, 13 Apr 2006, Andrew Morton wrote:
> > >
> > > > Christoph Lameter <clameter@sgi.com> wrote:
> > > > >
> > > > > +
> > > > > + if (unlikely(is_migration_entry(entry))) {
> > > >
> > > > Perhaps put the unlikely() in is_migration_entry()?
> > > >
> > > > > + yield();
> > > >
> > > > Please, no yielding.
> > > >
> > > > _especially_ no unchangelogged, uncommented yielding.
> > >
> > > Page migration is ongoing, so it's best to do something else first.
> >
> > That doesn't help a lot. What is "something else"? What are the dynamics
> > in there, and why do you feel that some sort of delay is needed?
>
> Page migration is ongoing for the page that was faulted. This means
> the migration thread has torn down the ptes and replaced them with
> migration entries in order to prevent access to this page. The migration
> thread is continuing the process of tearing down ptes, copying the page
> and then rebuilding the ptes. When the ptes are back then the fault
> handler will no longer be invoked or it will fix up some of the bits in
> the ptes. This takes a short time, the more ptes point to a page the
> longer it will take to replace them.
So we falsely return VM_FAULT_MINOR and let userspace retake the pagefault,
thus implementing a form of polling, yes? If so, there is no "something
else" which this process can do.
Pages are locked during migration. The faulting process will sleep in
lock_page() until migration is complete. Except we've gone and diddled
with the swap pte so do_swap_page() can no longer locate the page which
needs to be locked.
Doing a busy-wait seems a bit lame. Perhaps it would be better to go to
sleep on some global queue, poke that queue each time a page migration
completes?
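[Something along these lines, say -- a sketch of that idea; the queue
name is made up, and the unlocked pte read in the condition would need
the usual recheck under the pte lock in a real version:]

	static DECLARE_WAIT_QUEUE_HEAD(migration_wait);

	/* do_swap_page(), instead of the yield(): */
	wait_event(migration_wait,
		   !is_migration_entry(pte_to_swp_entry(*page_table)));

	/* migration code, after the real ptes have been restored: */
	wake_up_all(&migration_wait);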
* Re: [PATCH 2/5] Swapless V2: Add migration swap entries
From: Andrew Morton @ 2006-04-14 1:17 UTC
To: clameter, hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti, kamezawa.hiroyu
Andrew Morton <akpm@osdl.org> wrote:
>
> Perhaps it would be better to go to
> sleep on some global queue, poke that queue each time a page migration
> completes?
Or take mmap_sem for writing in do_migrate_pages()? That takes the whole
pagefault path out of the picture.
* Re: [PATCH 2/5] Swapless V2: Add migration swap entries
From: Christoph Lameter @ 2006-04-14 1:31 UTC
To: Andrew Morton
Cc: hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti, kamezawa.hiroyu
On Thu, 13 Apr 2006, Andrew Morton wrote:
> Andrew Morton <akpm@osdl.org> wrote:
> >
> > Perhaps it would be better to go to
> > sleep on some global queue, poke that queue each time a page migration
> > completes?
>
> Or take mmap_sem for writing in do_migrate_pages()? That takes the whole
> pagefault path out of the picture.
We would have to take that for each task mapping the page. Very expensive
operation.
* Re: [PATCH 2/5] Swapless V2: Add migration swap entries
From: Andrew Morton @ 2006-04-14 5:25 UTC
To: Christoph Lameter
Cc: hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti, kamezawa.hiroyu
Christoph Lameter <clameter@sgi.com> wrote:
>
> On Thu, 13 Apr 2006, Andrew Morton wrote:
>
> > Andrew Morton <akpm@osdl.org> wrote:
> > >
> > > Perhaps it would be better to go to
> > > sleep on some global queue, poke that queue each time a page migration
> > > completes?
> >
> > Or take mmap_sem for writing in do_migrate_pages()? That takes the whole
> > pagefault path out of the picture.
>
> We would have to take that for each task mapping the page. Very expensive
> operation.
So... why does do_migrate_pages() take mmap_sem at all?
And the code we're talking about here deals with anonymous pages, which are
not shared between mm's.
* Re: [PATCH 2/5] Swapless V2: Add migration swap entries
From: Lee Schermerhorn @ 2006-04-14 14:27 UTC
To: Andrew Morton
Cc: Christoph Lameter, hugh, linux-kernel, linux-mm, taka,
marcelo.tosatti, kamezawa.hiroyu
On Thu, 2006-04-13 at 22:25 -0700, Andrew Morton wrote:
> Christoph Lameter <clameter@sgi.com> wrote:
> >
> > On Thu, 13 Apr 2006, Andrew Morton wrote:
> >
> > > Andrew Morton <akpm@osdl.org> wrote:
> > > >
> > > > Perhaps it would be better to go to
> > > > sleep on some global queue, poke that queue each time a page migration
> > > > completes?
> > >
> > > Or take mmap_sem for writing in do_migrate_pages()? That takes the whole
> > > pagefault path out of the picture.
> >
> > We would have to take that for each task mapping the page. Very expensive
> > operation.
>
> So... why does do_migrate_pages() take mmap_sem at all?
>
> And the code we're talking about here deals with anonymous pages, which are
> not shared between mm's.
I think that anon pages are shared, copy-on-write, between parent and
child after a fork(). If there is no exec() and no task writes the page,
the sharing can become quite extensive. I encountered this testing the
migrate-on-fault patches. With MPOL_MF_MOVE, these shared anon pages
don't get migrated at all [sometimes this is what you want, sometimes
not...], but with MPOL_MF_MOVE_ALL the shared anon pages DO get migrated,
so you can have races between a faulting task and the migrating task.
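[A minimal reproducer for that situation could look like this -- a
hypothetical userspace sketch, not a tested case; <numaif.h> comes from
numactl, link with -lnuma, and MPOL_MF_MOVE_ALL needs CAP_SYS_NICE:]

#include <numaif.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	unsigned long nodemask = 1;	/* migrate to node 0 */

	buf[0] = 1;			/* instantiate the anon page */
	if (fork() == 0)		/* child now shares it copy-on-write */
		for (;;)
			(void)*(volatile char *)buf;	/* keep touching it */

	/* MPOL_MF_MOVE skips pages mapped by more than one mm;
	 * MPOL_MF_MOVE_ALL migrates them, racing with the child's faults. */
	mbind(buf, 4096, MPOL_BIND, &nodemask,
	      sizeof(nodemask) * 8, MPOL_MF_MOVE_ALL);
	return 0;
}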
Lee
* Re: [PATCH 2/5] Swapless V2: Add migration swap entries
From: Christoph Lameter @ 2006-04-14 16:01 UTC
To: Andrew Morton
Cc: hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti, kamezawa.hiroyu
On Thu, 13 Apr 2006, Andrew Morton wrote:
> > We would have to take that for each task mapping the page. Very expensive
> > operation.
>
> So... why does do_migrate_pages() take mmap_sem at all?
In order to scan for migratable pages through the page table and in order
to guarantee the existence of the anon vma.
> And the code we're talking about here deals with anonymous pages, which are
> not shared betweem mm's.
COW, for example, results in sharing.
Hmmm... But I see the point: the "optimization" causes an inconsistency
between anon and file-backed pages. For anon pages we need to do this
polling. I had a prior unoptimized version that modified lookup_swap_cache
to handle migration entries. Maybe we had better undo the optimization.
* Re: [PATCH 2/5] Swapless V2: Add migration swap entries
From: Christoph Lameter @ 2006-04-14 1:31 UTC
To: Andrew Morton
Cc: hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti, kamezawa.hiroyu
On Thu, 13 Apr 2006, Andrew Morton wrote:
> So we falsely return VM_FAULT_MINOR and let userspace retake the pagefault,
> thus implementing a form of polling, yes? If so, there is no "something
> else" which this process can do.
Right.
> Pages are locked during migration. The faulting process will sleep in
> lock_page() until migration is complete. Except we've gone and diddled
> with the swap pte so do_swap_page() can no longer locate the page which
> needs to be locked.
Oh. The page is encoded in the migration pte.
> Doing a busy-wait seems a bit lame. Perhaps it would be better to go to
> sleep on some global queue, poke that queue each time a page migration
> completes?
If we rely on the migrating thread to hold the page count while the
page is locked, then we could do what the patch below does. But then we
may race with the freeing of the old page after migration is finished.
If we add the increment of the page count back, then we are on the safe
side, but we have the problem that we may increment the page count before
the migrating thread gets to its final check. Then the migration check
would fail and we would retry.
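[As a timeline, the race in the first variant is roughly:]

	/*
	 *  faulting task                    migrating thread
	 *  -------------                    ----------------
	 *  reads migration pte
	 *                                   restores real ptes
	 *                                   unlock_page(old page)
	 *                                   old page freed / reused
	 *  lock_page(old page)  <-- too late, page may be anything now
	 */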
Index: linux-2.6.17-rc1-mm2/mm/memory.c
===================================================================
--- linux-2.6.17-rc1-mm2.orig/mm/memory.c 2006-04-13 17:32:36.000000000 -0700
+++ linux-2.6.17-rc1-mm2/mm/memory.c 2006-04-13 18:26:49.000000000 -0700
@@ -1881,11 +1881,11 @@ static int do_swap_page(struct mm_struct
entry = pte_to_swp_entry(orig_pte);
if (is_migration_entry(entry)) {
- /*
- * We cannot access the page because of ongoing page
- * migration. See if we can do something else.
- */
- yield();
+ page = migration_entry_to_page(entry);
+ lock_page(page);
+ entry = pte_to_swp_entry(*page_table);
+ BUG_ON(is_migration_entry(entry));
+ unlock_page(page);
goto out;
}
* Re: [PATCH 2/5] Swapless V2: Add migration swap entries
From: Andrew Morton @ 2006-04-14 5:29 UTC
To: Christoph Lameter
Cc: hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti, kamezawa.hiroyu
Christoph Lameter <clameter@sgi.com> wrote:
>
> On Thu, 13 Apr 2006, Andrew Morton wrote:
>
> > So we falsely return VM_FAULT_MINOR and let userspace retake the pagefault,
> > thus implementing a form of polling, yes? If so, there is no "something
> > else" which this process can do.
>
> Right.
>
> > Pages are locked during migration. The faulting process will sleep in
> > lock_page() until migration is complete. Except we've gone and diddled
> > with the swap pte so do_swap_page() can no longer locate the page which
> > needs to be locked.
>
> Oh. The page is enconded in the migration pte.
>
> > Doing a busy-wait seems a bit lame. Perhaps it would be better to go to
> > sleep on some global queue, poke that queue each time a page migration
> > completes?
>
> If we rely on the migrating thread to hold the page count while the
> page is locked then we could do what the patch below does. But then we
> may race with the freeing of the old page after migration is finished.
Yeah, that's unpleasant.
> If we add the increment of the page count back, then we are on the safe
> side, but we have the problem that we may increment the page count before
> the migrating thread gets to its final check. Then the migration check
> would fail and we would retry.
>
>
> Index: linux-2.6.17-rc1-mm2/mm/memory.c
> ===================================================================
> --- linux-2.6.17-rc1-mm2.orig/mm/memory.c 2006-04-13 17:32:36.000000000 -0700
> +++ linux-2.6.17-rc1-mm2/mm/memory.c 2006-04-13 18:26:49.000000000 -0700
> @@ -1881,11 +1881,11 @@ static int do_swap_page(struct mm_struct
> entry = pte_to_swp_entry(orig_pte);
>
> if (is_migration_entry(entry)) {
> - /*
> - * We cannot access the page because of ongoing page
> - * migration. See if we can do something else.
> - */
> - yield();
> + page = migration_entry_to_page(entry);
> + lock_page(page);
> + entry = pte_to_swp_entry(*page_table);
> + BUG_ON(is_migration_entry(entry));
> + unlock_page(page);
> goto out;
> }
Is this page still lookable-uppable in swapcache? If so, that's the way to
get the refcount on it.
We don't _have_ to use the page lock of course. A simple
wait_event(some_wq, !is_migration_entry(entry));
would suffice.
But what prevents this swp_entry_t from becoming an is_migration_entry
swp_pte_t two nanoseconds after we've passed this check?
* Implement lookup_swap_cache for migration entries
From: Christoph Lameter @ 2006-04-14 17:28 UTC
To: Andrew Morton
Cc: hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti, kamezawa.hiroyu
This undoes the optimization that resulted in a yield in do_swap_page().
do_swap_page() stays as is. Instead we convert the migration entry to
a page * in lookup_swap_cache.

For the non-swap case we need a special macro version of lookup_swap_cache
that is only capable of handling migration entries.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.17-rc1-mm2/mm/swap_state.c
===================================================================
--- linux-2.6.17-rc1-mm2.orig/mm/swap_state.c 2006-04-11 12:14:34.000000000 -0700
+++ linux-2.6.17-rc1-mm2/mm/swap_state.c 2006-04-14 09:10:03.000000000 -0700
@@ -10,6 +10,7 @@
#include <linux/mm.h>
#include <linux/kernel_stat.h>
#include <linux/swap.h>
+#include <linux/swapops.h>
#include <linux/swap-prefetch.h>
#include <linux/init.h>
#include <linux/pagemap.h>
@@ -305,6 +306,12 @@ struct page * lookup_swap_cache(swp_entr
{
struct page *page;
+ if (is_migration_entry(entry)) {
+ page = migration_entry_to_page(entry);
+ get_page(page);
+ return page;
+ }
+
page = find_get_page(&swapper_space, entry.val);
if (page)
Index: linux-2.6.17-rc1-mm2/mm/memory.c
===================================================================
--- linux-2.6.17-rc1-mm2.orig/mm/memory.c 2006-04-13 16:43:10.000000000 -0700
+++ linux-2.6.17-rc1-mm2/mm/memory.c 2006-04-14 09:04:11.000000000 -0700
@@ -1880,11 +1880,6 @@ static int do_swap_page(struct mm_struct
entry = pte_to_swp_entry(orig_pte);
- if (unlikely(is_migration_entry(entry))) {
- yield();
- goto out;
- }
-
page = lookup_swap_cache(entry);
if (!page) {
swapin_readahead(entry, address, vma);
Index: linux-2.6.17-rc1-mm2/include/linux/swap.h
===================================================================
--- linux-2.6.17-rc1-mm2.orig/include/linux/swap.h 2006-04-13 16:43:21.000000000 -0700
+++ linux-2.6.17-rc1-mm2/include/linux/swap.h 2006-04-14 09:08:42.000000000 -0700
@@ -302,7 +302,6 @@ static inline void disable_swap_token(vo
#define swap_duplicate(swp) /*NOTHING*/
#define swap_free(swp) /*NOTHING*/
#define read_swap_cache_async(swp,vma,addr) NULL
-#define lookup_swap_cache(swp) NULL
#define valid_swaphandles(swp, off) 0
#define can_share_swap_page(p) 0
#define move_to_swap_cache(p, swp) 1
@@ -311,6 +310,19 @@ static inline void disable_swap_token(vo
#define delete_from_swap_cache(p) /*NOTHING*/
#define swap_token_default_timeout 0
+/*
+ * Must use a macro for lookup_swap_cache since the functions
+ * used are only available in certain contexts.
+ */
+#define lookup_swap_cache(__swp) \
+({ struct page *p = NULL; \
+ if (is_migration_entry(__swp)) { \
+ p = migration_entry_to_page(__swp); \
+ get_page(p); \
+ } \
+ p; \
+})
+
static inline int remove_exclusive_swap_page(struct page *p)
{
return 0;
Index: linux-2.6.17-rc1-mm2/include/linux/swapops.h
===================================================================
--- linux-2.6.17-rc1-mm2.orig/include/linux/swapops.h 2006-04-13 16:43:10.000000000 -0700
+++ linux-2.6.17-rc1-mm2/include/linux/swapops.h 2006-04-14 09:55:25.000000000 -0700
@@ -77,7 +77,7 @@ static inline swp_entry_t make_migration
static inline int is_migration_entry(swp_entry_t entry)
{
- return swp_type(entry) == SWP_TYPE_MIGRATION;
+ return unlikely(swp_type(entry) == SWP_TYPE_MIGRATION);
}
static inline struct page *migration_entry_to_page(swp_entry_t entry)
* Re: Implement lookup_swap_cache for migration entries
From: Andrew Morton @ 2006-04-14 18:31 UTC
To: Christoph Lameter
Cc: hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti, kamezawa.hiroyu
Christoph Lameter <clameter@sgi.com> wrote:
>
> This undoes the optimization that resulted in a yield in do_swap_page().
> do_swap_page() stays as is. Instead we convert the migration entry to
> a page * in lookup_swap_cache.
>
> For the non-swap case we need a special macro version of lookup_swap_cache
> that is only capable of handling migration entries.
>
> ...
>
> @@ -305,6 +306,12 @@ struct page * lookup_swap_cache(swp_entr
> {
> struct page *page;
>
> + if (is_migration_entry(entry)) {
> + page = migration_entry_to_page(entry);
> + get_page(page);
> + return page;
> + }
What locking ensures that the state of `entry' remains unaltered across the
is_migration_entry() and migration_entry_to_page() calls?
>
> +/*
> + * Must use a macro for lookup_swap_cache since the functions
> + * used are only available in certain contexts.
> + */
> +#define lookup_swap_cache(__swp) \
> +({ struct page *p = NULL; \
> + if (is_migration_entry(__swp)) { \
> + p = migration_entry_to_page(__swp); \
> + get_page(p); \
> + } \
> + p; \
> +})
hm. Can nommu do any of this?
* Re: Implement lookup_swap_cache for migration entries
From: Christoph Lameter @ 2006-04-14 18:48 UTC
To: Andrew Morton
Cc: hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti, kamezawa.hiroyu
On Fri, 14 Apr 2006, Andrew Morton wrote:
> > @@ -305,6 +306,12 @@ struct page * lookup_swap_cache(swp_entr
> > {
> > struct page *page;
> >
> > + if (is_migration_entry(entry)) {
> > + page = migration_entry_to_page(entry);
> > + get_page(page);
> > + return page;
> > + }
>
> What locking ensures that the state of `entry' remains unaltered across the
> is_migration_entry() and migration_entry_to_page() calls?
entry is a variable passed by value to the function.
> > +/*
> > + * Must use a macro for lookup_swap_cache since the functions
> > + * used are only available in certain contexts.
> > + */
> > +#define lookup_swap_cache(__swp) \
> > +({ struct page *p = NULL; \
> > + if (is_migration_entry(__swp)) { \
> > + p = migration_entry_to_page(__swp); \
> > + get_page(p); \
> > + } \
> > + p; \
> > +})
>
> hm. Can nommu do any of this?
If page migration is off (methinks nommu may not support NUMA) then
the fallback definitions are used.
Fallback is
is_migration_entry() == 0
therefore
#define lookup_swap_cache(__swp) NULL
like before.
* Re: Implement lookup_swap_cache for migration entries
From: Andrew Morton @ 2006-04-14 19:15 UTC
To: Christoph Lameter
Cc: hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti, kamezawa.hiroyu
Christoph Lameter <clameter@sgi.com> wrote:
>
> On Fri, 14 Apr 2006, Andrew Morton wrote:
>
> > > @@ -305,6 +306,12 @@ struct page * lookup_swap_cache(swp_entr
> > > {
> > > struct page *page;
> > >
> > > + if (is_migration_entry(entry)) {
> > > + page = migration_entry_to_page(entry);
> > > + get_page(page);
> > > + return page;
> > > + }
> >
> > What locking ensures that the state of `entry' remains unaltered across the
> > is_migration_entry() and migration_entry_to_page() calls?
>
> entry is a variable passed by value to the function.
Sigh.
What locking ensures that the state of the page referred to by `entry' is
stable?
* Re: Implement lookup_swap_cache for migration entries
From: Christoph Lameter @ 2006-04-14 19:22 UTC
To: Andrew Morton
Cc: hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti, kamezawa.hiroyu
On Fri, 14 Apr 2006, Andrew Morton wrote:
> > > What locking ensures that the state of `entry' remains unaltered across the
> > > is_migration_entry() and migration_entry_to_page() calls?
> >
> > entry is a variable passed by value to the function.
>
> Sigh.
>
> What locking ensures that the state of the page referred to by `entry' is
> stable?
Oh, that.
Well, there is no locking when retrieving a pte atomically from the page
table. In do_swap_page we figure out the page from the pte, lock the page
and then check that the pte has not changed. If it has changed, then we
redo the fault. If the pte is still the same, then we know that the page
was stable in the sense that it is still mapped the same way, so it was
not freed.
This applies to all pages handled by do_swap_page().
The differences are:
1. A migration entry does not take the tree_lock in lookup_swap_cache().
2. The migration thread will restore the regular pte before
dropping the page lock.
So after we succeed with the page lock we know that the pte has been
changed. The fault will be redone with the regular pte.
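[In code, the check sequence described above is the following -- an
abbreviated sketch of what do_swap_page() does; the pte_same() check
really happens after retaking the pte lock:]

	page = lookup_swap_cache(entry);
	lock_page(page);
	if (unlikely(!pte_same(*page_table, orig_pte))) {
		/* pte changed under us: unlock and redo the fault */
		unlock_page(page);
		goto out;
	}
	/* pte unchanged: page still mapped this way, so not freed */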
* Re: Implement lookup_swap_cache for migration entries
From: Andrew Morton @ 2006-04-14 19:53 UTC
To: Christoph Lameter
Cc: hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti, kamezawa.hiroyu
Christoph Lameter <clameter@sgi.com> wrote:
>
> On Fri, 14 Apr 2006, Andrew Morton wrote:
>
> > > > What locking ensures that the state of `entry' remains unaltered across the
> > > > is_migration_entry() and migration_entry_to_page() calls?
> > >
> > > entry is a variable passed by value to the function.
> >
> > Sigh.
> >
> > What locking ensures that the state of the page referred to by `entry' is
> > stable?
>
> Oh, that.
>
> Well, there is no locking when retrieving a pte atomically from the page
> table. In do_swap_page we figure out the page from the pte, lock the page
> and then check that the pte has not changed. If it has changed, then we
> redo the fault. If the pte is still the same, then we know that the page
> was stable in the sense that it is still mapped the same way, so it was
> not freed.
>
> This applies to all pages handled by do_swap_page().
>
> The differences are:
>
> 1. A migration entry does not take the tree_lock in lookup_swap_cache().
>
> 2. The migration thread will restore the regular pte before
> dropping the page lock.
>
> So after we succeed with the page lock we know that the pte has been
> changed. The fault will be redone with the regular pte.
So we're doing a get_page() on a random page which could be in any state -
it could be on the freelists, or in the per-cpu pages arrays, it could have
been reused for something else.
There's code in the kernel which assumes that we don't do that sort of
thing. For example:
static inline int page_is_buddy(struct page *page, int order)
{
#ifdef CONFIG_HOLES_IN_ZONE
if (!pfn_valid(page_to_pfn(page)))
return 0;
#endif
if (PageBuddy(page) && page_order(page) == order) {
BUG_ON(page_count(page) != 0);
return 1;
}
return 0;
}
* Re: Implement lookup_swap_cache for migration entries
From: Christoph Lameter @ 2006-04-14 20:12 UTC
To: Andrew Morton
Cc: hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti, kamezawa.hiroyu
On Fri, 14 Apr 2006, Andrew Morton wrote:
> So we're doing a get_page() on a random page which could be in any state -
> it could be on the freelists, or in the per-cpu pages arrays, it could have
> been reused for something else.
Hmmm... Yes, ahh! The tree_lock prohibits this sort of thing from
happening to regular pages. Right.... Yuck, this could be expensive to fix.

We are holding the anon_vma lock while remapping migration ptes. So we
could take the anon_vma lock and check whether the pte is still a
migration pte; if so, it cannot change and we can safely increase the
page count.
* Wait for migrating page after incr of page count under anon_vma lock
From: Christoph Lameter @ 2006-04-14 21:51 UTC
To: Andrew Morton
Cc: hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti, kamezawa.hiroyu
Another patch that considers the need to prevent the freeing of the page
and the pte while incrementing the page count.
Wait for migrating page after incr of page count under anon_vma lock
This patch replaces the yield() in do_swap_page with a call to
migration_entry_wait() in the migration code.
migration_entry_wait() locks the anonymous vma of the page and then
safely increments page count before waiting for the page to become
unlocked.
Migration entries are only removed while holding the anon_vma lock
(See remove_migration_ptes). Therefore we can be sure that the
migration pte is not modified and the underlying page is not
removed while holding this lock.
Also make is_migration_entry() unlikely and clean up an unnecessary
BUG_ON.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.17-rc1-mm2/mm/memory.c
===================================================================
--- linux-2.6.17-rc1-mm2.orig/mm/memory.c 2006-04-13 16:43:10.000000000 -0700
+++ linux-2.6.17-rc1-mm2/mm/memory.c 2006-04-14 13:57:44.000000000 -0700
@@ -1880,8 +1880,8 @@ static int do_swap_page(struct mm_struct
entry = pte_to_swp_entry(orig_pte);
- if (unlikely(is_migration_entry(entry))) {
- yield();
+ if (is_migration_entry(entry)) {
+ migration_entry_wait(entry, page_table);
goto out;
}
Index: linux-2.6.17-rc1-mm2/include/linux/swapops.h
===================================================================
--- linux-2.6.17-rc1-mm2.orig/include/linux/swapops.h 2006-04-13 16:43:10.000000000 -0700
+++ linux-2.6.17-rc1-mm2/include/linux/swapops.h 2006-04-14 13:57:44.000000000 -0700
@@ -77,7 +77,7 @@ static inline swp_entry_t make_migration
static inline int is_migration_entry(swp_entry_t entry)
{
- return swp_type(entry) == SWP_TYPE_MIGRATION;
+ return unlikely(swp_type(entry) == SWP_TYPE_MIGRATION);
}
static inline struct page *migration_entry_to_page(swp_entry_t entry)
@@ -88,14 +88,16 @@ static inline struct page *migration_ent
* corresponding page is locked
*/
BUG_ON(!PageLocked(p));
- BUG_ON(!is_migration_entry(entry));
return p;
}
+
+extern void migration_entry_wait(swp_entry_t, pte_t *);
#else
#define make_migration_entry(page) swp_entry(0, 0)
#define is_migration_entry(swp) 0
#define migration_entry_to_page(swp) NULL
+static inline void migration_entry_wait(swp_entry_t entry, pte_t *ptep) { }
#endif
Index: linux-2.6.17-rc1-mm2/mm/migrate.c
===================================================================
--- linux-2.6.17-rc1-mm2.orig/mm/migrate.c 2006-04-13 16:44:07.000000000 -0700
+++ linux-2.6.17-rc1-mm2/mm/migrate.c 2006-04-14 14:27:06.000000000 -0700
@@ -174,6 +174,57 @@ out:
}
/*
+ * Something used the pte of a page under migration. We need to
+ * get to the page and wait until migration is finished.
+ * When we return from this function the fault will be retried.
+ *
+ * This function is called from do_swap_page().
+ */
+void migration_entry_wait(swp_entry_t entry, pte_t *ptep)
+{
+ struct page *page = migration_entry_to_page(entry);
+ unsigned long mapping = (unsigned long)page->mapping;
+ struct anon_vma *anon_vma;
+ pte_t pte;
+
+ if (!mapping ||
+ (mapping & PAGE_MAPPING_ANON) == 0)
+ return;
+ /*
+ * We hold the mmap_sem lock.
+ */
+ anon_vma = (struct anon_vma *) (mapping - PAGE_MAPPING_ANON);
+
+ /*
+ * The anon_vma lock is also taken while removing the migration
+ * entries. Take the lock here to ensure that the migration pte
+ * is not modified while we increment the page count.
+ * This is similar to find_get_page().
+ */
+ spin_lock(&anon_vma->lock);
+ pte = *ptep;
+ if (pte_present(pte) || pte_none(pte) || pte_file(pte)) {
+ spin_unlock(&anon_vma->lock);
+ return;
+ }
+ entry = pte_to_swp_entry(pte);
+ if (!is_migration_entry(entry) ||
+ migration_entry_to_page(entry) != page) {
+ /* Migration entry is gone */
+ spin_unlock(&anon_vma->lock);
+ return;
+ }
+ /* Pages with migration entries must be locked */
+ BUG_ON(!PageLocked(page));
+
+ /* Phew. Finally we can increment the refcount */
+ get_page(page);
+ spin_unlock(&anon_vma->lock);
+ wait_on_page_locked(page);
+ put_page(page);
+}
+
+/*
* Get rid of all migration entries and replace them by
* references to the indicated page.
*
* migration_entry_wait: Use the pte lock instead of the anon_vma lock.
From: Christoph Lameter @ 2006-04-17 23:52 UTC
To: Andrew Morton
Cc: hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti, kamezawa.hiroyu
Use of the pte lock allows for much finer-grained locking and avoids
the complexity that comes with locking via the anon_vma. It also makes
the fetching of the pte value cleaner. Add a couple of other
improvements as well.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.17-rc1-mm2/mm/memory.c
===================================================================
--- linux-2.6.17-rc1-mm2.orig/mm/memory.c 2006-04-14 14:47:37.000000000 -0700
+++ linux-2.6.17-rc1-mm2/mm/memory.c 2006-04-17 16:23:50.000000000 -0700
@@ -1881,7 +1881,7 @@ static int do_swap_page(struct mm_struct
entry = pte_to_swp_entry(orig_pte);
if (is_migration_entry(entry)) {
- migration_entry_wait(entry, page_table);
+ migration_entry_wait(mm, pmd, address);
goto out;
}
Index: linux-2.6.17-rc1-mm2/include/linux/swapops.h
===================================================================
--- linux-2.6.17-rc1-mm2.orig/include/linux/swapops.h 2006-04-14 14:47:37.000000000 -0700
+++ linux-2.6.17-rc1-mm2/include/linux/swapops.h 2006-04-17 16:45:52.000000000 -0700
@@ -91,13 +91,15 @@ static inline struct page *migration_ent
return p;
}
-extern void migration_entry_wait(swp_entry_t entry, pte_t *);
+extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
+ unsigned long address);
#else
#define make_migration_entry(page) swp_entry(0, 0)
#define is_migration_entry(swp) 0
#define migration_entry_to_page(swp) NULL
-static inline void migration_entry_wait(swp_entry_t entry, pte_t *ptep) { }
+static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
+ unsigned long address) { }
#endif
Index: linux-2.6.17-rc1-mm2/mm/migrate.c
===================================================================
--- linux-2.6.17-rc1-mm2.orig/mm/migrate.c 2006-04-14 14:47:37.000000000 -0700
+++ linux-2.6.17-rc1-mm2/mm/migrate.c 2006-04-17 16:46:45.000000000 -0700
@@ -180,48 +180,35 @@ out:
*
* This function is called from do_swap_page().
*/
-void migration_entry_wait(swp_entry_t entry, pte_t *ptep)
+void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
+ unsigned long address)
{
- struct page *page = migration_entry_to_page(entry);
- unsigned long mapping = (unsigned long)page->mapping;
- struct anon_vma *anon_vma;
- pte_t pte;
-
- if (!mapping ||
- (mapping & PAGE_MAPPING_ANON) == 0)
- return;
- /*
- * We hold the mmap_sem lock.
- */
- anon_vma = (struct anon_vma *) (mapping - PAGE_MAPPING_ANON);
+ pte_t *ptep, pte;
+ spinlock_t *ptl;
+ swp_entry_t entry;
+ struct page *page;
- /*
- * The anon_vma lock is also taken while removing the migration
- * entries. Take the lock here to insure that the migration pte
- * is not modified while we increment the page count.
- * This is similar to find_get_page().
- */
- spin_lock(&anon_vma->lock);
+ ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
pte = *ptep;
- if (pte_present(pte) || pte_none(pte) || pte_file(pte)) {
- spin_unlock(&anon_vma->lock);
- return;
- }
+ if (!is_swap_pte(pte))
+ goto out;
+
entry = pte_to_swp_entry(pte);
- if (!is_migration_entry(entry) ||
- migration_entry_to_page(entry) != page) {
- /* Migration entry is gone */
- spin_unlock(&anon_vma->lock);
- return;
- }
- /* Pages with migration entries must be locked */
+ if (!is_migration_entry(entry))
+ goto out;
+
+ page = migration_entry_to_page(entry);
+
+ /* Pages with migration entries are always locked */
BUG_ON(!PageLocked(page));
- /* Phew. Finally we can increment the refcount */
get_page(page);
- spin_unlock(&anon_vma->lock);
+ pte_unmap_unlock(ptep, ptl);
wait_on_page_locked(page);
put_page(page);
+ return;
+out:
+ pte_unmap_unlock(ptep, ptl);
}
/*
* Re: [PATCH 2/5] Swapless V2: Add migration swap entries
From: Christoph Lameter @ 2006-04-14 0:36 UTC
To: Andrew Morton
Cc: hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti, kamezawa.hiroyu
1. Add explanation for the yield
2. Move unlikely to is_migration_entry (Does that really work??)
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.17-rc1-mm2/mm/memory.c
===================================================================
--- linux-2.6.17-rc1-mm2.orig/mm/memory.c 2006-04-13 16:43:10.000000000 -0700
+++ linux-2.6.17-rc1-mm2/mm/memory.c 2006-04-13 17:32:36.000000000 -0700
@@ -1880,7 +1880,11 @@ static int do_swap_page(struct mm_struct
entry = pte_to_swp_entry(orig_pte);
- if (unlikely(is_migration_entry(entry))) {
+ if (is_migration_entry(entry)) {
+ /*
+ * We cannot access the page because of ongoing page
+ * migration. See if we can do something else.
+ */
yield();
goto out;
}
Index: linux-2.6.17-rc1-mm2/include/linux/swapops.h
===================================================================
--- linux-2.6.17-rc1-mm2.orig/include/linux/swapops.h 2006-04-13 16:43:10.000000000 -0700
+++ linux-2.6.17-rc1-mm2/include/linux/swapops.h 2006-04-13 17:32:58.000000000 -0700
@@ -77,7 +77,7 @@ static inline swp_entry_t make_migration
static inline int is_migration_entry(swp_entry_t entry)
{
- return swp_type(entry) == SWP_TYPE_MIGRATION;
+ return unlikely(swp_type(entry) == SWP_TYPE_MIGRATION);
}
static inline struct page *migration_entry_to_page(swp_entry_t entry)
* [PATCH 3/5] Swapless V2: Make try_to_unmap() create migration entries
From: Christoph Lameter @ 2006-04-13 23:54 UTC
To: akpm
Cc: Hugh Dickins, linux-kernel, Lee Schermerhorn, linux-mm,
Christoph Lameter, Hirokazu Takahashi, Marcelo Tosatti,
KAMEZAWA Hiroyuki
Modify try_to_unmap to produce migration entries.

If we are trying to unmap a page that has no associated swapcache entry
but we are doing migration, then create a special migration entry
pointing to the page's pfn.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.17-rc1-mm2/mm/rmap.c
===================================================================
--- linux-2.6.17-rc1-mm2.orig/mm/rmap.c 2006-04-13 12:56:10.000000000 -0700
+++ linux-2.6.17-rc1-mm2/mm/rmap.c 2006-04-13 15:18:45.000000000 -0700
@@ -620,17 +620,27 @@ static int try_to_unmap_one(struct page
if (PageAnon(page)) {
swp_entry_t entry = { .val = page_private(page) };
- /*
- * Store the swap location in the pte.
- * See handle_pte_fault() ...
- */
- BUG_ON(!PageSwapCache(page));
- swap_duplicate(entry);
- if (list_empty(&mm->mmlist)) {
- spin_lock(&mmlist_lock);
- if (list_empty(&mm->mmlist))
- list_add(&mm->mmlist, &init_mm.mmlist);
- spin_unlock(&mmlist_lock);
+
+ if (PageSwapCache(page)) {
+ /*
+ * Store the swap location in the pte.
+ * See handle_pte_fault() ...
+ */
+ swap_duplicate(entry);
+ if (list_empty(&mm->mmlist)) {
+ spin_lock(&mmlist_lock);
+ if (list_empty(&mm->mmlist))
+ list_add(&mm->mmlist, &init_mm.mmlist);
+ spin_unlock(&mmlist_lock);
+ }
+ } else {
+ /*
+ * Store the pfn of the page in a special migration
+ * pte. do_swap_page() will wait until the migration
+ * pte is removed and then restart fault handling.
+ */
+ BUG_ON(!migration);
+ entry = make_migration_entry(page);
}
set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
BUG_ON(pte_file(*pte));
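As a quick illustration of the new control flow, this is a stand-alone
sketch of the branch above, with toy types in place of struct page. In the
kernel the swap entry comes from page_private() and the migration entry
from the page's pfn; here both are just tagged values invented for the
example.

#include <stdio.h>

enum entry_kind { ENTRY_SWAP, ENTRY_MIGRATION, ENTRY_BUG };

/* Toy stand-in for the fields try_to_unmap_one() consults. */
struct toy_page {
	int in_swap_cache;	/* PageSwapCache()                  */
	unsigned long private;	/* page_private(): the swap entry   */
	unsigned long pfn;	/* page_to_pfn(): for migration use */
};

static enum entry_kind unmap_anon_pte(const struct toy_page *page,
				      int migration, unsigned long *out)
{
	if (page->in_swap_cache) {
		*out = page->private;	/* swap_duplicate() + store entry */
		return ENTRY_SWAP;
	}
	if (!migration)
		return ENTRY_BUG;	/* BUG_ON(!migration) */
	*out = page->pfn;		/* make_migration_entry(page) */
	return ENTRY_MIGRATION;
}

int main(void)
{
	struct toy_page swap_backed = { 1, 0xabcd, 0x42 };
	struct toy_page plain_anon  = { 0, 0,      0x42 };
	unsigned long v;

	printf("swap cache page -> kind %d\n", unmap_anon_pte(&swap_backed, 1, &v));
	printf("plain anon page -> kind %d\n", unmap_anon_pte(&plain_anon, 1, &v));
	return 0;
}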
* [PATCH 4/5] Swapless V2: Rip out swap portion of old migration code
2006-04-13 23:54 [PATCH 0/5] Swapless page migration V2: Overview Christoph Lameter
` (2 preceding siblings ...)
2006-04-13 23:54 ` [PATCH 3/5] Swapless V2: Make try_to_unmap() create migration entries Christoph Lameter
@ 2006-04-13 23:54 ` Christoph Lameter
2006-04-13 23:54 ` [PATCH 5/5] Swapless V2: Revise main migration logic Christoph Lameter
2006-04-14 0:08 ` [PATCH 0/5] Swapless page migration V2: Overview Andrew Morton
5 siblings, 0 replies; 54+ messages in thread
From: Christoph Lameter @ 2006-04-13 23:54 UTC (permalink / raw)
To: akpm
Cc: Hugh Dickins, linux-kernel, Lee Schermerhorn, linux-mm,
Christoph Lameter, Hirokazu Takahashi, Marcelo Tosatti,
KAMEZAWA Hiroyuki
Rip the swap-based portion of the page migration logic out.
Remove all code that has to do with swapping during page migration.
This also guts the ability to migrate pages to swap. No one used that,
so let's let it go for good.
Page migration will be somewhat broken after this patch; the next patch
rebuilds it on top of migration entries.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.17-rc1-mm2/mm/migrate.c
===================================================================
--- linux-2.6.17-rc1-mm2.orig/mm/migrate.c 2006-04-11 12:14:34.000000000 -0700
+++ linux-2.6.17-rc1-mm2/mm/migrate.c 2006-04-11 22:56:27.000000000 -0700
@@ -70,10 +70,6 @@ int isolate_lru_page(struct page *page,
*/
int migrate_prep(void)
{
- /* Must have swap device for migration */
- if (nr_swap_pages <= 0)
- return -ENODEV;
-
/*
* Clear the LRU lists so pages can be isolated.
* Note that pages may be moved off the LRU after we have
@@ -129,52 +125,6 @@ int fail_migrate_page(struct page *newpa
EXPORT_SYMBOL(fail_migrate_page);
/*
- * swapout a single page
- * page is locked upon entry, unlocked on exit
- */
-static int swap_page(struct page *page)
-{
- struct address_space *mapping = page_mapping(page);
-
- if (page_mapped(page) && mapping)
- if (try_to_unmap(page, 1) != SWAP_SUCCESS)
- goto unlock_retry;
-
- if (PageDirty(page)) {
- /* Page is dirty, try to write it out here */
- switch(pageout(page, mapping)) {
- case PAGE_KEEP:
- case PAGE_ACTIVATE:
- goto unlock_retry;
-
- case PAGE_SUCCESS:
- goto retry;
-
- case PAGE_CLEAN:
- ; /* try to free the page below */
- }
- }
-
- if (PagePrivate(page)) {
- if (!try_to_release_page(page, GFP_KERNEL) ||
- (!mapping && page_count(page) == 1))
- goto unlock_retry;
- }
-
- if (remove_mapping(mapping, page)) {
- /* Success */
- unlock_page(page);
- return 0;
- }
-
-unlock_retry:
- unlock_page(page);
-
-retry:
- return -EAGAIN;
-}
-
-/*
* Remove references for a page and establish the new page with the correct
* basic settings to be able to stop accesses to the page.
*/
@@ -335,8 +285,7 @@ EXPORT_SYMBOL(migrate_page);
* Two lists are passed to this function. The first list
* contains the pages isolated from the LRU to be migrated.
* The second list contains new pages that the pages isolated
- * can be moved to. If the second list is NULL then all
- * pages are swapped out.
+ * can be moved to.
*
* The function returns after 10 attempts or if no pages
* are movable anymore because to has become empty
@@ -392,30 +341,13 @@ redo:
* Only wait on writeback if we have already done a pass where
* we may have triggered writeouts for lots of pages.
*/
- if (pass > 0) {
+ if (pass > 0)
wait_on_page_writeback(page);
- } else {
+ else {
if (PageWriteback(page))
goto unlock_page;
}
- /*
- * Anonymous pages must have swap cache references otherwise
- * the information contained in the page maps cannot be
- * preserved.
- */
- if (PageAnon(page) && !PageSwapCache(page)) {
- if (!add_to_swap(page, GFP_KERNEL)) {
- rc = -ENOMEM;
- goto unlock_page;
- }
- }
-
- if (!to) {
- rc = swap_page(page);
- goto next;
- }
-
newpage = lru_to_page(to);
lock_page(newpage);
@@ -469,24 +401,6 @@ redo:
goto unlock_both;
}
- /*
- * On early passes with mapped pages simply
- * retry. There may be a lock held for some
- * buffers that may go away. Later
- * swap them out.
- */
- if (pass > 4) {
- /*
- * Persistently unable to drop buffers..... As a
- * measure of last resort we fall back to
- * swap_page().
- */
- unlock_page(newpage);
- newpage = NULL;
- rc = swap_page(page);
- goto next;
- }
-
unlock_both:
unlock_page(newpage);
Index: linux-2.6.17-rc1-mm2/mm/swapfile.c
===================================================================
--- linux-2.6.17-rc1-mm2.orig/mm/swapfile.c 2006-04-11 22:56:23.000000000 -0700
+++ linux-2.6.17-rc1-mm2/mm/swapfile.c 2006-04-11 22:56:27.000000000 -0700
@@ -618,15 +618,6 @@ static int unuse_mm(struct mm_struct *mm
return 0;
}
-#ifdef CONFIG_MIGRATION
-int remove_vma_swap(struct vm_area_struct *vma, struct page *page)
-{
- swp_entry_t entry = { .val = page_private(page) };
-
- return unuse_vma(vma, entry, page);
-}
-#endif
-
/*
* Scan swap_map from current position to next entry still in use.
* Recycle to start on reaching the end, returning 0 when empty.
Index: linux-2.6.17-rc1-mm2/mm/rmap.c
===================================================================
--- linux-2.6.17-rc1-mm2.orig/mm/rmap.c 2006-04-11 22:56:24.000000000 -0700
+++ linux-2.6.17-rc1-mm2/mm/rmap.c 2006-04-11 22:56:27.000000000 -0700
@@ -205,44 +205,6 @@ out:
return anon_vma;
}
-#ifdef CONFIG_MIGRATION
-/*
- * Remove an anonymous page from swap replacing the swap pte's
- * through real pte's pointing to valid pages and then releasing
- * the page from the swap cache.
- *
- * Must hold page lock on page and mmap_sem of one vma that contains
- * the page.
- */
-void remove_from_swap(struct page *page)
-{
- struct anon_vma *anon_vma;
- struct vm_area_struct *vma;
- unsigned long mapping;
-
- if (!PageSwapCache(page))
- return;
-
- mapping = (unsigned long)page->mapping;
-
- if (!mapping || (mapping & PAGE_MAPPING_ANON) == 0)
- return;
-
- /*
- * We hold the mmap_sem lock. So no need to call page_lock_anon_vma.
- */
- anon_vma = (struct anon_vma *) (mapping - PAGE_MAPPING_ANON);
- spin_lock(&anon_vma->lock);
-
- list_for_each_entry(vma, &anon_vma->head, anon_vma_node)
- remove_vma_swap(vma, page);
-
- spin_unlock(&anon_vma->lock);
- delete_from_swap_cache(page);
-}
-EXPORT_SYMBOL(remove_from_swap);
-#endif
-
/*
* At what user virtual address is page expected in vma?
*/
Index: linux-2.6.17-rc1-mm2/include/linux/rmap.h
===================================================================
--- linux-2.6.17-rc1-mm2.orig/include/linux/rmap.h 2006-04-11 22:56:24.000000000 -0700
+++ linux-2.6.17-rc1-mm2/include/linux/rmap.h 2006-04-11 22:56:27.000000000 -0700
@@ -92,7 +92,6 @@ static inline void page_dup_rmap(struct
*/
int page_referenced(struct page *, int is_locked);
int try_to_unmap(struct page *, int ignore_refs);
-void remove_from_swap(struct page *page);
/*
* Called from mm/filemap_xip.c to unmap empty zero page
* [PATCH 5/5] Swapless V2: Revise main migration logic
2006-04-13 23:54 [PATCH 0/5] Swapless page migration V2: Overview Christoph Lameter
` (3 preceding siblings ...)
2006-04-13 23:54 ` [PATCH 4/5] Swapless V2: Rip out swap portion of old migration code Christoph Lameter
@ 2006-04-13 23:54 ` Christoph Lameter
2006-04-14 1:19 ` KAMEZAWA Hiroyuki
2006-04-14 0:08 ` [PATCH 0/5] Swapless page migration V2: Overview Andrew Morton
5 siblings, 1 reply; 54+ messages in thread
From: Christoph Lameter @ 2006-04-13 23:54 UTC (permalink / raw)
To: akpm
Cc: Hugh Dickins, linux-kernel, Lee Schermerhorn, linux-mm,
Christoph Lameter, Hirokazu Takahashi, Marcelo Tosatti,
KAMEZAWA Hiroyuki
Use the migration entries for page migration
This modifies the migration code to use the new migration entries.
It now becomes possible to migrate anonymous pages without having to
add a swap entry.
We add a couple of new functions to replace migration entries with the proper
ptes.
We can no longer take the tree_lock when migrating anonymous pages, since
pages that are not in the swap cache have no mapping. However, we know that
we hold the only remaining reference to the page once the page count
reaches 1.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.17-rc1-mm2/mm/migrate.c
===================================================================
--- linux-2.6.17-rc1-mm2.orig/mm/migrate.c 2006-04-13 15:58:54.000000000 -0700
+++ linux-2.6.17-rc1-mm2/mm/migrate.c 2006-04-13 16:36:28.000000000 -0700
@@ -15,6 +15,7 @@
#include <linux/migrate.h>
#include <linux/module.h>
#include <linux/swap.h>
+#include <linux/swapops.h>
#include <linux/pagemap.h>
#include <linux/buffer_head.h>
#include <linux/mm_inline.h>
@@ -23,7 +24,6 @@
#include <linux/topology.h>
#include <linux/cpu.h>
#include <linux/cpuset.h>
-#include <linux/swapops.h>
#include "internal.h"
@@ -115,6 +115,95 @@ int putback_lru_pages(struct list_head *
return count;
}
+static inline int is_swap_pte(pte_t pte)
+{
+ return !pte_none(pte) && !pte_present(pte) && !pte_file(pte);
+}
+
+/*
+ * Restore a potential migration pte to a working pte entry for
+ * anonymous pages.
+ */
+static void remove_migration_pte(struct vm_area_struct *vma, unsigned long addr,
+ struct page *old, struct page *new)
+{
+ struct mm_struct *mm = vma->vm_mm;
+ swp_entry_t entry;
+ pgd_t *pgd;
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *ptep, pte;
+ spinlock_t *ptl;
+
+ pgd = pgd_offset(mm, addr);
+ if (!pgd_present(*pgd))
+ return;
+
+ pud = pud_offset(pgd, addr);
+ if (!pud_present(*pud))
+ return;
+
+ pmd = pmd_offset(pud, addr);
+ if (!pmd_present(*pmd))
+ return;
+
+ ptep = pte_offset_map(pmd, addr);
+
+ if (!is_swap_pte(*ptep)) {
+ pte_unmap(ptep);
+ return;
+ }
+
+ ptl = pte_lockptr(mm, pmd);
+ spin_lock(ptl);
+ pte = *ptep;
+ if (!is_swap_pte(pte))
+ goto out;
+
+ entry = pte_to_swp_entry(pte);
+
+ if (!is_migration_entry(entry) || migration_entry_to_page(entry) != old)
+ goto out;
+
+ inc_mm_counter(mm, anon_rss);
+ get_page(new);
+ set_pte_at(mm, addr, ptep, pte_mkold(mk_pte(new, vma->vm_page_prot)));
+ page_add_anon_rmap(new, vma, addr);
+out:
+ pte_unmap_unlock(ptep, ptl);
+}
+
+/*
+ * Get rid of all migration entries and replace them by
+ * references to the indicated page.
+ *
+ * Must hold mmap_sem lock on at least one of the vmas containing
+ * the page so that the anon_vma cannot vanish.
+ */
+static void remove_migration_ptes(struct page *old, struct page *new)
+{
+ struct anon_vma *anon_vma;
+ struct vm_area_struct *vma;
+ unsigned long mapping;
+
+ mapping = (unsigned long)new->mapping;
+
+ if (!mapping || (mapping & PAGE_MAPPING_ANON) == 0)
+ return;
+
+ /*
+ * We hold the mmap_sem lock. So no need to call page_lock_anon_vma.
+ */
+ anon_vma = (struct anon_vma *) (mapping - PAGE_MAPPING_ANON);
+ spin_lock(&anon_vma->lock);
+
+ list_for_each_entry(vma, &anon_vma->head, anon_vma_node)
+ remove_migration_pte(vma, page_address_in_vma(new, vma),
+ old, new);
+
+ spin_unlock(&anon_vma->lock);
+}
+
/*
* Non migratable page
*/
@@ -125,8 +214,9 @@ int fail_migrate_page(struct page *newpa
EXPORT_SYMBOL(fail_migrate_page);
/*
- * Remove references for a page and establish the new page with the correct
- * basic settings to be able to stop accesses to the page.
+ * Remove or replace all references to a page so that future accesses to
+ * the page can be blocked. Establish the new page
+ * with the basic settings to be able to stop accesses to the page.
*/
int migrate_page_remove_references(struct page *newpage,
struct page *page, int nr_refs)
@@ -139,38 +229,51 @@ int migrate_page_remove_references(struc
* indicates that the page is in use or truncate has removed
* the page.
*/
- if (!mapping || page_mapcount(page) + nr_refs != page_count(page))
- return -EAGAIN;
+ if (!page->mapping ||
+ page_mapcount(page) + nr_refs != page_count(page))
+ return -EAGAIN;
/*
- * Establish swap ptes for anonymous pages or destroy pte
+ * Establish migration ptes for anonymous pages or destroy pte
* maps for files.
*
* In order to reestablish file backed mappings the fault handlers
* will take the radix tree_lock which may then be used to stop
* processes from accessing this page until the new page is ready.
*
- * A process accessing via a swap pte (an anonymous page) will take a
- * page_lock on the old page which will block the process until the
- * migration attempt is complete. At that time the PageSwapCache bit
- * will be examined. If the page was migrated then the PageSwapCache
- * bit will be clear and the operation to retrieve the page will be
- * retried which will find the new page in the radix tree. Then a new
- * direct mapping may be generated based on the radix tree contents.
- *
- * If the page was not migrated then the PageSwapCache bit
- * is still set and the operation may continue.
+ * A process accessing via a migration pte (an anonymous page) will
+ * take a page_lock on the old page which will block the process
+ * until the migration attempt is complete.
*/
if (try_to_unmap(page, 1) == SWAP_FAIL)
/* A vma has VM_LOCKED set -> permanent failure */
return -EPERM;
/*
- * Give up if we were unable to remove all mappings.
+ * Retry if we were unable to remove all mappings.
*/
if (page_mapcount(page))
return -EAGAIN;
+ if (!mapping) {
+ /*
+ * Anonymous page without swap mapping.
+ * User space cannot access the page anymore since we
+ * removed the ptes. Now check if the kernel still has
+ * pending references.
+ */
+ if (page_count(page) != nr_refs)
+ return -EAGAIN;
+
+ /* We are holding the only remaining reference */
+ newpage->index = page->index;
+ newpage->mapping = page->mapping;
+ return 0;
+ }
+
+ /*
+ * The page has a mapping that we need to change
+ */
write_lock_irq(&mapping->tree_lock);
radix_pointer = (struct page **)radix_tree_lookup_slot(
@@ -194,10 +297,13 @@ int migrate_page_remove_references(struc
get_page(newpage);
newpage->index = page->index;
newpage->mapping = page->mapping;
+
+#ifdef CONFIG_SWAP
if (PageSwapCache(page)) {
SetPageSwapCache(newpage);
set_page_private(newpage, page_private(page));
}
+#endif
*radix_pointer = newpage;
__put_page(page);
@@ -232,7 +338,9 @@ void migrate_page_copy(struct page *newp
set_page_dirty(newpage);
}
+#ifdef CONFIG_SWAP
ClearPageSwapCache(page);
+#endif
ClearPageActive(page);
ClearPagePrivate(page);
set_page_private(page, 0);
@@ -259,22 +367,16 @@ int migrate_page(struct page *newpage, s
BUG_ON(PageWriteback(page)); /* Writeback must be complete */
- rc = migrate_page_remove_references(newpage, page, 2);
+ rc = migrate_page_remove_references(newpage, page,
+ page_mapping(page) ? 2 : 1);
- if (rc)
+ if (rc) {
+ remove_migration_ptes(page, page);
return rc;
+ }
migrate_page_copy(newpage, page);
-
- /*
- * Remove auxiliary swap entries and replace
- * them with real ptes.
- *
- * Note that a real pte entry will allow processes that are not
- * waiting on the page lock to use the new page via the page tables
- * before the new page is unlocked.
- */
- remove_from_swap(newpage);
+ remove_migration_ptes(page, newpage);
return 0;
}
EXPORT_SYMBOL(migrate_page);
@@ -356,9 +458,11 @@ redo:
* Try to migrate the page.
*/
mapping = page_mapping(page);
- if (!mapping)
+ if (!mapping) {
+ rc = migrate_page(newpage, page);
goto unlock_both;
+ } else
if (mapping->a_ops->migratepage) {
/*
* Most pages have a mapping and most filesystems
Index: linux-2.6.17-rc1-mm2/mm/Kconfig
===================================================================
--- linux-2.6.17-rc1-mm2.orig/mm/Kconfig 2006-04-02 20:22:10.000000000 -0700
+++ linux-2.6.17-rc1-mm2/mm/Kconfig 2006-04-13 15:58:56.000000000 -0700
@@ -138,8 +138,8 @@ config SPLIT_PTLOCK_CPUS
#
config MIGRATION
bool "Page migration"
- def_bool y if NUMA
- depends on SWAP && NUMA
+ def_bool y
+ depends on NUMA
help
Allows the migration of the physical location of pages of processes
while the virtual addresses are not changed. This is useful for
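A small stand-alone model of the reference accounting that
migrate_page_remove_references() relies on in this patch; the counts below
are made up for the example.

#include <stdio.h>

/*
 * Migration may only begin once the caller's expected references
 * (nr_refs) plus one reference per mapping pte account for every
 * reference to the page; anything else means some other user
 * (I/O, get_user_pages(), ...) still holds the page.
 */
static int can_migrate(int page_count, int page_mapcount, int nr_refs)
{
	if (page_mapcount + nr_refs != page_count)
		return -11;	/* -EAGAIN: retry later */
	return 0;
}

int main(void)
{
	/* anon page mapped by 3 ptes, plus the migration caller's ref */
	printf("%d\n", can_migrate(4, 3, 1));	/* 0: ok        */
	/* same page, but an extra kernel reference is still pending */
	printf("%d\n", can_migrate(5, 3, 1));	/* -11: -EAGAIN */
	return 0;
}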
* Re: [PATCH 5/5] Swapless V2: Revise main migration logic
2006-04-13 23:54 ` [PATCH 5/5] Swapless V2: Revise main migration logic Christoph Lameter
@ 2006-04-14 1:19 ` KAMEZAWA Hiroyuki
2006-04-14 1:33 ` Christoph Lameter
0 siblings, 1 reply; 54+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-04-14 1:19 UTC (permalink / raw)
To: Christoph Lameter
Cc: akpm, hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti
On Thu, 13 Apr 2006 16:54:32 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:
>
> + inc_mm_counter(mm, anon_rss);
> + get_page(new);
> + set_pte_at(mm, addr, ptep, pte_mkold(mk_pte(new, vma->vm_page_prot)));
> + page_add_anon_rmap(new, vma, addr);
Just a note:
This will cause unnecessary copy-on-write later.
(The current remove_from_swap() can cause copy-on-write, too....)
But maybe copy-on-write is just a minor case when migrating specified vmas.
For hotremove (which I have stopped working on for now), we should fix this
later (if we can).
If the new SWP_TYPE_MIGRATION swp entry can carry a write-protect bit,
hotremove can avoid copy-on-write, but things will be more complicated.
-Kame
* Re: [PATCH 5/5] Swapless V2: Revise main migration logic
2006-04-14 1:19 ` KAMEZAWA Hiroyuki
@ 2006-04-14 1:33 ` Christoph Lameter
2006-04-14 1:40 ` KAMEZAWA Hiroyuki
2006-04-14 2:34 ` KAMEZAWA Hiroyuki
0 siblings, 2 replies; 54+ messages in thread
From: Christoph Lameter @ 2006-04-14 1:33 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: akpm, hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti
On Fri, 14 Apr 2006, KAMEZAWA Hiroyuki wrote:
> For hotremove (which I have stopped working on for now), we should fix this
> later (if we can).
> If the new SWP_TYPE_MIGRATION swp entry can carry a write-protect bit,
> hotremove can avoid copy-on-write, but things will be more complicated.
This is a known issue. I'd be glad if you could come up with a simple
working scheme to solve this.
* Re: [PATCH 5/5] Swapless V2: Revise main migration logic
2006-04-14 1:33 ` Christoph Lameter
@ 2006-04-14 1:40 ` KAMEZAWA Hiroyuki
2006-04-14 2:34 ` KAMEZAWA Hiroyuki
1 sibling, 0 replies; 54+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-04-14 1:40 UTC (permalink / raw)
To: Christoph Lameter
Cc: akpm, hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti
On Thu, 13 Apr 2006 18:33:07 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:
> On Fri, 14 Apr 2006, KAMEZAWA Hiroyuki wrote:
>
> > For hotremove (which I have stopped working on for now), we should fix this
> > later (if we can).
> > If the new SWP_TYPE_MIGRATION swp entry can carry a write-protect bit,
> > hotremove can avoid copy-on-write, but things will be more complicated.
>
> This is a known issue. I'd be glad if you could come up with a simple
> working scheme to solve this.
>
Hmm. I'll post a sample implementation on top of your patch later.
-Kame
* Re: [PATCH 5/5] Swapless V2: Revise main migration logic
2006-04-14 1:33 ` Christoph Lameter
2006-04-14 1:40 ` KAMEZAWA Hiroyuki
@ 2006-04-14 2:34 ` KAMEZAWA Hiroyuki
2006-04-14 2:44 ` KAMEZAWA Hiroyuki
2006-04-14 16:48 ` [PATCH 5/5] Swapless V2: Revise main migration logic Christoph Lameter
1 sibling, 2 replies; 54+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-04-14 2:34 UTC (permalink / raw)
To: Christoph Lameter
Cc: akpm, hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti
On Thu, 13 Apr 2006 18:33:07 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:
> On Fri, 14 Apr 2006, KAMEZAWA Hiroyuki wrote:
>
> > For hotremove (which I have stopped working on for now), we should fix this
> > later (if we can).
> > If the new SWP_TYPE_MIGRATION swp entry can carry a write-protect bit,
> > hotremove can avoid copy-on-write, but things will be more complicated.
>
> This is a known issue. I'd be glad if you could come up with a simple
> working scheme to solve this.
>
This patch can fix the copy-on-write problem.
I have only compile-tested this patch (because I cannot use NUMA right now).
BTW, why is MAX_SWAPFILES_SHIFT == 5 now? Is it required by some arch?
-Kame
==
This patch removes unnecessary copy-on-write after page migration.
It preserves the writable (write-protection) bit in the swap entry
and makes the pte writable/protected again when restoring it.
Because I don't understand why MAX_SWAPFILES_SHIFT == 5 now,
this patch uses one more swap type for migration.
(With this patch, the number of available swp types goes down to 30.)
Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Index: Christoph-New-Migration/include/linux/swap.h
===================================================================
--- Christoph-New-Migration.orig/include/linux/swap.h 2006-04-14 11:13:38.000000000 +0900
+++ Christoph-New-Migration/include/linux/swap.h 2006-04-14 11:13:55.000000000 +0900
@@ -33,8 +33,11 @@
#define MAX_SWAPFILES (1 << MAX_SWAPFILES_SHIFT)
#else
/* Use last entry for page migration swap entries */
-#define MAX_SWAPFILES ((1 << MAX_SWAPFILES_SHIFT)-1)
-#define SWP_TYPE_MIGRATION MAX_SWAPFILES
+#define MAX_SWAPFILES ((1 << MAX_SWAPFILES_SHIFT)-2)
+/* write protected page under migration*/
+#define SWP_TYPE_MIGRATION_WP (MAX_SWAPFILES - 1)
+/* write enabled migration type */
+#define SWP_TYPE_MIGRATION_WE (MAX_SWAPFILES)
#endif
/*
Index: Christoph-New-Migration/include/linux/swapops.h
===================================================================
--- Christoph-New-Migration.orig/include/linux/swapops.h 2006-04-14 11:13:38.000000000 +0900
+++ Christoph-New-Migration/include/linux/swapops.h 2006-04-14 11:13:55.000000000 +0900
@@ -69,17 +69,32 @@
}
#ifdef CONFIG_MIGRATION
-static inline swp_entry_t make_migration_entry(struct page *page)
+static inline swp_entry_t make_migration_entry(struct page *page, int writable)
{
BUG_ON(!PageLocked(page));
- return swp_entry(SWP_TYPE_MIGRATION, page_to_pfn(page));
+ if (writable)
+ return swp_entry(SWP_TYPE_MIGRATION_WE, page_to_pfn(page));
+ else
+ return swp_entry(SWP_TYPE_MIGRATION_WP, page_to_pfn(page));
}
static inline int is_migration_entry(swp_entry_t entry)
{
- return swp_type(entry) == SWP_TYPE_MIGRATION;
+ return (swp_type(entry) == SWP_TYPE_MIGRATION_WP) ||
+ (swp_type(entry) == SWP_TYPE_MIGRATION_WE);
}
+static inline int is_migration_entry_wp(swp_entry_t entry)
+{
+ return (swp_type(entry) == SWP_TYPE_MIGRATION_WP);
+}
+
+static inline int is_migration_entry_we(swp_entry_t entry)
+{
+ return (swp_type(entry) == SWP_TYPE_MIGRATION_WE);
+}
+
+
static inline struct page *migration_entry_to_page(swp_entry_t entry)
{
struct page *p = pfn_to_page(swp_offset(entry));
Index: Christoph-New-Migration/mm/migrate.c
===================================================================
--- Christoph-New-Migration.orig/mm/migrate.c 2006-04-14 11:13:49.000000000 +0900
+++ Christoph-New-Migration/mm/migrate.c 2006-04-14 11:13:55.000000000 +0900
@@ -167,7 +167,11 @@
inc_mm_counter(mm, anon_rss);
get_page(new);
- set_pte_at(mm, addr, ptep, pte_mkold(mk_pte(new, vma->vm_page_prot)));
+ pte = pte_mkold(mk_pte(new, vma->vm_page_prot));
+ if (is_migration_entry_we(entry)) {
+ pte = pte_mkwrite(pte);
+ }
+ set_pte_at(mm, addr, ptep, pte);
page_add_anon_rmap(new, vma, addr);
out:
pte_unmap_unlock(ptep, ptl);
Index: Christoph-New-Migration/mm/rmap.c
===================================================================
--- Christoph-New-Migration.orig/mm/rmap.c 2006-04-14 11:13:45.000000000 +0900
+++ Christoph-New-Migration/mm/rmap.c 2006-04-14 11:13:55.000000000 +0900
@@ -602,7 +602,10 @@
* pte is removed and then restart fault handling.
*/
BUG_ON(!migration);
- entry = make_migration_entry(page);
+ if (pte_write(pteval))
+ entry = make_migration_entry(page, 1);
+ else
+ entry = make_migration_entry(page, 0);
}
set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
BUG_ON(pte_file(*pte));
* Re: [PATCH 5/5] Swapless V2: Revise main migration logic
2006-04-14 2:34 ` KAMEZAWA Hiroyuki
@ 2006-04-14 2:44 ` KAMEZAWA Hiroyuki
2006-04-14 17:29 ` Preserve write permissions in migration entries Christoph Lameter
2006-04-14 16:48 ` [PATCH 5/5] Swapless V2: Revise main migration logic Christoph Lameter
1 sibling, 1 reply; 54+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-04-14 2:44 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: clameter, akpm, hugh, linux-kernel, lee.schermerhorn, linux-mm,
taka, marcelo.tosatti
Sorry... this is wrong.
On Fri, 14 Apr 2006 11:34:55 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> /* Use last entry for page migration swap entries */
> -#define MAX_SWAPFILES ((1 << MAX_SWAPFILES_SHIFT)-1)
> -#define SWP_TYPE_MIGRATION MAX_SWAPFILES
> +#define MAX_SWAPFILES ((1 << MAX_SWAPFILES_SHIFT)-2)
> +/* write protected page under migration*/
> +#define SWP_TYPE_MIGRATION_WP (MAX_SWAPFILES - 1)
> +/* write enabled migration type */
> +#define SWP_TYPE_MIGRATION_WE (MAX_SWAPFILES)
> #endif
Maybe I need another pair of eyes...
Here is the fix.
-Kame
Index: Christoph-New-Migration/include/linux/swap.h
===================================================================
--- Christoph-New-Migration.orig/include/linux/swap.h 2006-04-14 11:13:55.000000000 +0900
+++ Christoph-New-Migration/include/linux/swap.h 2006-04-14 11:40:28.000000000 +0900
@@ -35,9 +35,9 @@
/* Use last entry for page migration swap entries */
#define MAX_SWAPFILES ((1 << MAX_SWAPFILES_SHIFT)-2)
/* write protected page under migration*/
-#define SWP_TYPE_MIGRATION_WP (MAX_SWAPFILES - 1)
+#define SWP_TYPE_MIGRATION_WP (MAX_SWAPFILES)
/* write enabled migration type */
-#define SWP_TYPE_MIGRATION_WE (MAX_SWAPFILES)
+#define SWP_TYPE_MIGRATION_WE (MAX_SWAPFILES + 1)
#endif
/*
* Preserve write permissions in migration entries
2006-04-14 2:44 ` KAMEZAWA Hiroyuki
@ 2006-04-14 17:29 ` Christoph Lameter
0 siblings, 0 replies; 54+ messages in thread
From: Christoph Lameter @ 2006-04-14 17:29 UTC (permalink / raw)
To: akpm
Cc: KAMEZAWA Hiroyuki, hugh, linux-kernel, lee.schermerhorn, linux-mm,
taka, marcelo.tosatti
I cleaned up the patch a bit and ran some tests.
This patch implements the preservation of write permissions in migration
entries. Preserving write permission avoids unnecessary COW operations
following page migration.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Index: linux-2.6.17-rc1-mm2/include/linux/swap.h
===================================================================
--- linux-2.6.17-rc1-mm2.orig/include/linux/swap.h 2006-04-14 09:08:42.000000000 -0700
+++ linux-2.6.17-rc1-mm2/include/linux/swap.h 2006-04-14 10:01:27.000000000 -0700
@@ -33,8 +33,9 @@ static inline int current_is_kswapd(void
#define MAX_SWAPFILES (1 << MAX_SWAPFILES_SHIFT)
#else
/* Use last entry for page migration swap entries */
-#define MAX_SWAPFILES ((1 << MAX_SWAPFILES_SHIFT)-1)
-#define SWP_TYPE_MIGRATION MAX_SWAPFILES
+#define MAX_SWAPFILES ((1 << MAX_SWAPFILES_SHIFT)-2)
+#define SWP_MIGRATION_READ MAX_SWAPFILES
+#define SWP_MIGRATION_WRITE (MAX_SWAPFILES + 1)
#endif
/*
Index: linux-2.6.17-rc1-mm2/include/linux/swapops.h
===================================================================
--- linux-2.6.17-rc1-mm2.orig/include/linux/swapops.h 2006-04-14 09:55:25.000000000 -0700
+++ linux-2.6.17-rc1-mm2/include/linux/swapops.h 2006-04-14 10:06:43.000000000 -0700
@@ -69,15 +69,22 @@ static inline pte_t swp_entry_to_pte(swp
}
#ifdef CONFIG_MIGRATION
-static inline swp_entry_t make_migration_entry(struct page *page)
+static inline swp_entry_t make_migration_entry(struct page *page, int write)
{
BUG_ON(!PageLocked(page));
- return swp_entry(SWP_TYPE_MIGRATION, page_to_pfn(page));
+ return swp_entry(write ? SWP_MIGRATION_WRITE : SWP_MIGRATION_READ,
+ page_to_pfn(page));
}
static inline int is_migration_entry(swp_entry_t entry)
{
- return unlikely(swp_type(entry) == SWP_TYPE_MIGRATION);
+ return unlikely(swp_type(entry) == SWP_MIGRATION_READ ||
+ swp_type(entry) == SWP_MIGRATION_WRITE);
+}
+
+static inline int is_write_migration_entry(swp_entry_t entry)
+{
+ return swp_type(entry) == SWP_MIGRATION_WRITE;
}
static inline struct page *migration_entry_to_page(swp_entry_t entry)
Index: linux-2.6.17-rc1-mm2/mm/rmap.c
===================================================================
--- linux-2.6.17-rc1-mm2.orig/mm/rmap.c 2006-04-13 16:43:24.000000000 -0700
+++ linux-2.6.17-rc1-mm2/mm/rmap.c 2006-04-14 10:04:27.000000000 -0700
@@ -602,7 +602,7 @@ static int try_to_unmap_one(struct page
* pte is removed and then restart fault handling.
*/
BUG_ON(!migration);
- entry = make_migration_entry(page);
+ entry = make_migration_entry(page, pte_write(pteval));
}
set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
BUG_ON(pte_file(*pte));
Index: linux-2.6.17-rc1-mm2/mm/migrate.c
===================================================================
--- linux-2.6.17-rc1-mm2.orig/mm/migrate.c 2006-04-13 16:44:07.000000000 -0700
+++ linux-2.6.17-rc1-mm2/mm/migrate.c 2006-04-14 10:06:01.000000000 -0700
@@ -167,7 +167,10 @@ static void remove_migration_pte(struct
inc_mm_counter(mm, anon_rss);
get_page(new);
- set_pte_at(mm, addr, ptep, pte_mkold(mk_pte(new, vma->vm_page_prot)));
+ pte = pte_mkold(mk_pte(new, vma->vm_page_prot));
+ if (is_write_migration_entry(entry))
+ pte = pte_mkwrite(pte);
+ set_pte_at(mm, addr, ptep, pte);
page_add_anon_rmap(new, vma, addr);
out:
pte_unmap_unlock(ptep, ptl);
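To see the round trip in isolation: a user-space sketch of how the write
bit survives the entry, using the same illustrative encoding as the earlier
sketch (the shift constant and pfn values are assumptions, not the kernel's
layout).

#include <assert.h>
#include <stdio.h>

/* Two reserved types instead of one: the choice records pte_write(). */
#define MAX_SWAPFILES		30
#define SWP_MIGRATION_READ	MAX_SWAPFILES
#define SWP_MIGRATION_WRITE	(MAX_SWAPFILES + 1)
#define SWP_OFFSET_BITS		27

typedef struct { unsigned long val; } swp_entry_t;

static swp_entry_t make_migration_entry(unsigned long pfn, int write)
{
	unsigned long type = write ? SWP_MIGRATION_WRITE : SWP_MIGRATION_READ;
	swp_entry_t e = { (type << SWP_OFFSET_BITS) | pfn };
	return e;
}

static int is_write_migration_entry(swp_entry_t e)
{
	return (e.val >> SWP_OFFSET_BITS) == SWP_MIGRATION_WRITE;
}

int main(void)
{
	int pte_was_writable = 1;
	swp_entry_t e = make_migration_entry(0x42, pte_was_writable);

	/* remove_migration_pte(): pte_mkwrite() only if the bit survived */
	assert(is_write_migration_entry(e) == pte_was_writable);
	assert(!is_write_migration_entry(make_migration_entry(0x42, 0)));
	printf("write bit restored: %d\n", is_write_migration_entry(e));
	return 0;
}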
* Re: [PATCH 5/5] Swapless V2: Revise main migration logic
2006-04-14 2:34 ` KAMEZAWA Hiroyuki
2006-04-14 2:44 ` KAMEZAWA Hiroyuki
@ 2006-04-14 16:48 ` Christoph Lameter
2006-04-15 0:06 ` KAMEZAWA Hiroyuki
1 sibling, 1 reply; 54+ messages in thread
From: Christoph Lameter @ 2006-04-14 16:48 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: akpm, hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti
On Fri, 14 Apr 2006, KAMEZAWA Hiroyuki wrote:
> I have only compile-tested this patch (because I cannot use NUMA right now).
I can give this a spin later today.
>
> BTW, why is MAX_SWAPFILES_SHIFT == 5 now? Is it required by some arch?
No idea.
> +/* write protected page under migration*/
> +#define SWP_TYPE_MIGRATION_WP (MAX_SWAPFILES - 1)
> +/* write enabled migration type */
> +#define SWP_TYPE_MIGRATION_WE (MAX_SWAPFILES)
Could we call this SWP_TYPE_MIGRATION_READ / WRITE?
> + pte = pte_mkold(mk_pte(new, vma->vm_page_prot));
> + if (is_migration_entry_we(entry)) {
is_write_migration_entry?
> + pte = pte_mkwrite(pte);
> + }
No {} needed.
> - entry = make_migration_entry(page);
> + if (pte_write(pteval))
> + entry = make_migration_entry(page, 1);
> + else
> + entry = make_migration_entry(page, 0);
> }
entry = make_migration_entry(page, pte_write(pteval))
?
* Re: [PATCH 5/5] Swapless V2: Revise main migration logic
2006-04-14 16:48 ` [PATCH 5/5] Swapless V2: Revise main migration logic Christoph Lameter
@ 2006-04-15 0:06 ` KAMEZAWA Hiroyuki
2006-04-15 17:41 ` Christoph Lameter
0 siblings, 1 reply; 54+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-04-15 0:06 UTC (permalink / raw)
To: Christoph Lameter
Cc: akpm, hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti
Hi,
On Fri, 14 Apr 2006 09:48:25 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:
> > +/* write protected page under migration*/
> > +#define SWP_TYPE_MIGRATION_WP (MAX_SWAPFILES - 1)
> > +/* write enabled migration type */
> > +#define SWP_TYPE_MIGRATION_WE (MAX_SWAPFILES)
>
> Could we call this SWP_TYPE_MIGRATION_READ / WRITE?
>
ok, it looks better.
> > + pte = pte_mkold(mk_pte(new, vma->vm_page_prot));
> > + if (is_migration_entry_we(entry)) {
> is_write_migration_entry?
>
> > + pte = pte_mkwrite(pte);
> > + }
>
> No {} needed.
>
> > - entry = make_migration_entry(page);
> > + if (pte_write(pteval))
> > + entry = make_migration_entry(page, 1);
> > + else
> > + entry = make_migration_entry(page, 0);
> > }
>
> entry = make_migration_entry(page, pte_write(pteval))
>
> ?
Ah, O.K.
Thanks,
-Kame
* Re: [PATCH 5/5] Swapless V2: Revise main migration logic
2006-04-15 0:06 ` KAMEZAWA Hiroyuki
@ 2006-04-15 17:41 ` Christoph Lameter
2006-04-17 0:18 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 54+ messages in thread
From: Christoph Lameter @ 2006-04-15 17:41 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: akpm, hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti
Note that there is an issue with your approach. If a migration entry is
copied during fork then SWP_MIGRATION_WRITE must become SWP_MIGRATION_READ
for some cases. Would you look into fixing this?
* Re: [PATCH 5/5] Swapless V2: Revise main migration logic
2006-04-15 17:41 ` Christoph Lameter
@ 2006-04-17 0:18 ` KAMEZAWA Hiroyuki
2006-04-17 17:00 ` Christoph Lameter
0 siblings, 1 reply; 54+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-04-17 0:18 UTC (permalink / raw)
To: Christoph Lameter
Cc: akpm, hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti
On Sat, 15 Apr 2006 10:41:59 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:
> Note that there is an issue with your approach. If a migration entry is
> copied during fork then SWP_MIGRATION_WRITE must become SWP_MIGRATION_READ
> for some cases. Would you look into fixing this?
>
Thank you for pointing out the issue.
In my understanding, copy_page_range() is used at fork().
This finally calls copy_one_pte() and copies ptes one by one.
Maybe I'll do something like this.
==
438 if (unlikely(!pte_present(pte))) {
439 if (!pte_file(pte)) {
440 swap_duplicate(pte_to_swp_entry(pte));
entry = pte_to_swp_entry(pte);
#ifdef CONFIG_MIGRATION
if (is_migration_entry(entry)) {
......always copy as MIGRATION_READ.
}
#endif
441 /* make sure dst_mm is on swapoff's mmlist. */
442 if (unlikely(list_empty(&dst_mm->mmlist))) {
443 spin_lock(&mmlist_lock);
444 if (list_empty(&dst_mm->mmlist))
445 list_add(&dst_mm->mmlist,
446 &src_mm->mmlist);
447 spin_unlock(&mmlist_lock);
448 }
449 }
450 goto out_set_pte;
451 }
==
Thanks,
-Kame
* Re: [PATCH 5/5] Swapless V2: Revise main migration logic
2006-04-17 0:18 ` KAMEZAWA Hiroyuki
@ 2006-04-17 17:00 ` Christoph Lameter
2006-04-18 0:04 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 54+ messages in thread
From: Christoph Lameter @ 2006-04-17 17:00 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: akpm, hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti
On Mon, 17 Apr 2006, KAMEZAWA Hiroyuki wrote:
> > Note that there is an issue with your approach. If a migration entry is
> > copied during fork then SWP_MIGRATION_WRITE must become SWP_MIGRATION_READ
> > for some cases. Would you look into fixing this?
> Thank you for pointing out the issue.
>
> In my understanding, copy_page_range() is used at fork().
> This finally calls copy_one_pte() and copies ptes one by one.
Right, this is one spot, but the ptes in the original mm must also be marked
read. Are there any additional races?
> Maybe I'll do something like this.
> ==
> > 438 if (unlikely(!pte_present(pte))) {
> 439 if (!pte_file(pte)) {
> 440 swap_duplicate(pte_to_swp_entry(pte));
> entry = pte_to_swp_entry(pte);
> #ifdef CONFIG_MIGRATION
> if (is_migration_entry(entry)) {
> ......always copy as MIGRATION_READ.
> }
> #endif
> 441 /* make sure dst_mm is on swapoff's mmlist. */
Looks okay for this one location.
* Re: [PATCH 5/5] Swapless V2: Revise main migration logic
2006-04-17 17:00 ` Christoph Lameter
@ 2006-04-18 0:04 ` KAMEZAWA Hiroyuki
2006-04-18 0:27 ` Christoph Lameter
0 siblings, 1 reply; 54+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-04-18 0:04 UTC (permalink / raw)
To: Christoph Lameter
Cc: akpm, hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti
On Mon, 17 Apr 2006 10:00:02 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:
> On Mon, 17 Apr 2006, KAMEZAWA Hiroyuki wrote:
>
> > > Note that there is an issue with your approach. If a migration entry is
> > > copied during fork then SWP_MIGRATION_WRITE must become SWP_MIGRATION_READ
> > > for some cases. Would you look into fixing this?
> > Thank you for pointing out the issue.
> >
> > In my understanding, copy_page_range() is used at fork().
> > This finally calls copy_one_pte() and copies ptes one by one.
>
> Right, this is one spot, but the ptes in the original mm must also be marked
> read. Are there any additional races?
>
Ah, yes. You are right.
> > Maybe I'll do something like this.
> > ==
> > 438 if (unlikely(!pte_present(pte))) {
> > 439 if (!pte_file(pte)) {
> > 440 swap_duplicate(pte_to_swp_entry(pte));
> > entry = pte_to_swp_entry(pte);
> > #ifdef CONFIG_MIGRATION
> > if (is_migration_entry(entry)) {
> > ......always copy as MIGRATION_READ.
> > }
> > #endif
> > 441 /* make sure dst_mm is on swapoff's mmlist. */
>
> Looks okay for this one location.
>
Then,
if (is_migration_entry(entry)) {
change_to_read_migration_entry(entry);
copy_entry(entry);
}
is sane.
-Kame
* Re: [PATCH 5/5] Swapless V2: Revise main migration logic
2006-04-18 0:04 ` KAMEZAWA Hiroyuki
@ 2006-04-18 0:27 ` Christoph Lameter
2006-04-18 0:42 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 54+ messages in thread
From: Christoph Lameter @ 2006-04-18 0:27 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: akpm, hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti
On Tue, 18 Apr 2006, KAMEZAWA Hiroyuki wrote:
> Then,
>
> if (is_migration_entry(entry)) {
> change_to_read_migration_entry(entry);
> copy_entry(entry);
> }
>
> is sane.
Hmmm... Looks like I need to do the patch. Is the following okay? This
will also only work on COW mappings.
Read/Write migration entries: Implement correct behavior in copy_one_pte
Migration entries with write permission must become SWP_MIGRATION_READ
entries if a COW mapping is processed. The migration entries from which
the copy is being made must also become SWP_MIGRATION_READ. This mimics
the copying of a pte for an anonymous page.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.17-rc1-mm2/mm/memory.c
===================================================================
--- linux-2.6.17-rc1-mm2.orig/mm/memory.c 2006-04-17 16:23:50.000000000 -0700
+++ linux-2.6.17-rc1-mm2/mm/memory.c 2006-04-17 17:25:50.000000000 -0700
@@ -434,7 +434,9 @@ copy_one_pte(struct mm_struct *dst_mm, s
/* pte contains position in swap or file, so copy. */
if (unlikely(!pte_present(pte))) {
if (!pte_file(pte)) {
- swap_duplicate(pte_to_swp_entry(pte));
+ swp_entry_t entry = pte_to_swp_entry(pte);
+
+ swap_duplicate(entry);
/* make sure dst_mm is on swapoff's mmlist. */
if (unlikely(list_empty(&dst_mm->mmlist))) {
spin_lock(&mmlist_lock);
@@ -443,6 +445,19 @@ copy_one_pte(struct mm_struct *dst_mm, s
&src_mm->mmlist);
spin_unlock(&mmlist_lock);
}
+ if (is_migration_entry(entry) &&
+ is_cow_mapping(vm_flags)) {
+ page = migration_entry_to_page(entry);
+
+ /*
+ * COW mappings require pages in both parent
+ * and child to be set to read.
+ */
+ entry = make_migration_entry(page, 0);
+ pte = swp_entry_to_pte(entry);
+ set_pte_at(src_mm, addr, src_pte, pte);
+ }
}
goto out_set_pte;
}
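A stand-alone model of the rule this hunk enforces (all values invented for
the example): on fork of a COW mapping, both the parent's and the child's
copies of a write migration entry are downgraded to read, so the first
write after migration completes still faults and copies the page.

#include <stdio.h>

#define SWP_MIGRATION_READ	30
#define SWP_MIGRATION_WRITE	31

struct entry { int type; unsigned long pfn; };

/* Toy copy_one_pte() for the migration-entry case. */
static void copy_one_entry(struct entry *src, struct entry *dst, int is_cow)
{
	*dst = *src;
	if (src->type == SWP_MIGRATION_WRITE && is_cow) {
		/* both copies lose the write bit */
		src->type = SWP_MIGRATION_READ;
		dst->type = SWP_MIGRATION_READ;
	}
}

int main(void)
{
	struct entry parent = { SWP_MIGRATION_WRITE, 0x42 }, child;

	copy_one_entry(&parent, &child, 1 /* COW mapping */);
	/* both now print SWP_MIGRATION_READ (30) */
	printf("parent type=%d child type=%d\n", parent.type, child.type);
	return 0;
}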
* Re: [PATCH 5/5] Swapless V2: Revise main migration logic
2006-04-18 0:27 ` Christoph Lameter
@ 2006-04-18 0:42 ` KAMEZAWA Hiroyuki
2006-04-18 1:57 ` Christoph Lameter
0 siblings, 1 reply; 54+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-04-18 0:42 UTC (permalink / raw)
To: Christoph Lameter
Cc: akpm, hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti
On Mon, 17 Apr 2006 17:27:48 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:
> On Tue, 18 Apr 2006, KAMEZAWA Hiroyuki wrote:
>
> > Then,
> >
> > if (is_migration_entry(entry)) {
> > change_to_read_migration_entry(entry);
> > copy_entry(entry);
> > }
> >
> > is sane.
>
> Hmmm... Looks like I need to do the patch. Is the following okay? This
> will also only work on COW mappings.
>
I think it's okay.
BTW, when copying an mm, mm->mmap_sem is held. Is mm->mmap_sem not held
while doing page migration now? I'm sorry, I can't catch up with all the
changes. Or is this needed for lazy migration (migration-on-fault)?
-Kame
>
>
> Read/Write migration entries: Implement correct behavior in copy_one_pte
>
> Migration entries with write permission must become SWP_MIGRATION_READ
> entries if a COW mapping is processed. The migration entries from which
> the copy is being made must also become SWP_MIGRATION_READ. This mimics
> the copying of a pte for an anonymous page.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
>
> Index: linux-2.6.17-rc1-mm2/mm/memory.c
> ===================================================================
> --- linux-2.6.17-rc1-mm2.orig/mm/memory.c 2006-04-17 16:23:50.000000000 -0700
> +++ linux-2.6.17-rc1-mm2/mm/memory.c 2006-04-17 17:25:50.000000000 -0700
> @@ -434,7 +434,9 @@ copy_one_pte(struct mm_struct *dst_mm, s
> /* pte contains position in swap or file, so copy. */
> if (unlikely(!pte_present(pte))) {
> if (!pte_file(pte)) {
> - swap_duplicate(pte_to_swp_entry(pte));
> + swp_entry_t entry = pte_to_swp_entry(pte);
> +
> + swap_duplicate(entry);
> /* make sure dst_mm is on swapoff's mmlist. */
> if (unlikely(list_empty(&dst_mm->mmlist))) {
> spin_lock(&mmlist_lock);
> @@ -443,6 +445,19 @@ copy_one_pte(struct mm_struct *dst_mm, s
> &src_mm->mmlist);
> spin_unlock(&mmlist_lock);
> }
> + if (is_migration_entry(entry) &&
> + is_cow_mapping(vm_flags)) {
> + page = migration_entry_to_page(entry);
> +
> + /*
> + * COW mappings require pages in both parent
> + * and child to be set to read.
> + */
> + entry = make_migration_entry(page, 0);
> + pte = swp_entry_to_pte(entry);
> + set_pte_at(src_mm, addr, src_pte, pte);
> + }
> }
> goto out_set_pte;
> }
>
* Re: [PATCH 5/5] Swapless V2: Revise main migration logic
2006-04-18 0:42 ` KAMEZAWA Hiroyuki
@ 2006-04-18 1:57 ` Christoph Lameter
2006-04-18 3:00 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 54+ messages in thread
From: Christoph Lameter @ 2006-04-18 1:57 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: akpm, hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti
On Tue, 18 Apr 2006, KAMEZAWA Hiroyuki wrote:
> BTW, when copying an mm, mm->mmap_sem is held. Is mm->mmap_sem not held
> while doing page migration now? I'm sorry, I can't catch up with all the
> changes. Or is this needed for lazy migration (migration-on-fault)?
mmap_sem must be held during page migration due to the way we retrieve the
anonymous vma.
I think you would want to get rid of that requirement for the hotplug
remove. But how do we reliably get to the anon_vma of the page without
mmap_sem?
* Re: [PATCH 5/5] Swapless V2: Revise main migration logic
2006-04-18 1:57 ` Christoph Lameter
@ 2006-04-18 3:00 ` KAMEZAWA Hiroyuki
2006-04-18 3:16 ` Christoph Lameter
0 siblings, 1 reply; 54+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-04-18 3:00 UTC (permalink / raw)
To: Christoph Lameter
Cc: akpm, hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti
On Mon, 17 Apr 2006 18:57:40 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:
> On Tue, 18 Apr 2006, KAMEZAWA Hiroyuki wrote:
>
> > BTW, when copying an mm, mm->mmap_sem is held. Is mm->mmap_sem not held
> > while doing page migration now? I'm sorry, I can't catch up with all the
> > changes. Or is this needed for lazy migration (migration-on-fault)?
>
> mmap_sem must be held during page migration due to the way we retrieve the
> anonymous vma.
>
> I think you would want to get rid of that requirement for the hotplug
> remove.
yes.
> But how do we reliably get to the anon_vma of the page without mmap_sem?
>
>
I think the following patch will help, but it increases complexity...
-Kame
=
Hold anon_vma->lock during migration.
During migration, page_mapcount(page) goes down to 0 while page->mapping
remains valid. This breaks assumptions around page_mapcount() and
page->mapping. (See rmap.c, page_remove_rmap().)
If mmap_sem is held during migration, there is no problem. But if mmap_sem
is not held, this is a race.
This patch locks the anon_vma during migration.
Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Index: Christoph-NewMigrationV2/mm/migrate.c
===================================================================
--- Christoph-NewMigrationV2.orig/mm/migrate.c
+++ Christoph-NewMigrationV2/mm/migrate.c
@@ -178,6 +178,20 @@ out:
}
/*
+ * When mmap_sem is not held, we have to guarantee the anon_vma is not freed.
+ */
+static void migrate_lock_anon_vma(struct page *page)
+{
+ unsigned long mapping;
+ struct anon_vma *anon_vma;
+ struct vm_area_struct *vma;
+
+ if (PageAnon(page))
+ page_lock_anon_vma(page);
+ /* remove migration ptes will unlock */
+}
+
+/*
* Get rid of all migration entries and replace them by
* references to the indicated page.
*
@@ -196,10 +210,9 @@ static void remove_migration_ptes(struct
return;
/*
- * We hold the mmap_sem lock. So no need to call page_lock_anon_vma.
+ * anon_vma is preserved and locked while migration.
*/
anon_vma = (struct anon_vma *) (mapping - PAGE_MAPPING_ANON);
- spin_lock(&anon_vma->lock);
list_for_each_entry(vma, &anon_vma->head, anon_vma_node)
remove_migration_pte(vma, page_address_in_vma(new, vma),
@@ -371,6 +384,7 @@ int migrate_page(struct page *newpage, s
BUG_ON(PageWriteback(page)); /* Writeback must be complete */
+ migrate_lock_anon_vma(page);
rc = migrate_page_remove_references(newpage, page,
page_mapping(page) ? 2 : 1);
@@ -378,7 +392,6 @@ int migrate_page(struct page *newpage, s
remove_migration_ptes(page, page);
return rc;
}
-
migrate_page_copy(newpage, page);
remove_migration_ptes(page, newpage);
return 0;
Index: Christoph-NewMigrationV2/mm/rmap.c
===================================================================
--- Christoph-NewMigrationV2.orig/mm/rmap.c
+++ Christoph-NewMigrationV2/mm/rmap.c
@@ -160,7 +160,7 @@ void anon_vma_unlink(struct vm_area_stru
empty = list_empty(&anon_vma->head);
spin_unlock(&anon_vma->lock);
- if (empty)
+ if (empty && !anon_vma->async_refernece)
anon_vma_free(anon_vma);
}
@@ -717,7 +717,13 @@ static int try_to_unmap_anon(struct page
struct vm_area_struct *vma;
int ret = SWAP_AGAIN;
- anon_vma = page_lock_anon_vma(page);
+ if (migration) { /* anon_vma->lock is held under migration */
+ unsigned long mapping;
+ mapping = (unsigned long)page->mapping - PAGE_MAPPING_ANON;
+ anon_vma = (struct anon_vma *)mapping;
+ } else {
+ anon_vma = page_lock_anon_vma(page);
+ }
if (!anon_vma)
return ret;
@@ -726,7 +732,8 @@ static int try_to_unmap_anon(struct page
if (ret == SWAP_FAIL || !page_mapped(page))
break;
}
- spin_unlock(&anon_vma->lock);
+ if (!migration)
+ spin_unlock(&anon_vma->lock);
return ret;
}
* Re: [PATCH 5/5] Swapless V2: Revise main migration logic
2006-04-18 3:00 ` KAMEZAWA Hiroyuki
@ 2006-04-18 3:16 ` Christoph Lameter
2006-04-18 3:32 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 54+ messages in thread
From: Christoph Lameter @ 2006-04-18 3:16 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: akpm, hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti
On Tue, 18 Apr 2006, KAMEZAWA Hiroyuki wrote:
> I think the following patch will help, but it increases complexity...
Hmm... So the idea is to lock the anon vma before removing the ptes and
keep it until we are finished migrating. I like it! That would also reduce
the locking overhead.
> /*
> + * When mmap_sem is not held, we have to guarantee the anon_vma is not freed.
> + */
> +static void migrate_lock_anon_vma(struct page *page)
> +{
> + unsigned long mapping;
> + struct anon_vma *anon_vma;
> + struct vm_area_struct *vma;
> +
> + if (PageAnon(page))
> + page_lock_anon_vma(page);
> + /* remove migration ptes will unlock */
> +}
We need a whole function for two statements?
> */
> anon_vma = (struct anon_vma *) (mapping - PAGE_MAPPING_ANON);
> - spin_lock(&anon_vma->lock);
Maybe we better pass the anon_vma as a parameter?
> +++ Christoph-NewMigrationV2/mm/rmap.c
> @@ -160,7 +160,7 @@ void anon_vma_unlink(struct vm_area_stru
> empty = list_empty(&anon_vma->head);
> spin_unlock(&anon_vma->lock);
>
> - if (empty)
> + if (empty && !anon_vma->async_refernece)
> anon_vma_free(anon_vma);
> }
async_reference? What is this for? This does not exist in Linus'
tree.
* Re: [PATCH 5/5] Swapless V2: Revise main migration logic
2006-04-18 3:16 ` Christoph Lameter
@ 2006-04-18 3:32 ` KAMEZAWA Hiroyuki
2006-04-18 6:58 ` Christoph Lameter
0 siblings, 1 reply; 54+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-04-18 3:32 UTC (permalink / raw)
To: Christoph Lameter
Cc: akpm, hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti
On Mon, 17 Apr 2006 20:16:11 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:
> On Tue, 18 Apr 2006, KAMEZAWA Hiroyuki wrote:
>
> > I think the following patch will help, but it increases complexity...
>
> Hmm... So the idea is to lock the anon vma before removing the ptes and
> keep it until we are finished migrating. I like it! That would also reduce
> the locking overhead.
>
> > /*
> > + * When mmap_sem is not held, we have to guarantee the anon_vma is not freed.
> > + */
> > +static void migrate_lock_anon_vma(struct page *page)
> > +{
> > + unsigned long mapping;
> > + struct anon_vma *anon_vma;
> > + struct vm_area_struct *vma;
> > +
> > + if (PageAnon(page))
> > + page_lock_anon_vma(page);
> > + /* remove migration ptes will unlock */
> > +}
>
> We need a whole function for two statements?
>
Ah, ok, inlining is better.
> > */
> > anon_vma = (struct anon_vma *) (mapping - PAGE_MAPPING_ANON);
> > - spin_lock(&anon_vma->lock);
>
> Maybe we better pass the anon_vma as a parameter?
>
Agreed, it will improve total look.
> > +++ Christoph-NewMigrationV2/mm/rmap.c
> > @@ -160,7 +160,7 @@ void anon_vma_unlink(struct vm_area_stru
> > empty = list_empty(&anon_vma->head);
> > spin_unlock(&anon_vma->lock);
> >
> > - if (empty)
> > + if (empty && !anon_vma->async_refernece)
> > anon_vma_free(anon_vma);
> > }
>
> async_reference? What is this for? This does not exist in Linus'
> tree.
Please ignore ;). This is leftover trash from my old tree, sorry.
Here is the updated one.
-Kame
Hold anon_vma->lock during migration.
During migration, page_mapcount(page) goes down to 0 while page->mapping
remains valid. This breaks assumptions around page_mapcount() and
page->mapping. (See rmap.c, page_remove_rmap().)
If mmap_sem is held during migration, there is no problem. But if mmap_sem
is not held, this is a race.
This patch locks the anon_vma during migration.
Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Index: Christoph-NewMigrationV2/mm/migrate.c
===================================================================
--- Christoph-NewMigrationV2.orig/mm/migrate.c
+++ Christoph-NewMigrationV2/mm/migrate.c
@@ -184,28 +184,18 @@ out:
* Must hold mmap_sem lock on at least one of the vmas containing
* the page so that the anon_vma cannot vanish.
*/
-static void remove_migration_ptes(struct page *old, struct page *new)
+static void remove_migration_ptes(struct anon_vma *anon_vma,
+ struct page *old, struct page *new)
{
- struct anon_vma *anon_vma;
struct vm_area_struct *vma;
- unsigned long mapping;
- mapping = (unsigned long)new->mapping;
-
- if (!mapping || (mapping & PAGE_MAPPING_ANON) == 0)
+ if (!anon_vma)
return;
- /*
- * We hold the mmap_sem lock. So no need to call page_lock_anon_vma.
- */
- anon_vma = (struct anon_vma *) (mapping - PAGE_MAPPING_ANON);
- spin_lock(&anon_vma->lock);
-
list_for_each_entry(vma, &anon_vma->head, anon_vma_node)
remove_migration_pte(vma, page_address_in_vma(new, vma),
old, new);
-
- spin_unlock(&anon_vma->lock);
}
/*
@@ -368,20 +358,24 @@ EXPORT_SYMBOL(migrate_page_copy);
int migrate_page(struct page *newpage, struct page *page)
{
int rc;
-
+ struct anon_vma *anon_vma = NULL;
BUG_ON(PageWriteback(page)); /* Writeback must be complete */
-
+ if (PageAnon(page)) {
+ anon_vma = page_lock_anon_vma(page);
+ }
rc = migrate_page_remove_references(newpage, page,
page_mapping(page) ? 2 : 1);
if (rc) {
- remove_migration_ptes(page, page);
- return rc;
+ remove_migration_ptes(anon_vma, page, page);
+ goto unlock_out;
}
-
migrate_page_copy(newpage, page);
- remove_migration_ptes(page, newpage);
- return 0;
+ remove_migration_ptes(anon_vma, page, newpage);
+unlock_out:
+ if (anon_vma)
+ spin_unlock(&anon_vma->lock);
+ return rc;
}
EXPORT_SYMBOL(migrate_page);
Index: Christoph-NewMigrationV2/mm/rmap.c
===================================================================
--- Christoph-NewMigrationV2.orig/mm/rmap.c
+++ Christoph-NewMigrationV2/mm/rmap.c
@@ -717,7 +717,13 @@ static int try_to_unmap_anon(struct page
struct vm_area_struct *vma;
int ret = SWAP_AGAIN;
- anon_vma = page_lock_anon_vma(page);
+ if (migration) { /* anon_vma->lock is held under migration */
+ unsigned long mapping;
+ mapping = (unsigned long)page->mapping - PAGE_MAPPING_ANON;
+ anon_vma = (struct anon_vma *)mapping;
+ } else {
+ anon_vma = page_lock_anon_vma(page);
+ }
if (!anon_vma)
return ret;
@@ -726,7 +732,8 @@ static int try_to_unmap_anon(struct page
if (ret == SWAP_FAIL || !page_mapped(page))
break;
}
- spin_unlock(&anon_vma->lock);
+ if (!migration)
+ spin_unlock(&anon_vma->lock);
return ret;
}
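As a side note, the (unsigned long)page->mapping - PAGE_MAPPING_ANON
arithmetic above is the anon-page pointer-tagging trick: bit 0 of
page->mapping marks the pointer as an anon_vma rather than an
address_space. A minimal userspace sketch of that encoding (simplified
structs, not kernel code):
==
#include <assert.h>
#include <stdio.h>

#define PAGE_MAPPING_ANON 1UL	/* bit 0 tags an anon mapping */

struct anon_vma { int dummy; };
struct page { void *mapping; };

int main(void)
{
	struct anon_vma av;
	struct page page;
	unsigned long mapping;
	struct anon_vma *anon_vma;

	/* anon_vma is at least word-aligned, so bit 0 of its address is free */
	page.mapping = (void *)((unsigned long)&av | PAGE_MAPPING_ANON);

	mapping = (unsigned long)page.mapping;
	assert(mapping & PAGE_MAPPING_ANON);	/* the PageAnon() test */

	anon_vma = (struct anon_vma *)(mapping - PAGE_MAPPING_ANON);
	assert(anon_vma == &av);
	printf("recovered anon_vma at %p\n", (void *)anon_vma);
	return 0;
}
==
The subtraction of PAGE_MAPPING_ANON simply clears the tag bit, which is
free because anon_vma structures are word-aligned.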
* Re: [PATCH 5/5] Swapless V2: Revise main migration logic
2006-04-18 3:32 ` KAMEZAWA Hiroyuki
@ 2006-04-18 6:58 ` Christoph Lameter
2006-04-18 8:05 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 54+ messages in thread
From: Christoph Lameter @ 2006-04-18 6:58 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: akpm, hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti
Hmmm... Good ideas. I think it could be much simpler like the following
patch.
However, the problem here is how to know that we really took the anon_vma
lock and what to do about a page being unmapped while migrating. This
could cause the anon_vma not to be unlocked.
I guess we would need to have try_to_unmap return some state information.
I also toyed around with writing an "install_migration_ptes" function
which would be called only for anonymous pages and would reduce the
changes to try_to_unmap(). However, that also got too complicated.
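For concreteness, one hypothetical shape for that state information (a
sketch only; this three-argument try_to_unmap() was not part of any posted
patch): have try_to_unmap() hand back the anon_vma it left locked, so that
the final unlock does not depend on page->mapping still being intact.
==
/* hypothetical: report which anon_vma (if any) was left locked */
int try_to_unmap(struct page *page, int migration,
			struct anon_vma **locked);

/* a migration-side caller would then do, roughly: */
	struct anon_vma *locked = NULL;

	ret = try_to_unmap(page, 1, &locked);
	/* ... copy the page, fix up the migration ptes ... */
	if (locked)
		spin_unlock(&locked->lock);
==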
Index: linux-2.6.17-rc1-mm2/mm/migrate.c
===================================================================
--- linux-2.6.17-rc1-mm2.orig/mm/migrate.c 2006-04-17 17:21:08.000000000 -0700
+++ linux-2.6.17-rc1-mm2/mm/migrate.c 2006-04-17 23:53:32.000000000 -0700
@@ -236,7 +233,6 @@ static void remove_migration_ptes(struct
* We hold the mmap_sem lock. So no need to call page_lock_anon_vma.
*/
anon_vma = (struct anon_vma *) (mapping - PAGE_MAPPING_ANON);
- spin_lock(&anon_vma->lock);
list_for_each_entry(vma, &anon_vma->head, anon_vma_node)
remove_migration_pte(vma, page_address_in_vma(new, vma),
Index: linux-2.6.17-rc1-mm2/mm/rmap.c
===================================================================
--- linux-2.6.17-rc1-mm2.orig/mm/rmap.c 2006-04-17 17:21:08.000000000 -0700
+++ linux-2.6.17-rc1-mm2/mm/rmap.c 2006-04-17 23:53:39.000000000 -0700
@@ -723,7 +723,8 @@ static int try_to_unmap_anon(struct page
if (ret == SWAP_FAIL || !page_mapped(page))
break;
}
- spin_unlock(&anon_vma->lock);
+ if (!migration)
+ spin_unlock(&anon_vma->lock);
return ret;
}
* Re: [PATCH 5/5] Swapless V2: Revise main migration logic
2006-04-18 6:58 ` Christoph Lameter
@ 2006-04-18 8:05 ` KAMEZAWA Hiroyuki
2006-04-18 8:27 ` Christoph Lameter
0 siblings, 1 reply; 54+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-04-18 8:05 UTC (permalink / raw)
To: Christoph Lameter
Cc: akpm, hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti
On Mon, 17 Apr 2006 23:58:41 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:
> Hmmm... Good ideas. I think it could be much simpler like the following
> patch.
>
> However, the problem here is how to know that we really took the anon_vma
> lock and what to do about a page being unmapped while migrating. This
> could cause the anon_vma not to be unlocked.
>
The lock dependency here is lock_page(page) -> the page's anon_vma->lock.
So I guess anon_vma->lock cannot be unlocked by other threads
while we hold the page lock.
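In code, that ordering is roughly the following (a sketch; it assumes
page_lock_anon_vma() is usable outside rmap.c, as in the patch posted
earlier in this thread):
==
/* lock ordering: the page lock first, anon_vma->lock nested inside */
static void lock_order_sketch(struct page *page)
{
	struct anon_vma *anon_vma;

	lock_page(page);			/* outer lock */
	anon_vma = page_lock_anon_vma(page);	/* inner lock */
	if (anon_vma)
		spin_unlock(&anon_vma->lock);
	unlock_page(page);
}
==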
> I guess we would need to have try_to_unmap return some state information.
What kind of information?
> I also toyed around with writing an "install_migration_ptes" function
> which would be called only for anonymous pages and would reduce the
> changes to try_to_unmap(). However, that also got too complicated.
>
-Kame
* Re: [PATCH 5/5] Swapless V2: Revise main migration logic
2006-04-18 8:05 ` KAMEZAWA Hiroyuki
@ 2006-04-18 8:27 ` Christoph Lameter
2006-04-18 9:08 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 54+ messages in thread
From: Christoph Lameter @ 2006-04-18 8:27 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: akpm, hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti
On Tue, 18 Apr 2006, KAMEZAWA Hiroyuki wrote:
> On Mon, 17 Apr 2006 23:58:41 -0700 (PDT)
> Christoph Lameter <clameter@sgi.com> wrote:
>
> > Hmmm... Good ideas. I think it could be much simpler like the following
> > patch.
> >
> > However, the problem here is how to know that we really took the anon_vma
> > lock and what to do about a page being unmapped while migrating. This
> > could cause the anon_vma not to be unlocked.
> >
> The lock dependency here is lock_page(page) -> the page's anon_vma->lock.
> So I guess anon_vma->lock cannot be unlocked by other threads
> while we hold the page lock.
No, the problem is to know whether the lock was really taken. SWAP_AGAIN could
mean that page_lock_anon_vma() failed.
Also, the page may be freed while it is being processed. In that case
remove_migration_ptes() may not find the mapping and may not unlock the
anon_vma.
> > I guess we would need to have try_to_unmap return some state information.
> What kind of information?
Information that indicates whether the anon_vma lock was actually taken.
* Re: [PATCH 5/5] Swapless V2: Revise main migration logic
2006-04-18 8:27 ` Christoph Lameter
@ 2006-04-18 9:08 ` KAMEZAWA Hiroyuki
2006-04-18 16:49 ` Christoph Lameter
0 siblings, 1 reply; 54+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-04-18 9:08 UTC (permalink / raw)
To: Christoph Lameter
Cc: akpm, hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti
On Tue, 18 Apr 2006 01:27:40 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:
> On Tue, 18 Apr 2006, KAMEZAWA Hiroyuki wrote:
>
> > On Mon, 17 Apr 2006 23:58:41 -0700 (PDT)
> > Christoph Lameter <clameter@sgi.com> wrote:
> >
> > > Hmmm... Good ideas. I think it could be much simpler like the following
> > > patch.
> > >
> > > However, the problem here is how to know that we really took the anon_vma
> > > lock and what to do about a page being unmapped while migrating. This
> > > could cause the anon_vma not to be unlocked.
> > >
> > The lock dependency here is lock_page(page) -> the page's anon_vma->lock.
> > So I guess anon_vma->lock cannot be unlocked by other threads
> > while we hold the page lock.
>
> No, the problem is to know whether the lock was really taken. SWAP_AGAIN could
> mean that page_lock_anon_vma() failed.
>
Ah, I see, and now I understand what you did in http://lkml.org/lkml/2006/4/18/19
That will happen when migration takes the anon_vma->lock in
try_to_unmap().
> Also, the page may be freed while it is being processed. In that case
> remove_migration_ptes() may not find the mapping and may not unlock the
> anon_vma.
>
My patch is at http://lkml.org/lkml/2006/4/17/180
This is how migrate_page() looks with that patch applied.
==
/*
 * Common logic to directly migrate a single page suitable for
 * pages that do not use PagePrivate.
 *
 * Pages are locked upon entry and exit.
 */
int migrate_page(struct page *newpage, struct page *page)
{
	int rc;
	struct anon_vma *anon_vma = NULL;

	BUG_ON(PageWriteback(page));	/* Writeback must be complete */

	if (PageAnon(page))
		anon_vma = page_lock_anon_vma(page);

	rc = migrate_page_remove_references(newpage, page,
					page_mapping(page) ? 2 : 1);
	if (rc) {
		remove_migration_ptes(anon_vma, page, page);
		goto unlock_out;
	}

	migrate_page_copy(newpage, page);
	remove_migration_ptes(anon_vma, page, newpage);

unlock_out:
	if (anon_vma)
		spin_unlock(&anon_vma->lock);
	return rc;
}
==
The locking of anon_vma->lock does not depend on the result of
try_to_unmap() or remove_migration_ptes().
But I agree: taking anon_vma->lock before try_to_unmap() is ugly and
complicated, and will make things insane.
Will the attached one make things clearer?
This anon_vma->lock is just an optimization (for now), but a complicated one.
I think it will be better to restart the discussion against -mm3.
-Kame
==
Index: Christoph-NewMigrationV2/mm/rmap.c
===================================================================
--- Christoph-NewMigrationV2.orig/mm/rmap.c
+++ Christoph-NewMigrationV2/mm/rmap.c
@@ -711,29 +711,43 @@ static void try_to_unmap_cluster(unsigne
pte_unmap_unlock(pte - 1, ptl);
}
-static int try_to_unmap_anon(struct page *page, int migration)
+static int __try_to_unmap_anon(struct anon_vma *anon_vma,
+ struct page *page, int migration)
{
- struct anon_vma *anon_vma;
struct vm_area_struct *vma;
int ret = SWAP_AGAIN;
- if (migration) { /* anon_vma->lock is held under migration */
- unsigned long mapping;
- mapping = (unsigned long)page->mapping - PAGE_MAPPING_ANON;
- anon_vma = (struct anon_vma *)mapping;
- } else {
- anon_vma = page_lock_anon_vma(page);
- }
- if (!anon_vma)
- return ret;
-
list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
ret = try_to_unmap_one(page, vma, migration);
if (ret == SWAP_FAIL || !page_mapped(page))
break;
}
- if (!migration)
- spin_unlock(&anon_vma->lock);
+ return ret;
+}
+
+static int try_to_unmap_anon(struct page *page)
+{
+ struct anon_vma *anon_vma;
+ int ret = SWAP_AGAIN;
+
+ anon_vma = page_lock_anon_vma(page);
+ if (!anon_vma)
+ return ret;
+ ret = __try_to_unmap_anon(anon_vma, page, 0);
+ spin_unlock(&anon_vma->lock);
+ return ret;
+}
+
+static int try_to_unmap_anon_migrate(struct page *page)
+{
+ struct anon_vma *anon_vma;
+ unsigned long mapping;
+ int ret = SWAP_AGAIN;
+ if (!PageAnon(page))
+ return ret;
+ mapping = (unsigned long)page->mapping;
+ anon_vma = (struct anon_vma *)(mapping - PAGE_MAPPING_ANON);
+ ret = __try_to_unmap_anon(anon_vma, page, 1);
return ret;
}
@@ -851,9 +866,12 @@ int try_to_unmap(struct page *page, int
BUG_ON(!PageLocked(page));
- if (PageAnon(page))
- ret = try_to_unmap_anon(page, migration);
- else
+ if (PageAnon(page)) {
+ if (migration)
+ ret = try_to_unmap_anon_migrate(page);
+ else
+ ret = try_to_unmap_anon(page);
+ } else
ret = try_to_unmap_file(page, migration);
if (!page_mapped(page))
* Re: [PATCH 5/5] Swapless V2: Revise main migration logic
2006-04-18 9:08 ` KAMEZAWA Hiroyuki
@ 2006-04-18 16:49 ` Christoph Lameter
0 siblings, 0 replies; 54+ messages in thread
From: Christoph Lameter @ 2006-04-18 16:49 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: akpm, hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti
On Tue, 18 Apr 2006, KAMEZAWA Hiroyuki wrote:
> This anon_vma->lock is just an optimization (for now), but a complicated one.
> I think it will be better to restart the discussion against -mm3.
I agree.
* Re: [PATCH 0/5] Swapless page migration V2: Overview
2006-04-13 23:54 [PATCH 0/5] Swapless page migration V2: Overview Christoph Lameter
` (4 preceding siblings ...)
2006-04-13 23:54 ` [PATCH 5/5] Swapless V2: Revise main migration logic Christoph Lameter
@ 2006-04-14 0:08 ` Andrew Morton
2006-04-14 0:27 ` Christoph Lameter
5 siblings, 1 reply; 54+ messages in thread
From: Andrew Morton @ 2006-04-14 0:08 UTC (permalink / raw)
To: Christoph Lameter
Cc: hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti, kamezawa.hiroyu
Christoph Lameter <clameter@sgi.com> wrote:
>
> Swapless Page migration V2
>
> Currently page migration depends on the ability to assign swap entries
> to pages. However, those entries are only used to identify anonymous pages.
> Page migration will not work without swap, although swap space is never
> really used.
That strikes me as a fairly minor limitation?
> ...
>
> Efficiency of migration is increased by:
>
> 1. Avoiding useless retries
> The use of migration entries avoids raising the page count in do_swap_page().
> The existing approach can increase the page count between the unmapping
> of the ptes for a page and the page migration page count check, resulting
> in having to retry migration even though all accesses have been stopped.
Minor.
> 2. Swap entries do not have to be assigned to and removed from pages.
Minor.
> 3. No swap space has to be set up for page migration. Page migration
> will never use swap.
Minor.
> The patchset will allow later patches to enable migration of VM_LOCKED vmas,
> the ability to exempt vmas from page migration, and the implementation
> of another userland migration API for handling batches of pages.
These seem like more important justifications. Would you agree with that
judgement?
Is it not possible to implement some or all of these new things without
this work?
That all being said, this patchset is pretty low-impact:
include/linux/rmap.h | 1
include/linux/swap.h | 6
include/linux/swapops.h | 32 +++++
mm/Kconfig | 4
mm/memory.c | 6
mm/migrate.c | 242 ++++++++++++++++++++------------------
mm/rmap.c | 88 ++++---------
mm/swapfile.c | 15 --
8 files changed, 212 insertions(+), 182 deletions(-)
* Re: [PATCH 0/5] Swapless page migration V2: Overview
2006-04-14 0:08 ` [PATCH 0/5] Swapless page migration V2: Overview Andrew Morton
@ 2006-04-14 0:27 ` Christoph Lameter
2006-04-14 14:14 ` Lee Schermerhorn
0 siblings, 1 reply; 54+ messages in thread
From: Christoph Lameter @ 2006-04-14 0:27 UTC (permalink / raw)
To: Andrew Morton
Cc: hugh, linux-kernel, lee.schermerhorn, linux-mm, taka,
marcelo.tosatti, kamezawa.hiroyu
On Thu, 13 Apr 2006, Andrew Morton wrote:
> > Currently page migration depends on the ability to assign swap entries
> > to pages. However, those entries are only used to identify anonymous pages.
> > Page migration will not work without swap, although swap space is never
> > really used.
>
> That strikes me as a fairly minor limitation?
Some people never want to use swap at all. Systems that have no swap defined
currently cannot migrate pages. It's kind of difficult to comprehend that
you need to have swap for migration when it is never going to be used.
> > The patchset will allow later patches to enable migration of VM_LOCKED vmas,
> > the ability to exempt vmas from page migration, and the implementation
> > of another userland migration API for handling batches of pages.
>
> These seem like more important justifications. Would you agree with that
> judgement?
The swapless thing is the most important for us because many of our
customers do not have swap set up. Then come the above
features, and then the efficiency considerations.
> Is it not possible to implement some or all of these new things without
> this work?
VM_LOCKED semantics are that a page cannot be swapped out. Not being able
to swap and not being able to migrate are the same thing right now. We need
to separate the two.
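One hedged sketch of that separation: the migration flag already threaded
through try_to_unmap_one() could gate the VM_LOCKED check, so mlock would
still forbid swap-out but no longer forbid migration (illustration only,
not code from this series):
==
	/*
	 * Sketch: VM_LOCKED refuses unmapping for swap-out,
	 * but a migration attempt may still proceed.
	 */
	if ((vma->vm_flags & VM_LOCKED) && !migration) {
		ret = SWAP_FAIL;
		goto out_unmap;
	}
==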
* Re: [PATCH 0/5] Swapless page migration V2: Overview
2006-04-14 0:27 ` Christoph Lameter
@ 2006-04-14 14:14 ` Lee Schermerhorn
0 siblings, 0 replies; 54+ messages in thread
From: Lee Schermerhorn @ 2006-04-14 14:14 UTC (permalink / raw)
To: Christoph Lameter
Cc: Andrew Morton, hugh, linux-kernel, linux-mm, taka,
marcelo.tosatti, kamezawa.hiroyu
On Thu, 2006-04-13 at 17:27 -0700, Christoph Lameter wrote:
> On Thu, 13 Apr 2006, Andrew Morton wrote:
>
> > > Currently page migration depends on the ability to assign swap entries
> > > to pages. However, those entries are only used to identify anonymous pages.
> > > Page migration will not work without swap, although swap space is never
> > > really used.
> >
> > That strikes me as a fairly minor limitation?
>
> Some people never want to use swap at all. Systems that have no swap defined
> currently cannot migrate pages. It's kind of difficult to comprehend that
> you need to have swap for migration when it is never going to be used.
>
> > > The patchset will allow later patches to enable migration of VM_LOCKED vmas,
> > > the ability to exempt vmas from page migration, and the implementation
> > > of another userland migration API for handling batches of pages.
> >
> > These seem like more important justifications. Would you agree with that
> > judgement?
>
> The swapless thing is the most important for us because many of our
> customers do not have swap set up. Then come the above
> features, and then the efficiency considerations.
I do have the migration cache working against 17-rc1-mm2. I tried to
address Christoph's prior comments. I just haven't posted yet, as I was
working on the migrate-on-fault/auto-migration series. If one accepts lazy
migration, then the migration cache becomes more important because anon
pages can/will stay in the swap cache until the page is finally freed
[or maybe gets evicted from the swap cache?].
The migration cache still uses the swap infrastructure, so CONFIG_SWAP
must be enabled; but no swap devices need be configured. That should
address that particular concern without major surgery to the existing
migration code.
Let me know if I should repost the patches. Meanwhile, they're
available at:
http://free.linux.hp.com/~lts/Patches/PageMigration/ [which seems to be
unavailable at the moment, temporarily I hope]. Look for the migcache tarball.
Lee