From: Mel Gorman <mel@csn.ul.ie>
To: Christoph Lameter <cl@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Andrea Arcangeli <aarcange@redhat.com>,
Adam Litke <agl@us.ibm.com>, Avi Kivity <avi@redhat.com>,
David Rientjes <rientjes@google.com>,
Minchan Kim <minchan.kim@gmail.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Rik van Riel <riel@redhat.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 04/14] mm,migration: Allow the migration of PageSwapCache pages
Date: Thu, 22 Apr 2010 10:28:20 +0100
Message-ID: <20100422092819.GR30306@csn.ul.ie>
In-Reply-To: <alpine.DEB.2.00.1004211038020.4959@router.home>
On Wed, Apr 21, 2010 at 10:46:45AM -0500, Christoph Lameter wrote:
> On Wed, 21 Apr 2010, Mel Gorman wrote:
>
> > > > 2. Is the BUG_ON check in
> > > > include/linux/swapops.h#migration_entry_to_page() now wrong? (I
> > > > think yes, but I'm not sure and I'm having trouble verifying it)
> > >
> > > The bug check ensures that migration entries only occur when the page
> > > is locked. This patch changes that behavior. This is going to oops
> > > therefore in unmap_and_move() when you try to remove the migration_ptes
> > > from an unlocked page.
> > >
> >
> > It's not unmap_and_move() where the problem occurs but a later page
> > fault - presumably in do_swap_page but I'm not 100% certain.
>
> remove_migration_pte() calls migration_entry_to_page(). So it must do that
> only if the page is still locked.
>
Correct, but the other call path is
do_swap_page
-> migration_entry_wait
-> migration_entry_to_page
with migration_entry_wait() expecting the page to be locked. There are
dangling migration PTEs coming from somewhere. I first thought they came
from unmapped swapcache, but that cannot be the case. There is a race
somewhere.
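For reference, the chain that trips the check is roughly the following
(paraphrased, with pte locking and refcounting elided):

	/* do_swap_page() decodes the swap pte and sees a migration entry */
	entry = pte_to_swp_entry(orig_pte);
	if (is_migration_entry(entry))
		migration_entry_wait(mm, pmd, address);

	/* ... which looks the target page up by pfn and asserts the lock */
	page = pfn_to_page(swp_offset(entry));	/* migration_entry_to_page() */
	BUG_ON(!PageLocked(page));		/* fires on a dangling pte */
	wait_on_page_locked(page);

so any migration pte that outlives the lock on the page it points at will
blow up on the next fault of that address.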
> You need to ensure that the page is not unlocked in move_to_new_page() if
> the migration ptes are kept.
>
> move_to_new_page() only unlocks the new page not the original page. So that is safe.
>
> And it seems that the old page is also unlocked in unmap_and_move() only
> after the migration_ptes have been removed? So we are fine after all...?
>
You'd think so, but migration PTEs are being left behind in some
circumstances. I thought it was due to this series, but that's unlikely;
it's more that compaction exercises migration very heavily.
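On paper the ordering does look safe - roughly (paraphrasing the current
unmap_and_move()/move_to_new_page(), error handling elided):

	lock_page(page);			/* old page locked ... */
	try_to_unmap(page, TTU_MIGRATION|...);	/* installs migration ptes */
	rc = move_to_new_page(newpage, page);	/* on success removes the
						   migration ptes, unlocks
						   only newpage */
	if (rc)
		remove_migration_ptes(page, page);
	unlock_page(page);			/* ... old page unlocked here */

so in the simple case no migration pte should still be visible once the old
page is unlocked.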
We can clean up the stale migration PTEs when they are encountered, as in
the following patch for example. I'll continue investigating why the
dangling migration PTE exists in the first place, as closing that race
would be the better fix.
==== CUT HERE ====
mm,migration: Remove dangling migration ptes pointing to unlocked pages
Due to some yet-to-be-identified race, it is possible for migration PTEs
to be left behind. When such a PTE is later faulted on, a BUG is triggered
by a check that assumes all migration PTEs point to a page currently being
migrated and therefore locked.

Rather than calling BUG, this patch notes the existence of dangling migration
PTEs in migration_entry_wait() and cleans them up.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/linux/swapops.h |   11 ++++++-----
 mm/memory.c             |    2 +-
 mm/migrate.c            |   24 ++++++++++++++++++++----
 3 files changed, 27 insertions(+), 10 deletions(-)
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index cd42e30..01fba71 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -95,14 +95,15 @@ static inline int is_write_migration_entry(swp_entry_t entry)
 	return unlikely(swp_type(entry) == SWP_MIGRATION_WRITE);
 }
 
-static inline struct page *migration_entry_to_page(swp_entry_t entry)
+static inline struct page *migration_entry_to_page(swp_entry_t entry,
+						bool lock_required)
 {
 	struct page *p = pfn_to_page(swp_offset(entry));
 	/*
 	 * Any use of migration entries may only occur while the
 	 * corresponding page is locked
 	 */
-	BUG_ON(!PageLocked(p));
+	BUG_ON(!PageLocked(p) && lock_required);
 	return p;
 }
 
@@ -112,7 +113,7 @@ static inline void make_migration_entry_read(swp_entry_t *entry)
 }
 
 extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
-					unsigned long address);
+				unsigned long address, bool lock_required);
 #else
 
 #define make_migration_entry(page, write) swp_entry(0, 0)
@@ -121,9 +122,9 @@ static inline int is_migration_entry(swp_entry_t swp)
 	return 0;
 }
-#define migration_entry_to_page(swp) NULL
+#define migration_entry_to_page(swp, lock_required) NULL
 static inline void make_migration_entry_read(swp_entry_t *entryp) { }
 static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
-					 unsigned long address) { }
+			unsigned long address, bool lock_required) { }
 static inline int is_write_migration_entry(swp_entry_t entry)
 {
 	return 0;
diff --git a/mm/memory.c b/mm/memory.c
index 833952d..9719138 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2619,7 +2619,7 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	entry = pte_to_swp_entry(orig_pte);
 	if (unlikely(non_swap_entry(entry))) {
 		if (is_migration_entry(entry)) {
-			migration_entry_wait(mm, pmd, address);
+			migration_entry_wait(mm, pmd, address, false);
 		} else if (is_hwpoison_entry(entry)) {
 			ret = VM_FAULT_HWPOISON;
 		} else {
diff --git a/mm/migrate.c b/mm/migrate.c
index 4afd6fe..308639b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -114,7 +114,7 @@ static int remove_migration_pte(struct page *new, struct vm_area_struct *vma,
 	entry = pte_to_swp_entry(pte);
 
 	if (!is_migration_entry(entry) ||
-	    migration_entry_to_page(entry) != old)
+	    migration_entry_to_page(entry, true) != old)
 		goto unlock;
 
 	get_page(new);
@@ -148,19 +148,23 @@ static void remove_migration_ptes(struct page *old, struct page *new)
 
 /*
  * Something used the pte of a page under migration. We need to
- * get to the page and wait until migration is finished.
+ * get to the page and wait until migration is finished. Alternatively,
+ * the migration of an unmapped swapcache page could have left a
+ * dangling migration pte due to the lack of certainty about an
+ * anon_vma. In that case, the migration pte is removed and rechecked.
  * When we return from this function the fault will be retried.
  *
  * This function is called from do_swap_page().
  */
 void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
-				unsigned long address)
+				unsigned long address, bool lock_required)
 {
 	pte_t *ptep, pte;
 	spinlock_t *ptl;
 	swp_entry_t entry;
 	struct page *page;
 
+recheck:
 	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
 	pte = *ptep;
 	if (!is_swap_pte(pte))
@@ -170,7 +174,7 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 	if (!is_migration_entry(entry))
 		goto out;
 
-	page = migration_entry_to_page(entry);
+	page = migration_entry_to_page(entry, lock_required);
 
 	/*
 	 * Once radix-tree replacement of page migration started, page_count
@@ -181,6 +185,18 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 	 */
 	if (!get_page_unless_zero(page))
 		goto out;
+
+	/* If unlocked, this is a dangling migration pte that needs removal */
+	if (!PageLocked(page)) {
+		BUG_ON(lock_required);
+		pte_unmap_unlock(ptep, ptl);
+		lock_page(page);
+		remove_migration_pte(page, find_vma(mm, address), address, page);
+		unlock_page(page);
+		put_page(page);
+		goto recheck;
+	}
+
 	pte_unmap_unlock(ptep, ptl);
 	wait_on_page_locked(page);
 	put_page(page);