All of lore.kernel.org
 help / color / mirror / Atom feed
From: Will Deacon <will.deacon@arm.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: mark.rutland@arm.com, akpm@linux-foundation.org,
	kirill.shutemov@linux.intel.com, Punit.Agrawal@arm.com,
	mgorman@suse.de, steve.capper@arm.com,
	Will Deacon <will.deacon@arm.com>
Subject: [PATCH 1/3] mm: numa: avoid waiting on freed migrated pages
Date: Tue,  6 Jun 2017 18:58:34 +0100	[thread overview]
Message-ID: <1496771916-28203-2-git-send-email-will.deacon@arm.com> (raw)
In-Reply-To: <1496771916-28203-1-git-send-email-will.deacon@arm.com>

From: Mark Rutland <mark.rutland@arm.com>

In do_huge_pmd_numa_page(), we attempt to handle a migrating thp pmd by
waiting until the pmd is unlocked before we return and retry. However,
we can race with migrate_misplaced_transhuge_page():

// do_huge_pmd_numa_page                // migrate_misplaced_transhuge_page()
// Holds 0 refs on page                 // Holds 2 refs on page

vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
/* ... */
if (pmd_trans_migrating(*vmf->pmd)) {
        page = pmd_page(*vmf->pmd);
        spin_unlock(vmf->ptl);
                                        ptl = pmd_lock(mm, pmd);
                                        if (page_count(page) != 2)) {
                                                /* roll back */
                                        }
                                        /* ... */
                                        mlock_migrate_page(new_page, page);
                                        /* ... */
                                        spin_unlock(ptl);
                                        put_page(page);
                                        put_page(page); // page freed here
        wait_on_page_locked(page);
        goto out;
}

This can result in the freed page having its waiters flag set
unexpectedly, which trips the PAGE_FLAGS_CHECK_AT_PREP checks in the
page alloc/free functions. This has been observed on arm64 KVM guests.

We can avoid this by having do_huge_pmd_numa_page() take a reference on
the page before dropping the pmd lock, mirroring what we do in
__migration_entry_wait().

When we hit the race, migrate_misplaced_transhuge_page() will see the
reference and abort the migration, as it may do today in other cases.

Acked-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 mm/huge_memory.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a84909cf20d3..88c6167f194d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1426,8 +1426,11 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
 	 */
 	if (unlikely(pmd_trans_migrating(*vmf->pmd))) {
 		page = pmd_page(*vmf->pmd);
+		if (!get_page_unless_zero(page))
+			goto out_unlock;
 		spin_unlock(vmf->ptl);
 		wait_on_page_locked(page);
+		put_page(page);
 		goto out;
 	}
 
@@ -1459,9 +1462,12 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
 
 	/* Migration could have started since the pmd_trans_migrating check */
 	if (!page_locked) {
+		page_nid = -1;
+		if (!get_page_unless_zero(page))
+			goto out_unlock;
 		spin_unlock(vmf->ptl);
 		wait_on_page_locked(page);
-		page_nid = -1;
+		put_page(page);
 		goto out;
 	}
 
-- 
2.1.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Will Deacon <will.deacon@arm.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: mark.rutland@arm.com, akpm@linux-foundation.org,
	kirill.shutemov@linux.intel.com, Punit.Agrawal@arm.com,
	mgorman@suse.de, steve.capper@arm.com,
	Will Deacon <will.deacon@arm.com>
Subject: [PATCH 1/3] mm: numa: avoid waiting on freed migrated pages
Date: Tue,  6 Jun 2017 18:58:34 +0100	[thread overview]
Message-ID: <1496771916-28203-2-git-send-email-will.deacon@arm.com> (raw)
In-Reply-To: <1496771916-28203-1-git-send-email-will.deacon@arm.com>

From: Mark Rutland <mark.rutland@arm.com>

In do_huge_pmd_numa_page(), we attempt to handle a migrating thp pmd by
waiting until the pmd is unlocked before we return and retry. However,
we can race with migrate_misplaced_transhuge_page():

// do_huge_pmd_numa_page                // migrate_misplaced_transhuge_page()
// Holds 0 refs on page                 // Holds 2 refs on page

vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
/* ... */
if (pmd_trans_migrating(*vmf->pmd)) {
        page = pmd_page(*vmf->pmd);
        spin_unlock(vmf->ptl);
                                        ptl = pmd_lock(mm, pmd);
                                        if (page_count(page) != 2)) {
                                                /* roll back */
                                        }
                                        /* ... */
                                        mlock_migrate_page(new_page, page);
                                        /* ... */
                                        spin_unlock(ptl);
                                        put_page(page);
                                        put_page(page); // page freed here
        wait_on_page_locked(page);
        goto out;
}

This can result in the freed page having its waiters flag set
unexpectedly, which trips the PAGE_FLAGS_CHECK_AT_PREP checks in the
page alloc/free functions. This has been observed on arm64 KVM guests.

We can avoid this by having do_huge_pmd_numa_page() take a reference on
the page before dropping the pmd lock, mirroring what we do in
__migration_entry_wait().

When we hit the race, migrate_misplaced_transhuge_page() will see the
reference and abort the migration, as it may do today in other cases.

Acked-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 mm/huge_memory.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a84909cf20d3..88c6167f194d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1426,8 +1426,11 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
 	 */
 	if (unlikely(pmd_trans_migrating(*vmf->pmd))) {
 		page = pmd_page(*vmf->pmd);
+		if (!get_page_unless_zero(page))
+			goto out_unlock;
 		spin_unlock(vmf->ptl);
 		wait_on_page_locked(page);
+		put_page(page);
 		goto out;
 	}
 
@@ -1459,9 +1462,12 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
 
 	/* Migration could have started since the pmd_trans_migrating check */
 	if (!page_locked) {
+		page_nid = -1;
+		if (!get_page_unless_zero(page))
+			goto out_unlock;
 		spin_unlock(vmf->ptl);
 		wait_on_page_locked(page);
-		page_nid = -1;
+		put_page(page);
 		goto out;
 	}
 
-- 
2.1.4

  reply	other threads:[~2017-06-06 17:58 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-06 17:58 [PATCH 0/3] mm: huge pages: Misc fixes for issues found during fuzzing Will Deacon
2017-06-06 17:58 ` Will Deacon
2017-06-06 17:58 ` Will Deacon [this message]
2017-06-06 17:58   ` [PATCH 1/3] mm: numa: avoid waiting on freed migrated pages Will Deacon
2017-06-08  9:04   ` Vlastimil Babka
2017-06-08  9:04     ` Vlastimil Babka
2017-06-08 10:31     ` Mark Rutland
2017-06-08 10:31       ` Mark Rutland
2017-06-08 10:27   ` Kirill A. Shutemov
2017-06-08 10:27     ` Kirill A. Shutemov
2017-06-06 17:58 ` [PATCH 2/3] mm/page_ref: Ensure page_ref_unfreeze is ordered against prior accesses Will Deacon
2017-06-06 17:58   ` Will Deacon
2017-06-08  9:38   ` Vlastimil Babka
2017-06-08  9:38     ` Vlastimil Babka
2017-06-08 10:34     ` Will Deacon
2017-06-08 10:34       ` Will Deacon
2017-06-08 11:02       ` Vlastimil Babka
2017-06-08 11:02         ` Vlastimil Babka
2017-06-08 10:40     ` Kirill A. Shutemov
2017-06-08 10:40       ` Kirill A. Shutemov
2017-06-08 11:07       ` Vlastimil Babka
2017-06-08 11:07         ` Vlastimil Babka
2017-06-08 11:24         ` Will Deacon
2017-06-08 11:24           ` Will Deacon
2017-06-08 12:16           ` Peter Zijlstra
2017-06-08 12:16             ` Peter Zijlstra
2017-06-08 12:19             ` Peter Zijlstra
2017-06-08 12:19               ` Peter Zijlstra
2017-06-08 12:50           ` Peter Zijlstra
2017-06-08 12:50             ` Peter Zijlstra
2017-06-09 10:05             ` Will Deacon
2017-06-09 10:05               ` Will Deacon
2017-06-06 17:58 ` [PATCH 3/3] mm: migrate: Stabilise page count when migrating transparent hugepages Will Deacon
2017-06-06 17:58   ` Will Deacon
2017-06-08 10:47   ` Kirill A. Shutemov
2017-06-08 10:47     ` Kirill A. Shutemov
2017-06-08 10:52   ` Vlastimil Babka
2017-06-08 10:52     ` Vlastimil Babka
2017-06-08 12:07     ` Will Deacon
2017-06-08 12:07       ` Will Deacon
2017-06-09  8:25       ` zhong jiang
2017-06-09  8:25         ` zhong jiang
2017-06-09  9:16       ` zhong jiang
2017-06-09  9:16         ` zhong jiang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1496771916-28203-2-git-send-email-will.deacon@arm.com \
    --to=will.deacon@arm.com \
    --cc=Punit.Agrawal@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mark.rutland@arm.com \
    --cc=mgorman@suse.de \
    --cc=steve.capper@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.