From: Michel Lespinasse <walken@google.com>
To: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrea Arcangeli <aarcange@redhat.com>,
Hugh Dickins <hughd@google.com>,
Minchan Kim <minchan.kim@gmail.com>,
Johannes Weiner <jweiner@redhat.com>,
Rik van Riel <riel@redhat.com>, Mel Gorman <mgorman@suse.de>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Shaohua Li <shaohua.li@intel.com>
Subject: [PATCH 6/9] mm: assert that get_page_unless_zero() callers hold the rcu lock
Date: Fri, 19 Aug 2011 00:48:28 -0700
Message-ID: <1313740111-27446-7-git-send-email-walken@google.com>
In-Reply-To: <1313740111-27446-1-git-send-email-walken@google.com>
To guarantee that page counts are stable one RCU grace period
after page allocation, get_page_unless_zero() call sites must follow
the proper protocol: hold the RCU read lock from the time they locate
the desired page until they get a reference on it.

__isolate_lru_page() is exempted: it knows the page it is trying to get
a reference on can't be fully freed, because the page is on an LRU list
and it holds the zone LRU lock.

Other call sites in memory_hotplug.c, memory-failure.c and hwpoison-inject.c
are also exempted. It would be preferable if someone more familiar with
these features could determine whether that is safe.
Signed-off-by: Michel Lespinasse <walken@google.com>
---
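Illustration only, not part of the patch: a minimal userspace sketch of
the calling protocol described above, assuming C11 atomics as a stand-in
for atomic_inc_not_zero() and a per-thread nesting counter as a stand-in
for the RCU read-side critical section. Every model_* name is
hypothetical and exists only for this sketch; nothing here is kernel
code, and the counter provides no real grace-period semantics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the reference count. */
struct page {
	atomic_int _count;
};

/* Assumption: a nesting counter models "inside an RCU read-side section". */
static _Thread_local int model_rcu_nesting;

static void model_rcu_read_lock(void) { model_rcu_nesting++; }
static bool model_rcu_read_lock_held(void) { return model_rcu_nesting > 0; }

static void model_rcu_read_unlock(void)
{
	assert(model_rcu_nesting > 0);
	model_rcu_nesting--;
}

static void model_page_init(struct page *page, int count)
{
	atomic_init(&page->_count, count);
}

/* Increment *v unless it is zero; models atomic_inc_not_zero(). */
static bool model_inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Unchecked variant, for callers that are safe by other means. */
static bool model_get_page_unless_zero_nocheck(struct page *page)
{
	return model_inc_not_zero(&page->_count);
}

/* Checked variant: the caller must be inside the read-side section. */
static bool model_get_page_unless_zero(struct page *page)
{
	assert(model_rcu_read_lock_held());	/* models VM_BUG_ON() */
	return model_get_page_unless_zero_nocheck(page);
}

/*
 * The protocol: take the read lock before locating the page and keep
 * it until the reference has been obtained (or the attempt failed).
 */
static struct page *model_lookup_and_get(struct page *slot)
{
	struct page *page;

	model_rcu_read_lock();
	page = slot;			/* "locate the desired page" */
	if (page && !model_get_page_unless_zero(page))
		page = NULL;		/* refcount was zero: page is free */
	model_rcu_read_unlock();
	return page;
}
```

In this sketch a lookup on a page whose count is zero returns NULL and
leaves the count untouched, while a lookup on a live page returns the
page with its count raised by one. The nocheck variant corresponds to
the exempted call sites, which skip the assertion because they rely on
some other guarantee that the page cannot be fully freed.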
include/linux/mm.h | 16 +++++++++++++++-
include/linux/pagemap.h | 1 +
mm/hwpoison-inject.c | 2 +-
mm/memory-failure.c | 6 +++---
mm/memory_hotplug.c | 2 +-
mm/vmscan.c | 7 ++++++-
6 files changed, 27 insertions(+), 7 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9670f71..9ff5f2d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -275,13 +275,27 @@ static inline int put_page_testzero(struct page *page)
return atomic_dec_and_test(&page->_count);
}
+static inline int __get_page_unless_zero(struct page *page)
+{
+ return atomic_inc_not_zero(&page->_count);
+}
+
/*
* Try to grab a ref unless the page has a refcount of zero, return false if
* that is the case.
*/
static inline int get_page_unless_zero(struct page *page)
{
- return atomic_inc_not_zero(&page->_count);
+ /*
+ * See page_cache_get_speculative() comment in pagemap.h
+ * Note that for page counts to be guaranteed stable one
+ * RCU grace period after they've been allocated,
+ * all get_page_unless_zero call sites have to participate
+ * by taking an rcu read lock before locating the desired page
+ * and until getting a reference on it.
+ */
+ VM_BUG_ON(!rcu_read_lock_held());
+ return __get_page_unless_zero(page);
}
extern int page_is_ram(unsigned long pfn);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 716875e..736f47d 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -131,6 +131,7 @@ void release_pages(struct page **pages, int nr, int cold);
*/
static inline int page_cache_get_speculative(struct page *page)
{
+ VM_BUG_ON(!rcu_read_lock_held());
VM_BUG_ON(in_interrupt());
#if !defined(CONFIG_SMP) && defined(CONFIG_TREE_RCU)
diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c
index c7fc7fd..87e027b 100644
--- a/mm/hwpoison-inject.c
+++ b/mm/hwpoison-inject.c
@@ -30,7 +30,7 @@ static int hwpoison_inject(void *data, u64 val)
/*
* This implies unable to support free buddy pages.
*/
- if (!get_page_unless_zero(hpage))
+ if (!__get_page_unless_zero(hpage))
return 0;
if (!PageLRU(p) && !PageHuge(p))
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 740c4f5..6fc0409 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1025,7 +1025,7 @@ int __memory_failure(unsigned long pfn, int trapno, int flags)
* that may make page_freeze_refs()/page_unfreeze_refs() mismatch.
*/
if (!(flags & MF_COUNT_INCREASED) &&
- !get_page_unless_zero(hpage)) {
+ !__get_page_unless_zero(hpage)) {
if (is_free_buddy_page(p)) {
action_result(pfn, "free buddy", DELAYED);
return 0;
@@ -1210,7 +1210,7 @@ int unpoison_memory(unsigned long pfn)
nr_pages = 1 << compound_trans_order(page);
- if (!get_page_unless_zero(page)) {
+ if (!__get_page_unless_zero(page)) {
/*
* Since HWPoisoned hugepage should have non-zero refcount,
* race between memory failure and unpoison seems to happen.
@@ -1289,7 +1289,7 @@ static int get_any_page(struct page *p, unsigned long pfn, int flags)
* When the target page is a free hugepage, just remove it
* from free hugepage list.
*/
- if (!get_page_unless_zero(compound_head(p))) {
+ if (!__get_page_unless_zero(compound_head(p))) {
if (PageHuge(p)) {
pr_info("get_any_page: %#lx free huge page\n", pfn);
ret = dequeue_hwpoisoned_huge_page(compound_head(p));
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c46887b..cf57dfc 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -710,7 +710,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
if (!pfn_valid(pfn))
continue;
page = pfn_to_page(pfn);
- if (!get_page_unless_zero(page))
+ if (!__get_page_unless_zero(page))
continue;
/*
* We can skip free pages. And we can only deal with pages on
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d036e59..4c167da 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1001,11 +1001,16 @@ int __isolate_lru_page(struct page *page, int mode, int file)
ret = -EBUSY;
- if (likely(get_page_unless_zero(page))) {
+ if (likely(__get_page_unless_zero(page))) {
/*
* Be careful not to clear PageLRU until after we're
* sure the page is not being freed elsewhere -- the
* page release code relies on it.
+ *
+ * We are able to use the __get_page_unless_zero() variant
+ * because we know the page can't get fully freed before we
+ * get the reference on it - as it is on LRU list and we
+ * hold the zone LRU lock.
*/
ClearPageLRU(page);
ret = 0;
--
1.7.3.1