linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] rmap: fix theoretical race between do_wp_page and shrink_active_list
@ 2015-05-12 10:18 Vladimir Davydov
  2015-05-12 10:48 ` Kirill A. Shutemov
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Vladimir Davydov @ 2015-05-12 10:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Paul E. McKenney, Kirill A. Shutemov,
	Rik van Riel, Hugh Dickins

As noted by Paul the compiler is free to store a temporary result in a
variable on stack, heap or global unless it is explicitly marked as
volatile, see:

  http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html#sample-optimizations

This can result in a race between do_wp_page() and shrink_active_list()
as follows.

In do_wp_page() we can call page_move_anon_rmap(), which sets
page->mapping as follows:

  anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
  page->mapping = (struct address_space *) anon_vma;

The page in question may be on an LRU list, because nowhere in
do_wp_page() we remove it from the list, neither do we take any LRU
related locks. Although the page is locked, shrink_active_list() can
still call page_referenced() on it concurrently, because the latter does
not require an anonymous page to be locked:

  CPU0                          CPU1
  ----                          ----
  do_wp_page                    shrink_active_list
   lock_page                     page_referenced
                                  PageAnon->yes, so skip trylock_page
   page_move_anon_rmap
    page->mapping = anon_vma
                                  rmap_walk
                                   PageAnon->no
                                   rmap_walk_file
                                    BUG
    page->mapping += PAGE_MAPPING_ANON

This patch fixes this race by explicitly forbidding the compiler to
split page->mapping store in page_move_anon_rmap() with the aid of
WRITE_ONCE.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
---
Changes in v2:
 - do not add READ_ONCE to PageAnon and WRITE_ONCE to
   __page_set_anon_rmap and __hugepage_set_anon_rmap (Kirill)

 mm/rmap.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 24dd3f9fee27..8b18fd4227d1 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -950,7 +950,7 @@ void page_move_anon_rmap(struct page *page,
 	VM_BUG_ON_PAGE(page->index != linear_page_index(vma, address), page);
 
 	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
-	page->mapping = (struct address_space *) anon_vma;
+	WRITE_ONCE(page->mapping, (struct address_space *) anon_vma);
 }
 
 /**
-- 
1.7.10.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] rmap: fix theoretical race between do_wp_page and shrink_active_list
  2015-05-12 10:18 [PATCH v2] rmap: fix theoretical race between do_wp_page and shrink_active_list Vladimir Davydov
@ 2015-05-12 10:48 ` Kirill A. Shutemov
  2015-05-12 22:28 ` Andrew Morton
  2015-05-13  1:14 ` Minchan Kim
  2 siblings, 0 replies; 9+ messages in thread
From: Kirill A. Shutemov @ 2015-05-12 10:48 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Andrew Morton, linux-mm, linux-kernel, Paul E. McKenney,
	Rik van Riel, Hugh Dickins

On Tue, May 12, 2015 at 01:18:39PM +0300, Vladimir Davydov wrote:
> As noted by Paul the compiler is free to store a temporary result in a
> variable on stack, heap or global unless it is explicitly marked as
> volatile, see:
> 
>   http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html#sample-optimizations
> 
> This can result in a race between do_wp_page() and shrink_active_list()
> as follows.
> 
> In do_wp_page() we can call page_move_anon_rmap(), which sets
> page->mapping as follows:
> 
>   anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
>   page->mapping = (struct address_space *) anon_vma;
> 
> The page in question may be on an LRU list, because nowhere in
> do_wp_page() we remove it from the list, neither do we take any LRU
> related locks. Although the page is locked, shrink_active_list() can
> still call page_referenced() on it concurrently, because the latter does
> not require an anonymous page to be locked:
> 
>   CPU0                          CPU1
>   ----                          ----
>   do_wp_page                    shrink_active_list
>    lock_page                     page_referenced
>                                   PageAnon->yes, so skip trylock_page
>    page_move_anon_rmap
>     page->mapping = anon_vma
>                                   rmap_walk
>                                    PageAnon->no
>                                    rmap_walk_file
>                                     BUG
>     page->mapping += PAGE_MAPPING_ANON
> 
> This patch fixes this race by explicitly forbidding the compiler to
> split page->mapping store in page_move_anon_rmap() with the aid of
> WRITE_ONCE.
> 
> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Hugh Dickins <hughd@google.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] rmap: fix theoretical race between do_wp_page and shrink_active_list
  2015-05-12 10:18 [PATCH v2] rmap: fix theoretical race between do_wp_page and shrink_active_list Vladimir Davydov
  2015-05-12 10:48 ` Kirill A. Shutemov
@ 2015-05-12 22:28 ` Andrew Morton
  2015-05-13  1:21   ` Minchan Kim
  2015-05-13  8:08   ` Vladimir Davydov
  2015-05-13  1:14 ` Minchan Kim
  2 siblings, 2 replies; 9+ messages in thread
From: Andrew Morton @ 2015-05-12 22:28 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: linux-mm, linux-kernel, Paul E. McKenney, Kirill A. Shutemov,
	Rik van Riel, Hugh Dickins

On Tue, 12 May 2015 13:18:39 +0300 Vladimir Davydov <vdavydov@parallels.com> wrote:

> As noted by Paul the compiler is free to store a temporary result in a
> variable on stack, heap or global unless it is explicitly marked as
> volatile, see:
> 
>   http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html#sample-optimizations
> 
> This can result in a race between do_wp_page() and shrink_active_list()
> as follows.
> 
> In do_wp_page() we can call page_move_anon_rmap(), which sets
> page->mapping as follows:
> 
>   anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
>   page->mapping = (struct address_space *) anon_vma;
> 
> The page in question may be on an LRU list, because nowhere in
> do_wp_page() we remove it from the list, neither do we take any LRU
> related locks. Although the page is locked, shrink_active_list() can
> still call page_referenced() on it concurrently, because the latter does
> not require an anonymous page to be locked:
> 
>   CPU0                          CPU1
>   ----                          ----
>   do_wp_page                    shrink_active_list
>    lock_page                     page_referenced
>                                   PageAnon->yes, so skip trylock_page
>    page_move_anon_rmap
>     page->mapping = anon_vma
>                                   rmap_walk
>                                    PageAnon->no
>                                    rmap_walk_file
>                                     BUG
>     page->mapping += PAGE_MAPPING_ANON
> 
> This patch fixes this race by explicitly forbidding the compiler to
> split page->mapping store in page_move_anon_rmap() with the aid of
> WRITE_ONCE.
> 
> ...
>
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -950,7 +950,7 @@ void page_move_anon_rmap(struct page *page,
>  	VM_BUG_ON_PAGE(page->index != linear_page_index(vma, address), page);
>  
>  	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
> -	page->mapping = (struct address_space *) anon_vma;
> +	WRITE_ONCE(page->mapping, (struct address_space *) anon_vma);

Please let's not put things like WRITE_ONCE() in there without
documenting them - otherwise it's terribly hard for readers to work out
why it was added.

How's this look?

--- a/mm/rmap.c~rmap-fix-theoretical-race-between-do_wp_page-and-shrink_active_list-fix
+++ a/mm/rmap.c
@@ -950,6 +950,11 @@ void page_move_anon_rmap(struct page *pa
 	VM_BUG_ON_PAGE(page->index != linear_page_index(vma, address), page);
 
 	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
+	/*
+	 * Ensure that anon_vma and the PAGE_MAPPING_ANON bit are written
+	 * simultaneously, so a concurrent reader (eg shrink_active_list) will
+	 * not see one without the other.
+	 */
 	WRITE_ONCE(page->mapping, (struct address_space *) anon_vma);
 }
 
_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] rmap: fix theoretical race between do_wp_page and shrink_active_list
  2015-05-12 10:18 [PATCH v2] rmap: fix theoretical race between do_wp_page and shrink_active_list Vladimir Davydov
  2015-05-12 10:48 ` Kirill A. Shutemov
  2015-05-12 22:28 ` Andrew Morton
@ 2015-05-13  1:14 ` Minchan Kim
  2 siblings, 0 replies; 9+ messages in thread
From: Minchan Kim @ 2015-05-13  1:14 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Andrew Morton, linux-mm, linux-kernel, Paul E. McKenney,
	Kirill A. Shutemov, Rik van Riel, Hugh Dickins

On Tue, May 12, 2015 at 01:18:39PM +0300, Vladimir Davydov wrote:
> As noted by Paul the compiler is free to store a temporary result in a
> variable on stack, heap or global unless it is explicitly marked as
> volatile, see:
> 
>   http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html#sample-optimizations
> 
> This can result in a race between do_wp_page() and shrink_active_list()
> as follows.
> 
> In do_wp_page() we can call page_move_anon_rmap(), which sets
> page->mapping as follows:
> 
>   anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
>   page->mapping = (struct address_space *) anon_vma;
> 
> The page in question may be on an LRU list, because nowhere in
> do_wp_page() we remove it from the list, neither do we take any LRU
> related locks. Although the page is locked, shrink_active_list() can
> still call page_referenced() on it concurrently, because the latter does
> not require an anonymous page to be locked:
> 
>   CPU0                          CPU1
>   ----                          ----
>   do_wp_page                    shrink_active_list
>    lock_page                     page_referenced
>                                   PageAnon->yes, so skip trylock_page
>    page_move_anon_rmap
>     page->mapping = anon_vma
>                                   rmap_walk
>                                    PageAnon->no
>                                    rmap_walk_file
>                                     BUG
>     page->mapping += PAGE_MAPPING_ANON
> 
> This patch fixes this race by explicitly forbidding the compiler to
> split page->mapping store in page_move_anon_rmap() with the aid of
> WRITE_ONCE.
> 
> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Hugh Dickins <hughd@google.com>
> ---

The paper says "This requires escape analysis: blah blah for this optimization
to be valid" So, I'm not sure it's the case but admit we couldn't guarantee
all of compiler optimization technique so I am in favor of the patch to make
sure future-proof with upcoming suprising compiler technique.

Another review point I had is whether lockless page in shrink_active_list
could be turn into PageKsm in the middle of page_referenced. IOW,

        page_referenced
                PageAnon && !PageKsm -> true so avoid try_lockpage
                <... amount of stall start >
                Other cpu makes the page into PageKsm
                <... amount of stall end >
                rmap_walk
                  PageKsm-> true
                  rmap_walk_ksm
                    -> bang because ksm expect the passed page was locked

However, we increased page->count in isolate_lru_page before passing
the page in page_referenced so KSM cannot make the page KsmPage so
it's safe.

Acked-by: Minchan Kim <minchan@kernel.org>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] rmap: fix theoretical race between do_wp_page and shrink_active_list
  2015-05-12 22:28 ` Andrew Morton
@ 2015-05-13  1:21   ` Minchan Kim
  2015-05-13  8:08   ` Vladimir Davydov
  1 sibling, 0 replies; 9+ messages in thread
From: Minchan Kim @ 2015-05-13  1:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vladimir Davydov, linux-mm, linux-kernel, Paul E. McKenney,
	Kirill A. Shutemov, Rik van Riel, Hugh Dickins

Hello Andrew,

On Tue, May 12, 2015 at 03:28:40PM -0700, Andrew Morton wrote:
> On Tue, 12 May 2015 13:18:39 +0300 Vladimir Davydov <vdavydov@parallels.com> wrote:
> 
> > As noted by Paul the compiler is free to store a temporary result in a
> > variable on stack, heap or global unless it is explicitly marked as
> > volatile, see:
> > 
> >   http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html#sample-optimizations
> > 
> > This can result in a race between do_wp_page() and shrink_active_list()
> > as follows.
> > 
> > In do_wp_page() we can call page_move_anon_rmap(), which sets
> > page->mapping as follows:
> > 
> >   anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
> >   page->mapping = (struct address_space *) anon_vma;
> > 
> > The page in question may be on an LRU list, because nowhere in
> > do_wp_page() we remove it from the list, neither do we take any LRU
> > related locks. Although the page is locked, shrink_active_list() can
> > still call page_referenced() on it concurrently, because the latter does
> > not require an anonymous page to be locked:
> > 
> >   CPU0                          CPU1
> >   ----                          ----
> >   do_wp_page                    shrink_active_list
> >    lock_page                     page_referenced
> >                                   PageAnon->yes, so skip trylock_page
> >    page_move_anon_rmap
> >     page->mapping = anon_vma
> >                                   rmap_walk
> >                                    PageAnon->no
> >                                    rmap_walk_file
> >                                     BUG
> >     page->mapping += PAGE_MAPPING_ANON
> > 
> > This patch fixes this race by explicitly forbidding the compiler to
> > split page->mapping store in page_move_anon_rmap() with the aid of
> > WRITE_ONCE.
> > 
> > ...
> >
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -950,7 +950,7 @@ void page_move_anon_rmap(struct page *page,
> >  	VM_BUG_ON_PAGE(page->index != linear_page_index(vma, address), page);
> >  
> >  	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
> > -	page->mapping = (struct address_space *) anon_vma;
> > +	WRITE_ONCE(page->mapping, (struct address_space *) anon_vma);
> 
> Please let's not put things like WRITE_ONCE() in there without
> documenting them - otherwise it's terribly hard for readers to work out
> why it was added.
> 
> How's this look?
> 
> --- a/mm/rmap.c~rmap-fix-theoretical-race-between-do_wp_page-and-shrink_active_list-fix
> +++ a/mm/rmap.c
> @@ -950,6 +950,11 @@ void page_move_anon_rmap(struct page *pa
>  	VM_BUG_ON_PAGE(page->index != linear_page_index(vma, address), page);
>  
>  	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
> +	/*
> +	 * Ensure that anon_vma and the PAGE_MAPPING_ANON bit are written
> +	 * simultaneously, so a concurrent reader (eg shrink_active_list) will

IMHO, rather than shrink_active_list, PageAnon in page_referenced is better to me.

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] rmap: fix theoretical race between do_wp_page and shrink_active_list
@ 2015-05-13  1:43 Minchan Kim
  2015-05-13  2:04 ` Rik van Riel
  0 siblings, 1 reply; 9+ messages in thread
From: Minchan Kim @ 2015-05-13  1:43 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Andrew Morton, linux-mm, linux-kernel, Paul E. McKenney,
	Kirill A. Shutemov, Rik van Riel, Hugh Dickins

Hi, Rik

I'd like to bring up the issue in this thread although I already gave
my Acked-by.

Below issue causes by no PG_locked page in page_referenced while
page_move_anon_rmap depends on PG_locked to prevent race with rmap code.

So, although this patch fixes below one example, we still have a problem
in rmap.

If page_referenced holds PG_locked for all of pages unconditionally,
we don't need this patch and might remove READ_ONCE introduced by
80e148 and more than.

What do you think about?

On Tue, May 12, 2015 at 01:18:39PM +0300, Vladimir Davydov wrote:
> As noted by Paul the compiler is free to store a temporary result in a
> variable on stack, heap or global unless it is explicitly marked as
> volatile, see:
> 
>   http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html#sample-optimizations
> 
> This can result in a race between do_wp_page() and shrink_active_list()
> as follows.
> 
> In do_wp_page() we can call page_move_anon_rmap(), which sets
> page->mapping as follows:
> 
>   anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
>   page->mapping = (struct address_space *) anon_vma;
> 
> The page in question may be on an LRU list, because nowhere in
> do_wp_page() we remove it from the list, neither do we take any LRU
> related locks. Although the page is locked, shrink_active_list() can
> still call page_referenced() on it concurrently, because the latter does
> not require an anonymous page to be locked:
> 
>   CPU0                          CPU1
>   ----                          ----
>   do_wp_page                    shrink_active_list
>    lock_page                     page_referenced
>                                   PageAnon->yes, so skip trylock_page
>    page_move_anon_rmap
>     page->mapping = anon_vma
>                                   rmap_walk
>                                    PageAnon->no
>                                    rmap_walk_file
>                                     BUG
>     page->mapping += PAGE_MAPPING_ANON
> 
> This patch fixes this race by explicitly forbidding the compiler to
> split page->mapping store in page_move_anon_rmap() with the aid of
> WRITE_ONCE.
> 
> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Hugh Dickins <hughd@google.com>
> ---
> Changes in v2:
>  - do not add READ_ONCE to PageAnon and WRITE_ONCE to
>    __page_set_anon_rmap and __hugepage_set_anon_rmap (Kirill)
> 
>  mm/rmap.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 24dd3f9fee27..8b18fd4227d1 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -950,7 +950,7 @@ void page_move_anon_rmap(struct page *page,
>  	VM_BUG_ON_PAGE(page->index != linear_page_index(vma, address), page);
>  
>  	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
> -	page->mapping = (struct address_space *) anon_vma;
> +	WRITE_ONCE(page->mapping, (struct address_space *) anon_vma);
>  }
>  
>  /**
> -- 
> 1.7.10.4
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] rmap: fix theoretical race between do_wp_page and shrink_active_list
  2015-05-13  1:43 Minchan Kim
@ 2015-05-13  2:04 ` Rik van Riel
  0 siblings, 0 replies; 9+ messages in thread
From: Rik van Riel @ 2015-05-13  2:04 UTC (permalink / raw)
  To: Minchan Kim, Vladimir Davydov
  Cc: Andrew Morton, linux-mm, linux-kernel, Paul E. McKenney,
	Kirill A. Shutemov, Hugh Dickins

On 05/12/2015 09:43 PM, Minchan Kim wrote:
> Hi, Rik
> 
> I'd like to bring up the issue in this thread although I already gave
> my Acked-by.
> 
> Below issue causes by no PG_locked page in page_referenced while
> page_move_anon_rmap depends on PG_locked to prevent race with rmap code.
> 
> So, although this patch fixes below one example, we still have a problem
> in rmap.
> 
> If page_referenced holds PG_locked for all of pages unconditionally,
> we don't need this patch and might remove READ_ONCE introduced by
> 80e148 and more than.
> 
> What do you think about?

Maybe the reclaim code and page_referenced are fine.

However, I have seen one real world bug report of a page->mapping
pointing to an anon_vma without the PAGE_MAPPING_ANON bit being
set.

This is a pretty hard to hit race, so I have only ever heard of
it happening once, and I do not remember the details of exactly
what code blew up trying to follow the page->mapping pointer in
the wrong way.

I wish I remember what needs this patch, but I have a rather
strong suspicion there is something that needs it...

Acked-by: Rik van Riel <riel@redhat.com>

> On Tue, May 12, 2015 at 01:18:39PM +0300, Vladimir Davydov wrote:
>> As noted by Paul the compiler is free to store a temporary result in a
>> variable on stack, heap or global unless it is explicitly marked as
>> volatile, see:
>>
>>   http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html#sample-optimizations
>>
>> This can result in a race between do_wp_page() and shrink_active_list()
>> as follows.
>>
>> In do_wp_page() we can call page_move_anon_rmap(), which sets
>> page->mapping as follows:
>>
>>   anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
>>   page->mapping = (struct address_space *) anon_vma;
>>
>> The page in question may be on an LRU list, because nowhere in
>> do_wp_page() we remove it from the list, neither do we take any LRU
>> related locks. Although the page is locked, shrink_active_list() can
>> still call page_referenced() on it concurrently, because the latter does
>> not require an anonymous page to be locked:
>>
>>   CPU0                          CPU1
>>   ----                          ----
>>   do_wp_page                    shrink_active_list
>>    lock_page                     page_referenced
>>                                   PageAnon->yes, so skip trylock_page
>>    page_move_anon_rmap
>>     page->mapping = anon_vma
>>                                   rmap_walk
>>                                    PageAnon->no
>>                                    rmap_walk_file
>>                                     BUG
>>     page->mapping += PAGE_MAPPING_ANON
>>
>> This patch fixes this race by explicitly forbidding the compiler to
>> split page->mapping store in page_move_anon_rmap() with the aid of
>> WRITE_ONCE.
>>
>> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
>> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
>> Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
>> Cc: Rik van Riel <riel@redhat.com>
>> Cc: Hugh Dickins <hughd@google.com>
>> ---
>> Changes in v2:
>>  - do not add READ_ONCE to PageAnon and WRITE_ONCE to
>>    __page_set_anon_rmap and __hugepage_set_anon_rmap (Kirill)
>>
>>  mm/rmap.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 24dd3f9fee27..8b18fd4227d1 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -950,7 +950,7 @@ void page_move_anon_rmap(struct page *page,
>>  	VM_BUG_ON_PAGE(page->index != linear_page_index(vma, address), page);
>>  
>>  	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
>> -	page->mapping = (struct address_space *) anon_vma;
>> +	WRITE_ONCE(page->mapping, (struct address_space *) anon_vma);
>>  }
>>  
>>  /**
>> -- 
>> 1.7.10.4
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 


-- 
All rights reversed

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] rmap: fix theoretical race between do_wp_page and shrink_active_list
@ 2015-05-13  3:00 Minchan Kim
  0 siblings, 0 replies; 9+ messages in thread
From: Minchan Kim @ 2015-05-13  3:00 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Vladimir Davydov, Andrew Morton, linux-mm, linux-kernel,
	Paul E. McKenney, Kirill A. Shutemov, Hugh Dickins

Separate from Vladimir's thread.

I don't want to make a noise in there.

On Tue, May 12, 2015 at 10:04:12PM -0400, Rik van Riel wrote:
> On 05/12/2015 09:43 PM, Minchan Kim wrote:
> > Hi, Rik
> > 
> > I'd like to bring up the issue in this thread although I already gave
> > my Acked-by.
> > 
> > Below issue causes by no PG_locked page in page_referenced while
> > page_move_anon_rmap depends on PG_locked to prevent race with rmap code.
> > 
> > So, although this patch fixes below one example, we still have a problem
> > in rmap.
> > 
> > If page_referenced holds PG_locked for all of pages unconditionally,
> > we don't need this patch and might remove READ_ONCE introduced by
> > 80e148 and more than.
> > 
> > What do you think about?
> 
> Maybe the reclaim code and page_referenced are fine.
> 
> However, I have seen one real world bug report of a page->mapping
> pointing to an anon_vma without the PAGE_MAPPING_ANON bit being
> set.
> 
> This is a pretty hard to hit race, so I have only ever heard of
> it happening once, and I do not remember the details of exactly
> what code blew up trying to follow the page->mapping pointer in
> the wrong way.
> 
> I wish I remember what needs this patch, but I have a rather
> strong suspicion there is something that needs it...
> 
> Acked-by: Rik van Riel <riel@redhat.com>

It seems you misunderstood my point. My bad.
My point is you wrote down below comment above page_move_anon_rmap.

"Protected against the rmap code by the page lock"

but rmap code doesn't hold a page lock sometime so anon_vma
would be stale in rmap traverse.

But when I reviewed the code, worst case is rmap will look up
all of parent, siblings but it wouldn't affect integrity.

One thing I suspect is load-tearing when we get anon_vma from
the page->mapping but we used READ_ONCE for that so I couldn't
find any serious bug.

So is it okay to remove above wrong comment?

diff --git a/mm/memory.c b/mm/memory.c
index 22e037e..e35a782 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2329,7 +2329,6 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			/*
 			 * The page is all ours.  Move it to our anon_vma so
 			 * the rmap code will not search our parent or siblings.
-			 * Protected against the rmap code by the page lock.
 			 */
 			page_move_anon_rmap(old_page, vma, address);
 			unlock_page(old_page);
> 
> > On Tue, May 12, 2015 at 01:18:39PM +0300, Vladimir Davydov wrote:
> >> As noted by Paul the compiler is free to store a temporary result in a
> >> variable on stack, heap or global unless it is explicitly marked as
> >> volatile, see:
> >>
> >>   http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html#sample-optimizations
> >>
> >> This can result in a race between do_wp_page() and shrink_active_list()
> >> as follows.
> >>
> >> In do_wp_page() we can call page_move_anon_rmap(), which sets
> >> page->mapping as follows:
> >>
> >>   anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
> >>   page->mapping = (struct address_space *) anon_vma;
> >>
> >> The page in question may be on an LRU list, because nowhere in
> >> do_wp_page() we remove it from the list, neither do we take any LRU
> >> related locks. Although the page is locked, shrink_active_list() can
> >> still call page_referenced() on it concurrently, because the latter does
> >> not require an anonymous page to be locked:
> >>
> >>   CPU0                          CPU1
> >>   ----                          ----
> >>   do_wp_page                    shrink_active_list
> >>    lock_page                     page_referenced
> >>                                   PageAnon->yes, so skip trylock_page
> >>    page_move_anon_rmap
> >>     page->mapping = anon_vma
> >>                                   rmap_walk
> >>                                    PageAnon->no
> >>                                    rmap_walk_file
> >>                                     BUG
> >>     page->mapping += PAGE_MAPPING_ANON
> >>
> >> This patch fixes this race by explicitly forbidding the compiler to
> >> split page->mapping store in page_move_anon_rmap() with the aid of
> >> WRITE_ONCE.
> >>
> >> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
> >> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> >> Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
> >> Cc: Rik van Riel <riel@redhat.com>
> >> Cc: Hugh Dickins <hughd@google.com>
> >> ---
> >> Changes in v2:
> >>  - do not add READ_ONCE to PageAnon and WRITE_ONCE to
> >>    __page_set_anon_rmap and __hugepage_set_anon_rmap (Kirill)
> >>
> >>  mm/rmap.c |    2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/mm/rmap.c b/mm/rmap.c
> >> index 24dd3f9fee27..8b18fd4227d1 100644
> >> --- a/mm/rmap.c
> >> +++ b/mm/rmap.c
> >> @@ -950,7 +950,7 @@ void page_move_anon_rmap(struct page *page,
> >>  	VM_BUG_ON_PAGE(page->index != linear_page_index(vma, address), page);
> >>  
> >>  	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
> >> -	page->mapping = (struct address_space *) anon_vma;
> >> +	WRITE_ONCE(page->mapping, (struct address_space *) anon_vma);
> >>  }
> >>  
> >>  /**
> >> -- 
> >> 1.7.10.4
> >>
> >> --
> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> >> the body to majordomo@kvack.org.  For more info on Linux MM,
> >> see: http://www.linux-mm.org/ .
> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> > 
> 
> 
> -- 
> All rights reversed

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] rmap: fix theoretical race between do_wp_page and shrink_active_list
  2015-05-12 22:28 ` Andrew Morton
  2015-05-13  1:21   ` Minchan Kim
@ 2015-05-13  8:08   ` Vladimir Davydov
  1 sibling, 0 replies; 9+ messages in thread
From: Vladimir Davydov @ 2015-05-13  8:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Paul E. McKenney, Kirill A. Shutemov,
	Rik van Riel, Hugh Dickins

On Tue, May 12, 2015 at 03:28:40PM -0700, Andrew Morton wrote:
> Please let's not put things like WRITE_ONCE() in there without
> documenting them - otherwise it's terribly hard for readers to work out
> why it was added.
> 
> How's this look?
> 
> --- a/mm/rmap.c~rmap-fix-theoretical-race-between-do_wp_page-and-shrink_active_list-fix
> +++ a/mm/rmap.c
> @@ -950,6 +950,11 @@ void page_move_anon_rmap(struct page *pa
>  	VM_BUG_ON_PAGE(page->index != linear_page_index(vma, address), page);
>  
>  	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
> +	/*
> +	 * Ensure that anon_vma and the PAGE_MAPPING_ANON bit are written
> +	 * simultaneously, so a concurrent reader (eg shrink_active_list) will
> +	 * not see one without the other.
> +	 */
>  	WRITE_ONCE(page->mapping, (struct address_space *) anon_vma);
>  }

Looks good to me.

Thanks,
Vladimir

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2015-05-13  8:08 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-05-12 10:18 [PATCH v2] rmap: fix theoretical race between do_wp_page and shrink_active_list Vladimir Davydov
2015-05-12 10:48 ` Kirill A. Shutemov
2015-05-12 22:28 ` Andrew Morton
2015-05-13  1:21   ` Minchan Kim
2015-05-13  8:08   ` Vladimir Davydov
2015-05-13  1:14 ` Minchan Kim
  -- strict thread matches above, loose matches on Subject: below --
2015-05-13  1:43 Minchan Kim
2015-05-13  2:04 ` Rik van Riel
2015-05-13  3:00 Minchan Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).