From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Vladimir Davydov <vdavydov@parallels.com>
Cc: Rik van Riel <riel@redhat.com>, Hugh Dickins <hughd@google.com>,
	Christoph Lameter <cl@linux.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Minchan Kim <minchan@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@suse.cz>, Greg Thelen <gthelen@google.com>,
	Michel Lespinasse <walken@google.com>,
	David Rientjes <rientjes@google.com>,
	Pavel Emelyanov <xemul@parallels.com>,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC] rmap: fix "race" between do_wp_page and shrink_active_list
Date: Mon, 11 May 2015 07:24:02 -0700
Message-ID: <20150511142402.GJ6776@linux.vnet.ibm.com>
In-Reply-To: <1431330677-24476-1-git-send-email-vdavydov@parallels.com>

On Mon, May 11, 2015 at 10:51:17AM +0300, Vladimir Davydov wrote:
> Hi,
> 
> I've been arguing with Minchan for a while about whether store-tearing
> is possible while setting page->mapping in __page_set_anon_rmap and
> friends, see
> 
>   http://thread.gmane.org/gmane.linux.kernel.mm/131949/focus=132132
> 
> This patch is intended to draw attention to this discussion. It fixes a
> race that could happen if store-tearing were possible. The race is as
> follows.
> 
> In do_wp_page() we can call page_move_anon_rmap(), which sets
> page->mapping as follows:
> 
>         anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
>         page->mapping = (struct address_space *) anon_vma;
> 
> The page in question may be on an LRU list, because nowhere in
> do_wp_page() do we remove it from the list, nor do we take any
> LRU-related locks. Although the page is locked, shrink_active_list() can
> still call page_referenced() on it concurrently, because the latter does
> not require an anonymous page to be locked.
> 
> If the store tearing described in that thread were possible, we could
> hit the following race, resulting in a kernel panic:
> 
>   CPU0                          CPU1
>   ----                          ----
>   do_wp_page                    shrink_active_list
>    lock_page                     page_referenced
>                                   PageAnon->yes, so skip trylock_page
>    page_move_anon_rmap
>     page->mapping = anon_vma
>                                   rmap_walk
>                                    PageAnon->no
>                                    rmap_walk_file
>                                     BUG
>     page->mapping += PAGE_MAPPING_ANON
> 
> This patch fixes this race by explicitly forbidding the compiler from
> splitting the page->mapping store in __page_set_anon_rmap() and friends
> and the load in PageAnon(), using WRITE_ONCE() and READ_ONCE().
> 
> Personally, I don't believe that this can ever happen on any sane
> compiler, because such an "optimization" would only result in two stores
> vs one (note, anon_vma is not a constant), but since I can be mistaken I
> would like to hear from synchronization experts what they think about
> it.

An example "insane" compiler might notice that the value set cannot be
safely observed without multiple CPUs accessing that variable at the
same time.  A paper entitled "No Sane Compiler Would Optimize Atomics"
has some examples:

	http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html

If this paper doesn't scare you, then you didn't read it carefully enough.
And yes, I did give the author a very hard time about the need to suppress
some of these optimizations in order to correctly compile old code, and
will continue to do so.  However, a READ_ONCE() would be a most excellent
and very cheap way to future-proof this code, and is highly recommended.
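
For illustration only, here is a minimal userspace sketch of the torn
store and of the volatile-access idiom behind READ_ONCE()/WRITE_ONCE().
It is not the kernel code: the trimmed-down struct page, the helper names
page_anon(), torn_store_intermediate_is_anon() and set_anon_mapping(),
and the simplified macros are assumptions made for this example (the
kernel's real macros are more elaborate), and __typeof__ assumes GCC or
Clang. The "torn" function spells out by hand the two stores a compiler
might emit and samples what a concurrent reader could see in between.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified userspace stand-ins, NOT the kernel's definitions. */
#define PAGE_MAPPING_ANON 1UL
#define READ_ONCE(x)      (*(volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)  (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical trimmed-down page: only the field under discussion. */
struct page {
	void *mapping;
};

/* Mirrors the patched PageAnon(): one untorn load, then test the low bit. */
static int page_anon(struct page *page)
{
	return ((uintptr_t)READ_ONCE(page->mapping) & PAGE_MAPPING_ANON) != 0;
}

/*
 * Hand-written version of the torn store from the race diagram.
 * Returns what a reader would see between the two partial stores.
 */
static int torn_store_intermediate_is_anon(struct page *page, void *anon_vma)
{
	int seen;

	page->mapping = anon_vma;          /* page->mapping = anon_vma */
	seen = page_anon(page);            /* reader here sees "not anon" */
	page->mapping = (char *)anon_vma + PAGE_MAPPING_ANON; /* += ANON bit */
	return seen;
}

/* The store as the patch writes it: a single volatile access. */
static void set_anon_mapping(struct page *page, void *anon_vma)
{
	/* The low bit must be free for use as the anon tag. */
	assert(((uintptr_t)anon_vma & PAGE_MAPPING_ANON) == 0);
	WRITE_ONCE(page->mapping,
		   (void *)((char *)anon_vma + PAGE_MAPPING_ANON));
}
```

With any suitably aligned object standing in for the anon_vma, the
intermediate check in the torn version reports "not anon", which is the
window that sends rmap_walk() down the file path in the diagram above.
The volatile access only asks the compiler not to split or repeat the
access; it is not a memory barrier.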

							Thanx, Paul

> Thanks,
> Vladimir
> ---
>  include/linux/page-flags.h |    3 ++-
>  mm/rmap.c                  |    6 +++---
>  2 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 5e7c4f50a644..a529e0a35fe9 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -320,7 +320,8 @@ PAGEFLAG(Idle, idle)
> 
>  static inline int PageAnon(struct page *page)
>  {
> -	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
> +	return ((unsigned long)READ_ONCE(page->mapping) &
> +		PAGE_MAPPING_ANON) != 0;
>  }
> 
>  #ifdef CONFIG_KSM
> diff --git a/mm/rmap.c b/mm/rmap.c
> index eca7416f55d7..aa60c63704e6 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -958,7 +958,7 @@ void page_move_anon_rmap(struct page *page,
>  	VM_BUG_ON_PAGE(page->index != linear_page_index(vma, address), page);
> 
>  	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
> -	page->mapping = (struct address_space *) anon_vma;
> +	WRITE_ONCE(page->mapping, (struct address_space *) anon_vma);
>  }
> 
>  /**
> @@ -987,7 +987,7 @@ static void __page_set_anon_rmap(struct page *page,
>  		anon_vma = anon_vma->root;
> 
>  	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
> -	page->mapping = (struct address_space *) anon_vma;
> +	WRITE_ONCE(page->mapping, (struct address_space *) anon_vma);
>  	page->index = linear_page_index(vma, address);
>  }
> 
> @@ -1579,7 +1579,7 @@ static void __hugepage_set_anon_rmap(struct page *page,
>  		anon_vma = anon_vma->root;
> 
>  	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
> -	page->mapping = (struct address_space *) anon_vma;
> +	WRITE_ONCE(page->mapping, (struct address_space *) anon_vma);
>  	page->index = linear_page_index(vma, address);
>  }
> 
> -- 
> 1.7.10.4
> 
