Re: [patch] mm: fix race in COW logic

From: Peter Zijlstra <peterz@infradead.org>
To: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [patch] mm: fix race in COW logic
Date: Fri, 27 Jun 2008 11:19:26 +0200
Message-ID: <1214558366.2801.26.camel@twins.programming.kicks-ass.net>
In-Reply-To: <20080623123030.GB26555@wotan.suse.de>

On Mon, 2008-06-23 at 14:30 +0200, Nick Piggin wrote:
> On Mon, Jun 23, 2008 at 02:18:31PM +0200, Nick Piggin wrote:
> > On Mon, Jun 23, 2008 at 11:04:31AM +0100, Hugh Dickins wrote:
> > > moving the page_remove_rmap down was to be fully effective, it needed
> > > to move through a suitable barrier; it hadn't occurred to me that it
> > > was carrying the suitable barrier with it.  But if that is indeed
> > > correct, I think it would be better to rely upon that, than resort
> > > to more difficult arguments.
> > 
> > No I actually think you make a good point, and I'll resubmit the
> > patch with a replacement comment to say we've got the ordering
> > covered if nothing else then by the atomic op in rmap.
> 
> OK, this is a new comment. I don't actually know if it is any good.
> It is hard to be coherent if you write these things in English.
> Maybe it is best to illustrate with the interleaving diagram in the
> changelog?
> 
> --
> There is a race in the COW logic. It contains a shortcut to avoid the
> COW and reuse the page if we hold the sole reference on the page. However,
> two racing do_wp_page()ers can interleave so that one causes the other to
> mistakenly believe it is safe to take the shortcut when it is not. This could
> lead to data corruption.
> 
> Process 1 and process 2 each have a wp pte of the same anon page (i.e. one
> forked the other). The page's mapcount is 2. Then they both attempt to write
> to it around the same time...
> 
>   proc1				proc2 thr1			proc2 thr2
>   CPU0				CPU1				CPU3
>   do_wp_page()			do_wp_page()
> 				 trylock_page()
> 				  can_share_swap_page()
> 				   load page mapcount (==2)
> 				  reuse = 0
> 				 pte unlock
> 				 copy page to new_page
> 				 pte lock
> 				 page_remove_rmap(page);
>    trylock_page()	
>     can_share_swap_page()
>      load page mapcount (==1)
>     reuse = 1
>    ptep_set_access_flags (allow W)
> 
>   write private key into page
> 								read from page
> 				ptep_clear_flush()
> 				set_pte_at(pte of new_page)
> 
> 
> Fix this by moving the page_remove_rmap of the old page after the pte clear
> and flush. Potentially the entire branch could be moved down here, but in
> order to stay consistent, I won't (should probably move all the *_mm_counter
> stuff with one patch).
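
The interleaving above can be replayed sequentially in a toy userspace
model. This is only a sketch under stated assumptions: toy_page, toy_pte,
toy_remove_rmap(), proc1_write_fault() and thr2_sees_secret() are
hypothetical names, not kernel APIs, and locking, TLBs and real page
tables are left out entirely. It only shows that dropping the mapcount
before the pte switch lets proc1 reuse a page that proc2's pte still maps.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model only: none of these types or helpers exist in the kernel. */
struct toy_page {
	int  mapcount;		/* number of ptes mapping this page */
	char data[16];
};

struct toy_pte {
	struct toy_page *page;	/* page this pte points at */
	bool writable;
};

static void toy_remove_rmap(struct toy_page *page)
{
	assert(page->mapcount > 0);
	page->mapcount--;
}

/* proc1's write fault: reuse the page if it looks unshared, else COW. */
static void proc1_write_fault(struct toy_pte *pte, struct toy_page *spare,
			      const char *secret)
{
	if (pte->page->mapcount != 1) {		/* shared: take the COW path */
		*spare = *pte->page;
		spare->mapcount = 1;
		toy_remove_rmap(pte->page);
		pte->page = spare;
	}
	pte->writable = true;			/* reuse, or fresh private copy */
	strcpy(pte->page->data, secret);	/* "write private key into page" */
}

/* Replays the diagram; returns true if proc2 thr2 reads proc1's data. */
bool thr2_sees_secret(bool fixed_order)
{
	struct toy_page old_page = { .mapcount = 2, .data = "shared" };
	struct toy_page p2_copy  = { .mapcount = 1, .data = "" };
	struct toy_page p1_copy  = { 0 };
	struct toy_pte pte1 = { &old_page, false };
	struct toy_pte pte2 = { &old_page, false };
	const char *secret = "private key";
	bool leaked;

	/* proc2 thr1: copy page to new_page */
	memcpy(p2_copy.data, old_page.data, sizeof(p2_copy.data));

	if (fixed_order) {
		pte2.page = &p2_copy;		/* pte switch comes first */
		pte2.writable = true;
	} else {
		toy_remove_rmap(&old_page);	/* old order: mapcount drops first */
	}

	proc1_write_fault(&pte1, &p1_copy, secret);

	/* proc2 thr2: read through proc2's pte */
	leaked = strcmp(pte2.page->data, secret) == 0;

	if (fixed_order) {
		toy_remove_rmap(&old_page);	/* only after the pte switch */
	} else {
		pte2.page = &p2_copy;		/* too late: thr2 already read */
		pte2.writable = true;
	}
	return leaked;
}
```

Calling thr2_sees_secret(false) replays the old ordering and reports the
leak; thr2_sees_secret(true) replays the ordering from the patch, where
proc1 still sees mapcount == 2 and takes its own copy.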

Since I bothered to read all the way through this thread, I might as
well provide an ack.

Acked-by: Peter Zijlstra <peterz@infradead.org>

> Signed-off-by: Nick Piggin <npiggin@suse.de>
> ---
> Index: linux-2.6/mm/memory.c
> ===================================================================
> --- linux-2.6.orig/mm/memory.c
> +++ linux-2.6/mm/memory.c
> @@ -1766,7 +1766,6 @@ gotten:
>  	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
>  	if (likely(pte_same(*page_table, orig_pte))) {
>  		if (old_page) {
> -			page_remove_rmap(old_page, vma);
>  			if (!PageAnon(old_page)) {
>  				dec_mm_counter(mm, file_rss);
>  				inc_mm_counter(mm, anon_rss);
> @@ -1788,6 +1787,32 @@ gotten:
>  		lru_cache_add_active(new_page);
>  		page_add_new_anon_rmap(new_page, vma, address);
>  
> +		if (old_page) {
> +			/*
> +			 * Only after switching the pte to the new page may
> +			 * we remove the mapcount here. Otherwise another
> +			 * process may come and find the rmap count decremented
> +			 * before the pte is switched to the new page, and
> +			 * "reuse" the old page writing into it while our pte
> +			 * here still points into it and can be read by other
> +			 * threads.
> +			 *
> +			 * The critical issue is to order this
> +			 * page_remove_rmap with the ptep_clear_flush above.
> +			 * Those stores are ordered by (if nothing else)
> +			 * the barrier present in the atomic_add_negative
> +			 * in page_remove_rmap.
> +			 *
> +			 * Then the TLB flush in ptep_clear_flush ensures that
> +			 * no process can access the old page before the
> +			 * decremented mapcount is visible. And the old page
> +			 * cannot be reused until after the decremented
> +			 * mapcount is visible. So transitively, TLBs to
> +			 * old page will be flushed before it can be reused.
> +			 */
> +			page_remove_rmap(old_page, vma);
> +		}
> +
>  		/* Free the old page.. */
>  		new_page = old_page;
>  		ret |= VM_FAULT_WRITE;
> 
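
The ordering argument in the new comment can be approximated with C11
atomics. This is a userspace sketch, not the kernel implementation: the
model_* names are hypothetical, the -1 bias is assumed here to mirror the
kernel's _mapcount convention, and a sequentially consistent
read-modify-write stands in for the barrier that atomic_add_negative is
said to provide.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-ins; _mapcount is assumed to be biased by -1 (unmapped == -1). */
struct model_page { atomic_int _mapcount; };
struct model_pte  { _Atomic(struct model_page *) page; };

/* Add i to *v and report whether the result is negative. */
static bool model_atomic_add_negative(int i, atomic_int *v)
{
	/* seq_cst RMW: acts as release for the stores issued before it */
	return atomic_fetch_add(v, i) + i < 0;
}

static int model_page_mapcount(struct model_page *page)
{
	/* seq_cst load: acts as acquire, pairing with the RMW above */
	return atomic_load(&page->_mapcount) + 1;
}

/* Returns true when the last mapping of the page went away. */
static bool model_page_remove_rmap(struct model_page *page)
{
	return model_atomic_add_negative(-1, &page->_mapcount);
}

/*
 * The sequence from the patch: switch the pte first, then drop the rmap.
 * Any thread that observes the decremented count through
 * model_page_mapcount() also observes the earlier pte store.
 */
static void model_cow_switch(struct model_pte *pte,
			     struct model_page *old_page,
			     struct model_page *new_page)
{
	atomic_store_explicit(&pte->page, new_page, memory_order_relaxed);
	model_page_remove_rmap(old_page);
}

/* Demo: old_page starts with `mappers` ptes; returns its mapcount afterwards. */
int model_demo_remaining_mapcount(int mappers)
{
	struct model_page old_page, new_page;
	struct model_pte pte;

	atomic_init(&old_page._mapcount, mappers - 1);
	atomic_init(&new_page._mapcount, 0);
	atomic_init(&pte.page, &old_page);

	model_cow_switch(&pte, &old_page, &new_page);
	assert(atomic_load(&pte.page) == &new_page);
	return model_page_mapcount(&old_page);
}
```

With two mappers, model_demo_remaining_mapcount(2) leaves a single
mapping on the old page, which is the state in which the other process
may take the reuse shortcut; by then the pte store is already ordered
before the decrement. The TLB flush half of the argument has no userspace
analogue and is not modelled.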
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
