From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: benh@kernel.crashing.org, hugh@veritas.com, paulus@samba.org,
	anton@samba.org, torvalds@osdl.org, akpm@osdl.org,
	andrea@suse.de, linux-kernel@vger.kernel.org
Subject: Re: Possible memory ordering bug in page reclaim?
Date: Sat, 15 Oct 2005 23:35:50 +1000
Message-ID: <435105B6.4040507@yahoo.com.au>
In-Reply-To: <E1EQkpc-0007FI-00@gondolin.me.apana.org.au>

[-- Attachment #1: Type: text/plain, Size: 891 bytes --]

Herbert Xu wrote:
> Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
>>Well yes, that's on the store side (1, above). However can't a CPU
>>still speculatively (eg. guess the branch) load the page->flags
>>cacheline which might be satisfied from memory before the page->count
>>cacheline loads? Ie. you can still have the correct write ordering
>>but have incorrect read ordering?
>>
>>Because neither PageDirty nor page_count is a barrier, and there is
>>no read barrier between them.
> 
> 
> Yes you're right.  A read barrier is required here.
> 
> I think Ben was actually agreeing with you.  He's just questioning
> whether the corresponding write barrier existed on CPU 1 (the answer
> to which is affirmative).
>  

Ah, that clears up my misunderstanding.

Yes I agree the write side is OK.

Thanks Ben and Herbert. I guess I should do a proper patch then.

-- 
SUSE Labs, Novell Inc.


[-- Attachment #2: mm-reclaim-memorder-fix.patch --]
[-- Type: text/plain, Size: 1560 bytes --]

In mm/vmscan.c, page reclaim may run sequence 2 below concurrently with
sequence 1 on another CPU:

1                                2
find_get_page();
write to page                    write_lock(tree_lock);
SetPageDirty();                  if (page_count != 2
put_page();                              || PageDirty())
                                     /* page dirty or busy */
                                 else
                                     /* free it */

The comment indicates that PageDirty must be checked *after* page_count
shows there are no other users of this page. Otherwise the dirty bit
could be lost: sequence 2 might see the state of PageDirty() from *before*
SetPageDirty() in 1, but page_count from *after* put_page() in 1.

However, there is no read memory barrier there, and so nothing to stop a
CPU from loading PageDirty (ie. ->flags) before page_count. Theoretically,
data corruption is possible.

Signed-off-by: Nick Piggin <npiggin@suse.de>

Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c
+++ linux-2.6/mm/vmscan.c
@@ -511,7 +511,12 @@ static int shrink_list(struct list_head 
 		 * PageDirty _after_ making sure that the page is freeable and
 		 * not in use by anybody. 	(pagecache + us == 2)
 		 */
-		if (page_count(page) != 2 || PageDirty(page)) {
+		if (page_count(page) != 2) {
+			write_unlock_irq(&mapping->tree_lock);
+			goto keep_locked;
+		}
+		smp_rmb();
+		if (PageDirty(page)) {
 			write_unlock_irq(&mapping->tree_lock);
 			goto keep_locked;
 		}
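
For anyone who wants to reason about the ordering outside the kernel, here
is a rough userspace sketch of the two sequences using C11 atomics. It is
not kernel code. struct fake_page, writer_side() and can_free() are made-up
names for illustration, the release ordering on the decrement stands in for
whatever write-side ordering put_page() provides, and the acquire fence is
only an approximation of smp_rmb() (it is somewhat stronger, since it also
orders the earlier load against later stores).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; not the kernel's definition. */
struct fake_page {
	atomic_int count;	/* models page_count() */
	atomic_bool dirty;	/* models the dirty bit in page->flags */
};

/* Sequence 1: dirty the page, then drop our reference. */
static void writer_side(struct fake_page *p)
{
	atomic_store_explicit(&p->dirty, true, memory_order_relaxed);
	/*
	 * Release ordering: the dirty store above becomes visible no
	 * later than the decrement (assumed write-side ordering).
	 */
	atomic_fetch_sub_explicit(&p->count, 1, memory_order_release);
}

/* Sequence 2: returns true only if the page looks safe to free. */
static bool can_free(struct fake_page *p)
{
	/* pagecache + us == 2; anything else means the page is busy */
	if (atomic_load_explicit(&p->count, memory_order_relaxed) != 2)
		return false;

	/*
	 * Plays the role of smp_rmb(): the load of dirty below cannot be
	 * satisfied before the load of count above.
	 */
	atomic_thread_fence(memory_order_acquire);

	if (atomic_load_explicit(&p->dirty, memory_order_relaxed))
		return false;

	return true;
}
```

Without the fence, the two relaxed loads in can_free() could be reordered,
which is the window described above: count observed after the decrement,
dirty observed before the store.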
