public inbox for linux-kernel@vger.kernel.org
From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: benh@kernel.crashing.org, hugh@veritas.com, paulus@samba.org,
	anton@samba.org, torvalds@osdl.org, akpm@osdl.org,
	andrea@suse.de, linux-kernel@vger.kernel.org
Subject: Re: Possible memory ordering bug in page reclaim?
Date: Sat, 15 Oct 2005 23:35:50 +1000
Message-ID: <435105B6.4040507@yahoo.com.au>
In-Reply-To: <E1EQkpc-0007FI-00@gondolin.me.apana.org.au>

[-- Attachment #1: Type: text/plain, Size: 891 bytes --]

Herbert Xu wrote:
> Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
>>Well yes, that's on the store side (1, above). However can't a CPU
>>still speculatively (eg. guess the branch) load the page->flags
>>cacheline which might be satisfied from memory before the page->count
>>cacheline loads? Ie. you can still have the correct write ordering
>>but have incorrect read ordering?
>>
>>Because neither PageDirty nor page_count is a barrier, and there is
>>no read barrier between them.
> 
> 
> Yes you're right.  A read barrier is required here.
> 
> I think Ben was actually agreeing with you.  He's just questioning
> whether the corresponding write barrier existed on CPU 1 (the answer
> to which is affirmative).
>  

Ah, that clears up my misunderstanding.

Yes I agree the write side is OK.

Thanks Ben and Herbert. I guess I should do a proper patch then.

-- 
SUSE Labs, Novell Inc.


[-- Attachment #2: mm-reclaim-memorder-fix.patch --]
[-- Type: text/plain, Size: 1560 bytes --]

In mm/vmscan.c, page reclaim (sequence 2) may run concurrently with
sequence 1 on another CPU:

1                                2
find_get_page();
write to page                    write_lock(tree_lock);
SetPageDirty();                  if (page_count != 2
put_page();                          || PageDirty())
                                     /* page dirty or busy */
                                 else
                                     /* free it */

The comment in shrink_list() says that PageDirty must be checked *after*
page_count shows that there are no other users of the page. This keeps
the dirty bit from being lost if sequence 2 sees PageDirty() as it was
*before* SetPageDirty() in 1, but page_count as it was *after* put_page()
in 1.

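However, there is no read memory barrier between the two checks, so
nothing stops a CPU from satisfying the PageDirty load (ie. ->flags)
before the page_count load. Sequence 2 could then see a clean page with
page_count == 2 and free it, losing the dirty bit set in 1. Theoretically,
data corruption is possible.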

Signed-off-by: Nick Piggin <npiggin@suse.de>

Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c
+++ linux-2.6/mm/vmscan.c
@@ -511,7 +511,12 @@ static int shrink_list(struct list_head 
 		 * PageDirty _after_ making sure that the page is freeable and
 		 * not in use by anybody. 	(pagecache + us == 2)
 		 */
-		if (page_count(page) != 2 || PageDirty(page)) {
+		if (page_count(page) != 2) {
+			write_unlock_irq(&mapping->tree_lock);
+			goto keep_locked;
+		}
+		smp_rmb();
+		if (PageDirty(page)) {
 			write_unlock_irq(&mapping->tree_lock);
 			goto keep_locked;
 		}


Thread overview: 21+ messages
2005-10-15  3:28 Possible memory ordering bug in page reclaim? Nick Piggin
2005-10-15  6:17 ` Hugh Dickins
2005-10-15  7:43   ` Benjamin Herrenschmidt
2005-10-15  8:00     ` Herbert Xu
2005-10-15 16:57       ` Linus Torvalds
2005-10-15 19:29         ` David S. Miller
2005-10-15 22:17           ` Benjamin Herrenschmidt
2005-10-16  0:04         ` Nick Piggin
2005-10-15  8:59     ` Nick Piggin
2005-10-15 12:08       ` Herbert Xu
2005-10-15 13:35         ` Nick Piggin [this message]
2005-10-15 18:00         ` Andrea Arcangeli
2005-10-15 19:48           ` Herbert Xu
2005-10-15 20:07             ` Andrea Arcangeli
2005-10-15 23:07               ` David S. Miller
2005-10-16 19:36                 ` Ivan Kokshaysky
2005-10-17  4:29                   ` David S. Miller
2005-10-17  7:23                     ` Ivan Kokshaysky
2005-10-17 11:28                   ` Andrea Arcangeli
2005-10-15 22:16           ` Benjamin Herrenschmidt
2005-10-15 23:13             ` David S. Miller
