From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: benh@kernel.crashing.org, hugh@veritas.com, paulus@samba.org,
anton@samba.org, torvalds@osdl.org, akpm@osdl.org,
andrea@suse.de, linux-kernel@vger.kernel.org
Subject: Re: Possible memory ordering bug in page reclaim?
Date: Sat, 15 Oct 2005 23:35:50 +1000
Message-ID: <435105B6.4040507@yahoo.com.au>
In-Reply-To: <E1EQkpc-0007FI-00@gondolin.me.apana.org.au>
[-- Attachment #1: Type: text/plain, Size: 891 bytes --]
Herbert Xu wrote:
> Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
>>Well yes, that's on the store side (1, above). However can't a CPU
>>still speculatively (eg. guess the branch) load the page->flags
>>cacheline which might be satisfied from memory before the page->count
>>cacheline loads? Ie. you can still have the correct write ordering
>>but have incorrect read ordering?
>>
>>Because neither PageDirty nor page_count is a barrier, and there is
>>no read barrier between them.
>
>
> Yes you're right. A read barrier is required here.
>
> I think Ben was actually agreeing with you. He's just questioning
> whether the corresponding write barrier existed on CPU 1 (the answer
> to which is affirmative).
>
Ah, that clears up my misunderstanding.
Yes I agree the write side is OK.
Thanks Ben and Herbert. I guess I should do a proper patch then.
--
SUSE Labs, Novell Inc.
[-- Attachment #2: mm-reclaim-memorder-fix.patch --]
[-- Type: text/plain, Size: 1560 bytes --]
In mm/vmscan.c, the page reclaim may have the following sequence 2
running concurrently with sequence 1 on another CPU:

        1                               2
        find_get_page();
        write to page                   write_lock(tree_lock);
        SetPageDirty();                 if (page_count != 2
        put_page();                         || PageDirty())
                                            /* page dirty or busy */
                                        else
                                            /* free it */

The comment indicates that PageDirty must be checked *after* page_count
shows there are no other users of this page. Otherwise the dirty bit
could be lost: sequence 2 might see the state of PageDirty() from
*before* SetPageDirty() in 1, but page_count from *after* put_page()
in 1.
However, there is no read memory barrier there, and so nothing to stop a
CPU from loading PageDirty (ie. ->flags) before page_count. Theoretically,
data corruption is possible.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c
+++ linux-2.6/mm/vmscan.c
@@ -511,7 +511,12 @@ static int shrink_list(struct list_head
 		 * PageDirty _after_ making sure that the page is freeable and
 		 * not in use by anybody. (pagecache + us == 2)
 		 */
-		if (page_count(page) != 2 || PageDirty(page)) {
+		if (page_count(page) != 2) {
+			write_unlock_irq(&mapping->tree_lock);
+			goto keep_locked;
+		}
+		smp_rmb();
+		if (PageDirty(page)) {
 			write_unlock_irq(&mapping->tree_lock);
 			goto keep_locked;
 		}
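
For readers who want to experiment with the ordering outside the kernel,
the read-side requirement in the patch description can be sketched in
userspace C11. This is only a rough model, not kernel code: struct
page_model, the PG_DIRTY bit value and both function names are made up
for illustration, the release decrement stands in for the write-side
ordering on CPU 1, and the acquire fence only approximates what
smp_rmb() does in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define PG_DIRTY 0x1u	/* hypothetical bit value, illustration only */

/* Hypothetical stand-in for the two struct page fields involved. */
struct page_model {
	atomic_int count;	/* models page->count */
	atomic_uint flags;	/* models page->flags */
};

/* Sequence 1: dirty the page, then drop our reference. */
static void writer_dirty_and_put(struct page_model *p)
{
	/* The caller must hold a reference. */
	assert(atomic_load_explicit(&p->count, memory_order_relaxed) > 0);

	atomic_fetch_or_explicit(&p->flags, PG_DIRTY, memory_order_relaxed);
	/*
	 * Release ordering: the dirty bit is made visible no later than
	 * the dropped reference (models the write side being OK).
	 */
	atomic_fetch_sub_explicit(&p->count, 1, memory_order_release);
}

/* Sequence 2: returns true only if the page looks safe to free. */
static bool reclaim_may_free(struct page_model *p)
{
	if (atomic_load_explicit(&p->count, memory_order_relaxed) != 2)
		return false;	/* page busy */

	/*
	 * Plays the role of smp_rmb(): the flags load below may not be
	 * satisfied before the count load above.
	 */
	atomic_thread_fence(memory_order_acquire);

	if (atomic_load_explicit(&p->flags, memory_order_relaxed) & PG_DIRTY)
		return false;	/* page dirty */

	return true;		/* free it */
}
```

Without the fence, a compiler or CPU would be free to perform the flags
load first, which is the reordering the patch is meant to rule out. In
the model, sequence 1 would run writer_dirty_and_put() on one thread
while sequence 2 calls reclaim_may_free() on another.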