* writepages drops bh on not uptodate page
@ 2004-07-28 4:51 Andrea Arcangeli
2004-07-28 5:08 ` Andrew Morton
2004-07-28 13:03 ` Chris Mason
0 siblings, 2 replies; 4+ messages in thread
From: Andrea Arcangeli @ 2004-07-28 4:51 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
Hi Andrew,
I think I understood why some ext2 fs corruption still happens even
after the last i_size fix.
what happened I believe is that the writepages layer got a not a fully
uptodate page (in turn with bh mapped on top of it), and then right
before unlocking the page and entering the writeback mode, it freed all
the bh. Without bh a not uptodate page will trigger a full readpage from
disk, that overwrites the pagecache before the multi-bio gets submitted,
generating fs corruption.
I believe the below patch should fix it (untested) against kernel CVS.
The testcases developed by Kurt showed the pagecache being overwritten
with on-disk data at block offsets, and Chris as well was wondering
about races between wait_on_page_writeback and readpage. the below fix
just explains everything we've seen since not-fully-uptodate pages must
have always bh on them and the below patch enforces just that invariant,
and it should fix our pagecache-overwritten-by-disk-data problem.
Signed-off-by: Andrea Arcangeli <andrea@suse.de>
Index: writepages-bh-race/fs/mpage.c
===================================================================
RCS file: /home/andrea/crypto/cvs/linux-2.5/fs/mpage.c,v
retrieving revision 1.55
diff -u -p -r1.55 mpage.c
--- writepages-bh-race/fs/mpage.c 11 Jul 2004 16:38:19 -0000 1.55
+++ writepages-bh-race/fs/mpage.c 28 Jul 2004 04:39:13 -0000
@@ -553,7 +553,12 @@ alloc_new:
bh = bh->b_this_page;
} while (bh != head);
- if (buffer_heads_over_limit)
+ /*
+ * we cannot drop the bh if the page is not uptodate
+ * or a concurrent readpage would fail to serialize with the bh
+ * and it would read from disk before we reach the platter.
+ */
+ if (buffer_heads_over_limit && PageUptodate(page))
try_to_free_buffers(page);
}
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: writepages drops bh on not uptodate page
2004-07-28 4:51 writepages drops bh on not uptodate page Andrea Arcangeli
@ 2004-07-28 5:08 ` Andrew Morton
2004-07-28 13:03 ` Chris Mason
1 sibling, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2004-07-28 5:08 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-kernel
Andrea Arcangeli <andrea@suse.de> wrote:
>
> the below fix
> just explains everything we've seen since not-fully-uptodate pages must
> have always bh on them and the below patch enforces just that invariant,
> and it should fix our pagecache-overwritten-by-disk-data problem.
Yes, I agree. Thanks again for that.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: writepages drops bh on not uptodate page
2004-07-28 4:51 writepages drops bh on not uptodate page Andrea Arcangeli
2004-07-28 5:08 ` Andrew Morton
@ 2004-07-28 13:03 ` Chris Mason
2004-07-28 13:46 ` Andrea Arcangeli
1 sibling, 1 reply; 4+ messages in thread
From: Chris Mason @ 2004-07-28 13:03 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Andrew Morton, linux-kernel
On Wed, 2004-07-28 at 00:51, Andrea Arcangeli wrote:
> Hi Andrew,
>
> I think I understood why some ext2 fs corruption still happens even
> after the last i_size fix.
>
> what happened I believe is that the writepages layer got a not a fully
> uptodate page (in turn with bh mapped on top of it), and then right
> before unlocking the page and entering the writeback mode, it freed all
> the bh.
Ahhhh, this really explains it, thanks andrea. I agree your fix should
solve things, but I'm wondering if we shouldn't make readpage[s] do a
wait_on_page_writeback(). That might do a better job of protecting us
from future variations of this problem.
-chris
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: writepages drops bh on not uptodate page
2004-07-28 13:03 ` Chris Mason
@ 2004-07-28 13:46 ` Andrea Arcangeli
0 siblings, 0 replies; 4+ messages in thread
From: Andrea Arcangeli @ 2004-07-28 13:46 UTC (permalink / raw)
To: Chris Mason; +Cc: Andrew Morton, linux-kernel
On Wed, Jul 28, 2004 at 09:03:39AM -0400, Chris Mason wrote:
> Ahhhh, this really explains it, thanks andrea. I agree your fix should
> solve things, but I'm wondering if we shouldn't make readpage[s] do a
> wait_on_page_writeback(). That might do a better job of protecting us
> from future variations of this problem.
the invariant is that not-fully-uptodate pages needs the bh on them
while they're under writeback. As far as this holds true, there seems
not to be other issues since readpage checks the bh, and readpages don't
need to check for anything since it only does I/O on newly allocated
pages (which cannot be under writeback). try_to_free_buffers doesn't
allow the bh to go away under writeback.
So the other approach would be to wait_on_page_writeback before a
->readpage (not ->readpages), if we're not in a add_to_page_cache case
(where we know the page cannot be under writeback). But that looks less
efficient and we don't need that as long as we don't break the above
invariant. Though I certainly agree that adding the
wait_on_page_writeback in the highlevel code that goes to call readpage
would have fixed the bug too.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2004-07-28 13:46 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-07-28 4:51 writepages drops bh on not uptodate page Andrea Arcangeli
2004-07-28 5:08 ` Andrew Morton
2004-07-28 13:03 ` Chris Mason
2004-07-28 13:46 ` Andrea Arcangeli
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.