From: Andrea Arcangeli <andrea@suse.de>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org, torvalds@osdl.org, mason@suse.com
Subject: Re: writepage fs corruption fixes
Date: Sat, 10 Jul 2004 06:59:20 +0200
Message-ID: <20040710045920.GY20947@dualathlon.random>
In-Reply-To: <20040710010738.GX20947@dualathlon.random>

On Sat, Jul 10, 2004 at 03:07:38AM +0200, Andrea Arcangeli wrote:
> On Sat, Jul 10, 2004 at 02:16:00AM +0200, Andrea Arcangeli wrote:
> > I hope this time I'm done with it.
> 
> I'm not after more thinking it seems the below is not a bug.
> 
> I'm lost since I can't find bugs anymore in this function but I need to
> find something.

Ok, this is the last try (at least for this week ;). If this doesn't
fix anything either, then I give up, and I will reasonably start to
suspect/blame the compiler early next week ;).

I believe reading the i_size from memory multiple times can generate fs
corruption, because the "offset" and the "end_index" are then not
guaranteed to be coherent. This is writepages and it runs w/o the i_sem,
so the i_size can change from under us at any time. If a parallel write
happens while writepages runs, the i_size could advance from 4095 to
4100 between the two reads. With the current 2.6 code that could
translate into end_index = 0 and offset = 4. That's broken, because the
only coherent pairs are end_index = 1 with offset = 4, or end_index = 0
with offset = 4095. When the two lose coherency the memset can zero out
actual data. The below patch fixes that (it's at least a theoretical
bug).
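
To make the arithmetic above concrete, here is a small user-space sketch
(not kernel code). It assumes a 4096-byte page, and the names
tail_from_two_reads(), tail_from_snapshot() and zeroed_bytes() are
hypothetical helpers made up for this illustration; they only mimic how
end_index and offset are derived and how many bytes the tail memset
would clear.

```c
#include <stdint.h>

/* Assumption for this sketch: 4096-byte pages, as in the example above. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)

struct tail {
	uint64_t end_index;	/* index of the page that contains EOF */
	unsigned offset;	/* offset of EOF inside that page */
};

/*
 * Racy pattern: end_index and offset come from two separate reads of
 * the file size, which may differ if a write extends the file between
 * them.
 */
static struct tail tail_from_two_reads(uint64_t first_read,
				       uint64_t second_read)
{
	struct tail t;

	t.end_index = first_read >> SKETCH_PAGE_SHIFT;
	t.offset = (unsigned)(second_read & (SKETCH_PAGE_SIZE - 1));
	return t;
}

/* Fixed pattern: both values are derived from a single snapshot. */
static struct tail tail_from_snapshot(uint64_t i_size)
{
	return tail_from_two_reads(i_size, i_size);
}

/* Bytes a memset(kaddr + offset, 0, PAGE_SIZE - offset) would clear. */
static unsigned zeroed_bytes(struct tail t)
{
	return t.offset ? SKETCH_PAGE_SIZE - t.offset : 0;
}
```

With first_read = 4095 and second_read = 4100 the racy variant yields
end_index = 0 and offset = 4, so page 0 would have 4092 bytes cleared
even though they hold file data. Either snapshot alone gives a coherent
pair: (0, 4095) for 4095, or (1, 4) for 4100.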

I don't really expect closing this tiny race to fix the bug in practice,
given that fixing the more serious bugs we covered yesterday didn't
(more likely the compiler will enter the equation soon ;).

This is also an optimization for 32bit archs, which need special locking
to read the 64bit i_size coherently.
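
As a rough illustration of why each read costs something on 32bit, here
is a user-space sketch of a sequence-counter protected 64bit value,
written with C11 atomics. It is only an analogy for what a coherent
64bit read needs on such archs, not the kernel's i_size_read(); the
names struct sized, size_write() and size_read() are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* A 64bit size stored as two 32bit halves, guarded by a sequence count. */
struct sized {
	atomic_uint seq;		/* odd while a write is in progress */
	_Atomic uint32_t lo, hi;
};

/* Writer side; this sketch assumes writers are serialized externally. */
static void size_write(struct sized *s, uint64_t v)
{
	unsigned seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&s->lo, (uint32_t)v, memory_order_relaxed);
	atomic_store_explicit(&s->hi, (uint32_t)(v >> 32),
			      memory_order_relaxed);
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader side: retry until both halves come from the same write. */
static uint64_t size_read(struct sized *s)
{
	unsigned before, after;
	uint32_t lo, hi;

	do {
		before = atomic_load_explicit(&s->seq, memory_order_acquire);
		lo = atomic_load_explicit(&s->lo, memory_order_relaxed);
		hi = atomic_load_explicit(&s->hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		after = atomic_load_explicit(&s->seq, memory_order_relaxed);
	} while ((before & 1) || before != after);

	return ((uint64_t)hi << 32) | lo;
}
```

Every call to size_read() pays for the counter checks and a possible
retry, so reading the size once and reusing the local copy saves that
work on top of keeping the derived values coherent.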

This also includes Chris's fix (which looks valid to me), and it
includes my fix from yesterday for the memory pressure on the mempool
bio allocator with GFP_NOFS.

--- sles/fs/mpage.c.~1~	2004-07-09 23:48:33.233205496 +0200
+++ sles/fs/mpage.c	2004-07-10 06:32:24.784449776 +0200
@@ -404,6 +404,7 @@ mpage_writepage(struct bio *bio, struct 
 	struct block_device *boundary_bdev = NULL;
 	int length;
 	struct buffer_head map_bh;
+	loff_t i_size = i_size_read(inode);
 
 	if (page_has_buffers(page)) {
 		struct buffer_head *head = page_buffers(page);
@@ -460,7 +461,7 @@ mpage_writepage(struct bio *bio, struct 
 	 */
 	BUG_ON(!PageUptodate(page));
 	block_in_file = page->index << (PAGE_CACHE_SHIFT - blkbits);
-	last_block = (i_size_read(inode) - 1) >> blkbits;
+	last_block = (i_size - 1) >> blkbits;
 	map_bh.b_page = page;
 	for (page_block = 0; page_block < blocks_per_page; ) {
 
@@ -490,9 +491,11 @@ mpage_writepage(struct bio *bio, struct 
 
 	first_unmapped = page_block;
 
-	end_index = i_size_read(inode) >> PAGE_CACHE_SHIFT;
+page_is_mapped:
+
+	end_index = i_size >> PAGE_CACHE_SHIFT;
 	if (page->index >= end_index) {
-		unsigned offset = i_size_read(inode) & (PAGE_CACHE_SIZE - 1);
+		unsigned offset = i_size & (PAGE_CACHE_SIZE - 1);
 		char *kaddr;
 
 		if (page->index > end_index || !offset)
@@ -503,8 +506,6 @@ mpage_writepage(struct bio *bio, struct 
 		kunmap_atomic(kaddr, KM_USER0);
 	}
 
-page_is_mapped:
-
 	/*
 	 * This page will go to BIO.  Do we need to send this BIO off first?
 	 */
@@ -519,6 +520,12 @@ alloc_new:
 			goto confused;
 	}
 
+	length = first_unmapped << blkbits;
+	if (bio_add_page(bio, page, length, 0) < length) {
+		bio = mpage_bio_submit(WRITE, bio);
+		goto alloc_new;
+	}
+
 	/*
 	 * OK, we have our BIO, so we can now mark the buffers clean.  Make
 	 * sure to only clean buffers which we know we'll be writing.
@@ -539,12 +546,6 @@ alloc_new:
 			try_to_free_buffers(page);
 	}
 
-	length = first_unmapped << blkbits;
-	if (bio_add_page(bio, page, length, 0) < length) {
-		bio = mpage_bio_submit(WRITE, bio);
-		goto alloc_new;
-	}
-
 	BUG_ON(PageWriteback(page));
 	set_page_writeback(page);
 	unlock_page(page);
