linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: linux-fsdevel@vger.kernel.org
Cc: linux-mm@vger.kernel.org, npiggin@suse.de, Jan Kara <jack@suse.cz>
Subject: [PATCH 2/5] fs: Don't zero out page on writepage()
Date: Tue, 11 Aug 2009 00:20:44 +0200	[thread overview]
Message-ID: <1249942847-23851-3-git-send-email-jack@suse.cz> (raw)
In-Reply-To: <1249942847-23851-1-git-send-email-jack@suse.cz>

From: Nick Piggin <npiggin@suse.de>

When writing a page to filesystem, vfs zeroes out parts of the page past
i_size in an attempt to get zeroes into those blocks on disk, so as to honour
the requirement that an expanding truncate should zero-fill the file.

Unfortunately, this is racy. The reason we can get something other than
zeroes here is via an mmaped write to the block beyond i_size. Zeroing it
out before writepage narrows the window, but it is still possible to store
junk beyond i_size on disk, by storing into the page after writepage zeroes,
but before DMA (or copy) completes. This allows process A to break posix
semantics for process B (or even inadvertently for itsef).

It could also be possible that the filesystem has written data into the
block but not yet expanded the inode size when the system crashes for
some reason. Unless its journal reply / fsck process etc checks for this
condition, it could also cause subsequent breakage in semantics.

So just remove this zeroing completely for the time being. It removes the
constraint that i_size has to be updated under page lock in write_end()
which helps subsequent patches. The Posix semantics will be fixed later.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/buffer.c |   35 ++++-------------------------------
 fs/mpage.c  |   13 ++-----------
 2 files changed, 6 insertions(+), 42 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index c160aa0..33da488 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1667,9 +1667,6 @@ static int __block_write_full_page(struct inode *inode, struct page *page,
 			 * this page can be outside i_size when there is a
 			 * truncate in progress.
 			 */
-			/*
-			 * The buffer was zeroed by block_write_full_page()
-			 */
 			clear_buffer_dirty(bh);
 			set_buffer_uptodate(bh);
 		} else if ((!buffer_mapped(bh) || buffer_delay(bh)) &&
@@ -2656,26 +2653,14 @@ int nobh_writepage(struct page *page, get_block_t *get_block,
 	unsigned offset;
 	int ret;
 
-	/* Is the page fully inside i_size? */
-	if (page->index < end_index)
-		goto out;
-
 	/* Is the page fully outside i_size? (truncate in progress) */
 	offset = i_size & (PAGE_CACHE_SIZE-1);
-	if (page->index >= end_index+1 || !offset) {
+	if (page->index >= end_index &&
+			(page->index >= end_index+1 || !offset)) {
 		unlock_page(page);
 		return 0;
 	}
 
-	/*
-	 * The page straddles i_size.  It must be zeroed out on each and every
-	 * writepage invocation because it may be mmapped.  "A file is mapped
-	 * in multiples of the page size.  For a file that is not a multiple of
-	 * the  page size, the remaining memory is zeroed when mapped, and
-	 * writes to that region are not written out to the file."
-	 */
-	zero_user_segment(page, offset, PAGE_CACHE_SIZE);
-out:
 	ret = mpage_writepage(page, get_block, wbc);
 	if (ret == -EAGAIN)
 		ret = __block_write_full_page(inode, page, get_block, wbc,
@@ -2849,26 +2834,14 @@ int block_write_full_page_endio(struct page *page, get_block_t *get_block,
 	const pgoff_t end_index = i_size >> PAGE_CACHE_SHIFT;
 	unsigned offset;
 
-	/* Is the page fully inside i_size? */
-	if (page->index < end_index)
-		return __block_write_full_page(inode, page, get_block, wbc,
-					       handler);
-
 	/* Is the page fully outside i_size? (truncate in progress) */
 	offset = i_size & (PAGE_CACHE_SIZE-1);
-	if (page->index >= end_index+1 || !offset) {
+	if (page->index >= end_index &&
+	    (page->index >= end_index + 1 || !offset)) {
 		unlock_page(page);
 		return 0;
 	}
 
-	/*
-	 * The page straddles i_size.  It must be zeroed out on each and every
-	 * writepage invokation because it may be mmapped.  "A file is mapped
-	 * in multiples of the page size.  For a file that is not a multiple of
-	 * the  page size, the remaining memory is zeroed when mapped, and
-	 * writes to that region are not written out to the file."
-	 */
-	zero_user_segment(page, offset, PAGE_CACHE_SIZE);
 	return __block_write_full_page(inode, page, get_block, wbc, handler);
 }
 
diff --git a/fs/mpage.c b/fs/mpage.c
index 42381bd..9317762 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -559,19 +559,10 @@ static int __mpage_writepage(struct page *page, struct writeback_control *wbc,
 page_is_mapped:
 	end_index = i_size >> PAGE_CACHE_SHIFT;
 	if (page->index >= end_index) {
-		/*
-		 * The page straddles i_size.  It must be zeroed out on each
-		 * and every writepage invokation because it may be mmapped.
-		 * "A file is mapped in multiples of the page size.  For a file
-		 * that is not a multiple of the page size, the remaining memory
-		 * is zeroed when mapped, and writes to that region are not
-		 * written out to the file."
-		 */
 		unsigned offset = i_size & (PAGE_CACHE_SIZE - 1);
 
-		if (page->index > end_index || !offset)
-			goto confused;
-		zero_user_segment(page, offset, PAGE_CACHE_SIZE);
+		if (page->index >= end_index+1 || !offset)
+			goto confused; /* page fully outside i_size */
 	}
 
 	/*
-- 
1.6.0.2


  parent reply	other threads:[~2009-08-10 22:20 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-08-10 22:20 [PATCH 0/5] [RFC] Fix page_mkwrite for blocksize < pagesize Jan Kara
2009-08-10 22:20 ` [PATCH 1/5] fs: buffer_head writepage no invalidate Jan Kara
2009-08-10 22:20 ` Jan Kara [this message]
2009-08-10 22:20 ` [PATCH 3/5] vfs: Create dirty buffer only inside i_size Jan Kara
2009-08-10 22:20 ` [PATCH 4/5] fs: Move i_size update in write_end() from under page lock Jan Kara
2009-08-10 22:20 ` [PATCH 5/5] vfs: Add better VFS support for page_mkwrite when blocksize < pagesize Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1249942847-23851-3-git-send-email-jack@suse.cz \
    --to=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@vger.kernel.org \
    --cc=npiggin@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).