public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Mark Fasheh <mark.fasheh@oracle.com>
To: linux-kernel@vger.kernel.org
Cc: nickpiggin@yahoo.com.au, npiggin@suse.de, akpm@osdl.org
Subject: Re: + fs-prepare_write-fixes.patch added to -mm tree
Date: Wed, 18 Oct 2006 18:42:09 -0700	[thread overview]
Message-ID: <20061019014209.GA10128@ca-server1.us.oracle.com> (raw)
In-Reply-To: <200610182150.k9ILoLNk019702@shell0.pdx.osdl.net>

Hi Nick,

On Wed, Oct 18, 2006 at 02:50:21PM -0700, akpm@osdl.org wrote:
> Some prepare/commit_write implementations have possible pre-existing bugs,
> possible data leaks (by setting uptodate too early) and data corruption (by
> not reading in non-modified parts of a !uptodate page).
> 
> Others are (also) broken when commit_write passes in a 0 length commit with
> a !uptodate page (a change caused by buffered write deadlock fix patch).
> 
> Fix filesystems as best we can.  GFS2, OCFS2, Reiserfs, JFFS are nontrivial
> and are likely broken.  All others at least need a glance.
I would have liked a CC on this patch, considering that ypu might have just
broken ocfs2 :) I wouldn't have even seen the patch had I not been looking
through my mm-commits mailbox for an unrelated patch.


>    commit_write: If prepare_write succeeds, new data will be copied
> -        into the page and then commit_write will be called.  It will
> -        typically update the size of the file (if appropriate) and
> -        mark the inode as dirty, and do any other related housekeeping
> -        operations.  It should avoid returning an error if possible -
> -        errors should have been handled by prepare_write.
> +        into the page and then commit_write will be called. commit_write may
> +	be called with a range that is smaller than that passed in to
> +	prepare_write, it could even be zero. If the page is not uptodate,
> +	the range will *only* be zero or the full length that was passed to
> +	prepare_write, if it is zero, the page should not be marked uptodate
> +	(success should still be returned, if possible -- the write will be
> +	retried).
Doesn't this scheme have the potential to leave dirty data in holes? If a
file system does it's allocation for a hole in the middle of a file (so no
i_size update) in ->prepare_write(), but then in ->commit_write() it's not
allowed to write out the page, or add it to a transaction (in the case of
ext3/ocfs2 ordered writes), we might commit the allocation tree changes
without writing data to the actual disk region that the allocation covers. A
subsequent read would then return whatever junk was on disk.


> diff -puN fs/ext3/inode.c~fs-prepare_write-fixes fs/ext3/inode.c
> --- a/fs/ext3/inode.c~fs-prepare_write-fixes
> +++ a/fs/ext3/inode.c
> @@ -1214,21 +1214,24 @@ static int ext3_ordered_commit_write(str
>  	struct inode *inode = page->mapping->host;
>  	int ret = 0, ret2;
>  
> -	ret = walk_page_buffers(handle, page_buffers(page),
> -		from, to, NULL, ext3_journal_dirty_data);
> +	if (to - from > 0) {
> +		ret = walk_page_buffers(handle, page_buffers(page),
> +			from, to, NULL, ext3_journal_dirty_data);
I think this perhaps illustrates my worry. If we don't add all the page
buffers to the transaction which cover a hole that was filled in
->prepare_write(), we'll commit at the bottom of ext3_ordered_commit_write()
without having covered the entire range which we allocated for.


What probably needs to happen is one of two things:

a) The file system rolls back the allocation

b) We somehow write zero's to the part of the region skipped before
   ->commit_write() returns.

Thanks,
	--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh@oracle.com

       reply	other threads:[~2006-10-19  1:42 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <200610182150.k9ILoLNk019702@shell0.pdx.osdl.net>
2006-10-19  1:42 ` Mark Fasheh [this message]
2006-10-19  5:25   ` + fs-prepare_write-fixes.patch added to -mm tree Nick Piggin
2006-10-19 23:09     ` Mark Fasheh
2006-10-19 23:27       ` Nick Piggin
2006-10-20 18:55         ` Mark Fasheh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20061019014209.GA10128@ca-server1.us.oracle.com \
    --to=mark.fasheh@oracle.com \
    --cc=akpm@osdl.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nickpiggin@yahoo.com.au \
    --cc=npiggin@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox