linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nick Piggin <npiggin@suse.de>
To: Neil Brown <neilb@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linux Filesystems <linux-fsdevel@vger.kernel.org>,
	Mark Fasheh <mark.fasheh@oracle.com>,
	Linux Memory Management <linux-mm@kvack.org>
Subject: Re: [patch 12/44] fs: introduce write_begin, write_end, and perform_write aops
Date: Tue, 24 Apr 2007 12:37:42 +0200	[thread overview]
Message-ID: <20070424103742.GA32738@wotan.suse.de> (raw)
In-Reply-To: <17965.46748.634169.563467@notabene.brown>

On Tue, Apr 24, 2007 at 05:49:48PM +1000, Neil Brown wrote:
> On Tuesday April 24, npiggin@suse.de wrote:
> > 
> > BTW. AOP_FLAG_UNINTERRUPTIBLE can be used by filesystems to avoid
> > an initial read or other sequence they might be using to handle the
> > case of a short write. ecryptfs uses it, others can too.
> > 
> > For buffered writes, this doesn't get passed in (unless they are
> > coming from kernel space), so I was debating whether to have it at
> > all.  However, in the previous API, _nobody_ had to worry about
> > short writes, so this flag means I avoid making an API that is
> > fundamentally less performant in some situations.
> 
> Ahhh I think I get it now.
> 
>   In general, the address_space must cope with the possibility that
>   fewer than the expected number of bytes is copied.  This may leave
>   parts of the page with invalid data.  This can be handled by
>   pre-loading the page with valid data, however this may cause a
>   significant performance cost.

Right. Bringing the page uptodate at write_begin-time is probably
the simplest way to handle it. However, more sophisticated schemes
are possible. For example, the generic block routines can recover
at write_end-time, and probably can't make use of the flag to do
things much better...


>   The write_begin/write_end interface provide two mechanism by which
>   this case can be handled more efficiently.
>   1/ The AOP_FLAG_UNINTERRUPTIBLE flag declares that the write will
>     not be partial (maybe a different name? AOP_FLAG_NO_PARTIAL).
>     If that is set, inefficient preparation can be avoided.  However the
>     most common write paths will never set this flag.

Yes, loop, nfsd, and filesystem-specific pagecache modification
(eg. ext2 directories) are probably the main things that use it.


>   2/ The return from write_end can declare that fewer bytes have been
>     accepted. e.g. part of the page may have been loaded from backing
>     store, overwriting some of the newly written bytes.  If this
>     return value is reduced, a new write_begin/write_end cycle
>     may be called to attempt to write the bytes again.

Yeah, although you'd have to be careful not to overwrite things if
the page is uptodate (because uptodate *really* means uptodate --
ie.  it is the only thing we have to synchronise buffered reads from
returning the data to userspace).


> 
> Also
> +  write_end: After a successful write_begin, and data copy, write_end must
> +        be called. len is the original len passed to write_begin, and copied
> +        is the amount that was able to be copied (they must be equal if
> +        write_begin was called with intr == 0).
> +
> 
> That should be "... called without AOP_FLAG_UNINTERRUPTIBLE being
> set".
> And "that was able to be copied" is misleading, as the copy is not done in
> write_end.  Maybe "that was accepted".

Thanks, very good eyes and good suggestions.

Actually I'm a bit worried about this copied vs accepted thing -- we've
already copied some number of bytes into the pagecache by the time write_end
is called. So if the filesystem accepts less and the pagecache page is marked
uptodate, then the pagecache is now out of sync with the filesystem. There
are a few places where it looks like we get this wrong... but that's for a
future patch :P

  reply	other threads:[~2007-04-24 10:37 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-04-24  1:23 [patch 00/44] Buffered write deadlock fix and new aops for 2.6.21-rc6-mm1 Nick Piggin
2007-04-24  1:23 ` [patch 01/44] mm: revert KERNEL_DS buffered write optimisation Nick Piggin
2007-04-24  1:23 ` [patch 02/44] Revert 81b0c8713385ce1b1b9058e916edcf9561ad76d6 Nick Piggin
2007-04-24  1:23 ` [patch 03/44] Revert 6527c2bdf1f833cc18e8f42bd97973d583e4aa83 Nick Piggin
2007-04-24  1:23 ` [patch 04/44] mm: clean up buffered write code Nick Piggin
2007-04-24  1:23 ` [patch 05/44] mm: debug write deadlocks Nick Piggin
2007-04-24  1:23 ` [patch 06/44] mm: trim more holes Nick Piggin
2007-04-24  6:07   ` Neil Brown
2007-04-24  6:17     ` Nick Piggin
2007-04-24  1:23 ` [patch 07/44] mm: buffered write cleanup Nick Piggin
2007-04-24  1:23 ` [patch 08/44] mm: write iovec cleanup Nick Piggin
2007-04-24  1:23 ` [patch 09/44] mm: fix pagecache write deadlocks Nick Piggin
2007-04-24  1:23 ` [patch 10/44] mm: buffered write iterator Nick Piggin
2007-04-24  1:23 ` [patch 11/44] fs: fix data-loss on error Nick Piggin
2007-04-24  1:23 ` [patch 12/44] fs: introduce write_begin, write_end, and perform_write aops Nick Piggin
2007-04-24  6:59   ` Neil Brown
2007-04-24  7:23     ` Nick Piggin
2007-04-24  7:49       ` Neil Brown
2007-04-24 10:37         ` Nick Piggin [this message]
2007-04-24  1:23 ` [patch 13/44] mm: restore KERNEL_DS optimisations Nick Piggin
2007-04-24 10:43   ` Christoph Hellwig
2007-04-24 11:03     ` Nick Piggin
2007-04-24  1:24 ` [patch 14/44] implement simple fs aops Nick Piggin
2007-04-24  1:24 ` [patch 15/44] block_dev convert to new aops Nick Piggin
2007-04-24  1:24 ` [patch 16/44] rd " Nick Piggin
2007-04-24 10:46   ` Christoph Hellwig
2007-04-24 11:05     ` Nick Piggin
2007-04-24 11:11       ` Christoph Hellwig
2007-04-24 11:16         ` Nick Piggin
2007-04-24 11:18           ` Christoph Hellwig
2007-04-24 11:20             ` Nick Piggin
2007-04-24 11:42           ` Neil Brown
2007-04-24  1:24 ` [patch 17/44] ext2 " Nick Piggin
2007-04-24  1:24 ` [patch 18/44] ext3 " Nick Piggin
2007-04-24  1:24 ` [patch 19/44] ext4 " Nick Piggin
2007-04-24  1:24 ` [patch 20/44] xfs " Nick Piggin
2007-04-24  1:24 ` [patch 21/44] fs: new cont helpers Nick Piggin
2007-04-24  1:24 ` [patch 22/44] fat convert to new aops Nick Piggin
2007-04-24  1:24 ` [patch 23/44] adfs " Nick Piggin
2007-04-24  1:24 ` [patch 24/44] affs " Nick Piggin
2007-04-24  1:24 ` [patch 25/44] hfs " Nick Piggin
2007-04-24  1:24 ` [patch 26/44] hfsplus " Nick Piggin
2007-04-24  1:24 ` [patch 27/44] hpfs " Nick Piggin
2007-04-24  1:24 ` [patch 28/44] bfs " Nick Piggin
2007-04-24  1:24 ` [patch 29/44] qnx4 " Nick Piggin
2007-04-24  1:24 ` [patch 30/44] nfs " Nick Piggin
2007-04-24  1:24 ` [patch 31/44] smb " Nick Piggin
2007-04-24  1:24 ` [patch 32/44] ocfs2: " Nick Piggin
2007-04-24  1:24 ` [patch 33/44] gfs2 " Nick Piggin
2007-04-24  1:24 ` [patch 34/44] fs: no AOP_TRUNCATED_PAGE for writes Nick Piggin
2007-04-24  1:24 ` [patch 35/44] ecryptfs convert to new aops Nick Piggin
2007-04-24  1:24 ` [patch 36/44] fuse " Nick Piggin
2007-04-24  1:24 ` [patch 37/44] hostfs " Nick Piggin
2007-04-27 16:11   ` Jeff Dike
2007-04-24  1:24 ` [patch 38/44] jffs2 " Nick Piggin
2007-04-24  1:24 ` [patch 39/44] cifs " Nick Piggin
2007-04-24  1:24 ` [patch 40/44] ufs " Nick Piggin
2007-04-24  1:24 ` [patch 41/44] udf " Nick Piggin
2007-04-24  1:24 ` [patch 42/44] sysv " Nick Piggin
2007-04-24  1:24 ` [patch 43/44] minix " Nick Piggin
2007-04-24  1:24 ` [patch 44/44] jfs " Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070424103742.GA32738@wotan.suse.de \
    --to=npiggin@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mark.fasheh@oracle.com \
    --cc=neilb@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).