Re: [RFC] new ->perform_write fop

From: Dave Chinner <david@fromorbit.com>
To: Nick Piggin <npiggin@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>,
	Josef Bacik <josef@redhat.com>,
	linux-fsdevel@vger.kernel.org, chris.mason@oracle.com,
	akpm@linux-foundation.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC] new ->perform_write fop
Date: Sat, 22 May 2010 18:37:03 +1000
Message-ID: <20100522083703.GA12087@dastard>
In-Reply-To: <20100522023102.GP2516@laptop>

On Sat, May 22, 2010 at 12:31:02PM +1000, Nick Piggin wrote:
> On Fri, May 21, 2010 at 11:15:18AM -0400, Christoph Hellwig wrote:
> > Nick, what exactly is the problem with the reserve + allocate design?
> > 
> > In a delalloc filesystem (which is all those that will care about high
> > performance large writes) the write path fundamentally consists of those
> > two operations.  Getting rid of the get_blocks mess and replacing it
> > with a dedicated operations vector will simplify things a lot.
> 
> Nothing wrong with it, I think it's a fine idea (although you may still
> need a per-bh call to connect the fs metadata to each page).
> 
> I just much prefer to have operations after the copy not able to fail,
> otherwise you get into all those pagecache corner cases.
> 
> BTW. when you say reserve + allocate, this is in the page-dirty path,
> right? I thought delalloc filesystems tend to do the actual allocation
> in the page-cleaning path? Or am I confused?

See my reply to Jan: delayed allocation has two parts, space
reservation (accounting for ENOSPC) and recording of the delalloc
extents (allocate). This is separate from the writeback path, where we
convert delalloc extents to real extents.
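
A minimal sketch of that split, written as a userspace toy model rather
than kernel code. Every name here (toy_fs, toy_reserve and so on) is
hypothetical and exists only for illustration; a real filesystem tracks
extents, not bare counters. The point it tries to show is that only the
reservation step can return ENOSPC, and that backing out of a failed
write is a counter decrement rather than a hole punch.

```c
#include <errno.h>

/* Hypothetical toy model of delalloc accounting; not kernel code. */
struct toy_fs {
	long free_blocks;     /* not yet promised to any write */
	long reserved_blocks; /* accounted for, but no extent recorded yet */
	long delalloc_blocks; /* recorded as delalloc, no disk location yet */
	long real_blocks;     /* converted to real extents by writeback */
};

/* Write path, part 1: space reservation. The only step that can fail. */
static int toy_reserve(struct toy_fs *fs, long nblocks)
{
	if (nblocks < 0)
		return -EINVAL;
	if (nblocks > fs->free_blocks)
		return -ENOSPC;
	fs->free_blocks -= nblocks;
	fs->reserved_blocks += nblocks;
	return 0;
}

/* Write path, part 2: record the delalloc extent ("allocate"). */
static void toy_record_delalloc(struct toy_fs *fs, long nblocks)
{
	fs->reserved_blocks -= nblocks;
	fs->delalloc_blocks += nblocks;
}

/* Error path: hand an unused reservation back by adjusting counters. */
static void toy_unreserve(struct toy_fs *fs, long nblocks)
{
	fs->reserved_blocks -= nblocks;
	fs->free_blocks += nblocks;
}

/* Writeback path: convert delalloc extents to real extents. */
static void toy_writeback_convert(struct toy_fs *fs, long nblocks)
{
	fs->delalloc_blocks -= nblocks;
	fs->real_blocks += nblocks;
}
```

In this model a write calls toy_reserve() before touching any data,
then toy_record_delalloc() for what it actually dirtied and
toy_unreserve() for the remainder; toy_writeback_convert() runs later
and independently of the write path.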

> > Punching holes is a rather problematic operation, and as mentioned not
> > actually implemented for most filesystems - just decrementing counters
> > on errors increases the chances that our error handling will actually
> > work massively.
> 
> It's just harder for the pagecache. Invalidating and throwing out old
> pagecache and splicing in new pages seems a bit of a hack.

Hardly a hack - it turns a buffered write into an operation that
does not expose transient page state and hence prevents torn writes.
That will allow us to use DIF enabled storage paths for buffered
filesystem IO(*), perhaps even allow us to generate checksums during
copy-in to do end-to-end checksum protection of data....
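
As a rough userspace illustration of checksumming during copy-in (not
kernel code, and not a proposed interface): the hypothetical helper
below copies into a private buffer and computes a CRC16 in the same
pass, so the checksum covers exactly the bytes that were stored. The
polynomial 0x8BB7 is the one the T10 DIF guard tag uses; the function
names and the byte-at-a-time loop are assumptions made for clarity.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold one byte into a CRC16 (polynomial 0x8BB7, MSB first, init 0). */
static uint16_t crc16_step(uint16_t crc, uint8_t byte)
{
	crc ^= (uint16_t)((uint16_t)byte << 8);
	for (int bit = 0; bit < 8; bit++) {
		if (crc & 0x8000)
			crc = (uint16_t)((crc << 1) ^ 0x8BB7);
		else
			crc = (uint16_t)(crc << 1);
	}
	return crc;
}

/*
 * Hypothetical copy-in helper: each source byte is read once, stored in
 * the private destination buffer, and folded into the checksum.
 */
static uint16_t copy_in_csum(uint8_t *dst, const uint8_t *src, size_t len)
{
	uint16_t crc = 0;

	for (size_t i = 0; i < len; i++) {
		uint8_t b = src[i];

		dst[i] = b;
		crc = crc16_step(crc, b);
	}
	return crc;
}

/* Recompute the checksum over a buffer, e.g. to verify it later. */
static uint16_t csum_only(const uint8_t *buf, size_t len)
{
	uint16_t crc = 0;

	for (size_t i = 0; i < len; i++)
		crc = crc16_step(crc, buf[i]);
	return crc;
}
```

If the destination buffer is not visible to anyone else until the copy
completes, the value returned by copy_in_csum() stays valid for the
stored data; any later modification of the buffer shows up as a
mismatch against csum_only().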

Cheers,

Dave.

(*) Yes, I know that mmap writes will still break DIF even if we do
writes this way.
-- 
Dave Chinner
david@fromorbit.com
