Re: [PATCH] direct-io: allow file systems to do their own waiting for io

From: Chris Mason <chris.mason@fusionio.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: Josef Bacik <JBacik@fusionio.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"viro@ZenIV.linux.org.uk" <viro@ZenIV.linux.org.uk>,
	"jmoyer@redhat.com" <jmoyer@redhat.com>,
	"zab@redhat.com" <zab@redhat.com>
Subject: Re: [PATCH] direct-io: allow file systems to do their own waiting for io
Date: Fri, 14 Dec 2012 08:44:21 -0500
Message-ID: <20121214134421.GA19606@shiny>
In-Reply-To: <20121208123541.GE25713@shiny>

On Sat, Dec 08, 2012 at 05:35:41AM -0700, Chris Mason wrote:
> On Sat, Dec 08, 2012 at 05:17:31AM -0700, Christoph Hellwig wrote:
> > On Mon, Dec 03, 2012 at 11:14:03AM -0500, Josef Bacik wrote:
> > > On Mon, Dec 03, 2012 at 08:41:25AM -0700, Christoph Hellwig wrote:
> > > > On Mon, Dec 03, 2012 at 08:37:20AM -0500, Josef Bacik wrote:
> > > > > Btrfs is terrible with O_DIRECT|O_SYNC, mostly because of the constant
> > > > > waiting.  The thing is we have a handy way of waiting for IO that we can
> > > > > delay to the very last second so we do all of the O_SYNC work and then wait
> > > > > for a bunch of IO to complete.  So introduce a flag to allow the generic
> > > > > direct io stuff to forgo waiting and leave that up to the file system.
> > > > > Thanks,
> > > > 
> > > > I don't really like passing another flag for this.  If we are going to
> > > > do something like this it should be in a way where:
> > > > 
> > > >  - the actual waiting code is a helper that btrfs would also use
> > > >  - the main dio code is structured in a way that we have a lower level
> > > >    entry point that skips the waiting, and a higher level one that also
> > > >    calls it.
> > > > 
> > > > That being said I'm not imaginative enough to see how you're actually
> > > > going to use it.  Posting the btrfs side would help with that.
> > > > 
> > > 
> > > Hrm so I can do that, but it may not make much sense.  Here are the two patches
> > > that are relevant (older versions but they get the idea across)
> > > 
> > > http://git.kernel.org/?p=linux/kernel/git/josef/btrfs-next.git;a=commit;h=78b40072c556d82fac5e58793a3178887ac057ec
> > > http://git.kernel.org/?p=linux/kernel/git/josef/btrfs-next.git;a=commit;h=b7728f1b19eeb2041e3d4da22fd3d5a5c11abd3c
> > 
> > I've looked over the patches but I still don't know what's going on,
> > sorry for having to poke a bit deeper by mail.
> > 
> > > 
> > > Basically what happens with btrfs now in O_SYNC/fsync() with either O_DIRECT or
> > > not is this
> > > 
> > > write()
> > > fsync()/O_SYNC
> > > 	start and wait on all io to complete
> > > 	log changed metadata into special tree
> > > 	write and wait on our new log
> > > 	sync super which points at our new log
> > > 
> > > What I'm trying to accomplish is this
> > > 
> > > write()
> > > fsync()/O_SYNC
> > > 	start io
> > > 	log changed metadata into special tree
> > > 	write log and then wait on log and data
> > 
> > How is this going to be safe?  You must only update the metadata once the
> > data has made it to disk, that is the actual disk I/O for the metadata
> > must only start once the disk I/O for the data has finished.  For
> > exactly that scenario the direct I/O code supports the end_io callback
> > to notify the filesystem efficiently.
> 
> Thanks for reading through things.  The current model without the patch
> looks like this:
> 
> [ write data, wait for data ] [ write various tree blocks, wait ]
> [ write the super, wait ]
> 
> One data block, 3 waits.  But thanks to cow, the super commits the
> metadata, so we could do this:
> 
> [ write the data ] [ write various tree blocks ] [ wait on all of it ]
> [ write the super, wait ]
> 
> That's down to two waits.  If we start using atomic writes on flash, we can
> do it all as a single IO.

So I have this (Josef's v2) in a branch here.  I'm happy to wait a kernel
release if we'd like to hash it out, but if it makes sense I'll send it in
with my pull request.

-chris
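
The wait counts quoted above ("One data block, 3 waits" versus "down to two
waits") can be checked with a small user-space model.  This is only a sketch,
not btrfs or direct-io code: IOTrace, fsync_current and fsync_deferred are
hypothetical names made up for illustration, and a "wait" here simply means
one blocking point that covers everything submitted so far.

```python
from dataclasses import dataclass, field


@dataclass
class IOTrace:
    """Toy trace of submissions and blocking waits (hypothetical, not kernel code)."""
    events: list = field(default_factory=list)
    in_flight: list = field(default_factory=list)

    def submit(self, what):
        # Start an IO without blocking on it.
        self.in_flight.append(what)
        self.events.append(("submit", what))

    def wait(self):
        # One blocking wait covers everything currently in flight.
        self.events.append(("wait", tuple(self.in_flight)))
        self.in_flight.clear()

    def wait_count(self):
        return sum(1 for kind, _ in self.events if kind == "wait")


def fsync_current():
    # [ write data, wait ] [ write tree blocks, wait ] [ write super, wait ]
    t = IOTrace()
    t.submit("data")
    t.wait()
    t.submit("tree blocks")
    t.wait()
    t.submit("super")
    t.wait()
    return t


def fsync_deferred():
    # [ write data ] [ write tree blocks ] [ wait on all of it ]
    # [ write super, wait ]
    t = IOTrace()
    t.submit("data")
    t.submit("tree blocks")
    t.wait()
    # The super is still submitted only after data and tree blocks are done,
    # since it is the super that commits the new metadata.
    t.submit("super")
    t.wait()
    return t
```

Calling fsync_current().wait_count() and fsync_deferred().wait_count() gives
the two counts being compared; in both models the super is submitted only
after every earlier write has been waited on.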

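The layering Christoph asks for earlier in the thread (a shared waiting
helper, a lower-level entry point that skips the wait, and a higher-level one
that also calls it) might be shaped roughly as below.  Again a sketch under
assumptions: ToyDirectIO and its method names are hypothetical and do not
correspond to any real kernel interface.

```python
class ToyDirectIO:
    """Hypothetical model of a split submit/wait direct-IO interface."""

    def __init__(self):
        self.pending = []
        self.completed = []

    def submit(self, blocks):
        # Lower-level entry point: start the IO and return without waiting.
        self.pending.extend(blocks)

    def wait(self):
        # Shared helper: block until everything pending has completed.
        # Returns how many IOs this single wait covered.
        done = len(self.pending)
        self.completed.extend(self.pending)
        self.pending.clear()
        return done

    def submit_and_wait(self, blocks):
        # Higher-level entry point: what most callers would keep using.
        self.submit(blocks)
        return self.wait()
```

A file system that wants to delay the wait would call submit() for the data,
do its logging work, call submit() again for the log, and then call wait()
once; everyone else keeps calling submit_and_wait().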

Thread overview: 7+ messages
2012-12-03 13:37 [PATCH] direct-io: allow file systems to do their own waiting for io Josef Bacik
2012-12-03 15:41 ` Christoph Hellwig
2012-12-03 16:14   ` Josef Bacik
2012-12-08 12:17     ` Christoph Hellwig
2012-12-08 12:35       ` Chris Mason
2012-12-14 13:44         ` Chris Mason [this message]
2012-12-11 10:00       ` Liu Bo
