public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: David Chinner <dgc@sgi.com>
To: Neil Brown <neilb@suse.de>
Cc: David Chinner <dgc@sgi.com>, xfs@oss.sgi.com, hch@infradead.org
Subject: Re: XFS and write barriers.
Date: Sun, 25 Mar 2007 15:17:55 +1100	[thread overview]
Message-ID: <20070325041755.GJ32602149@melbourne.sgi.com> (raw)
In-Reply-To: <17923.34462.210758.852042@notabene.brown>

On Fri, Mar 23, 2007 at 06:49:50PM +1100, Neil Brown wrote:
> On Friday March 23, dgc@sgi.com wrote:
> > On Fri, Mar 23, 2007 at 12:26:31PM +1100, Neil Brown wrote:
> > > Secondly, if a barrier write fails due to EOPNOTSUPP, it should be
> > > retried without the barrier (after possibly waiting for dependant
> > > requests to complete).  This is what other filesystems do, but I
> > > cannot find the code in xfs which does this.
> > 
> > XFS doesn't handle this - I was unaware that the barrier status of the
> > underlying block device could change....
> > 
> > OOC, when did this behaviour get introduced?
> 
> Probably when md/raid1 started supported barriers....
> 
> The problem is that this interface is (as far as I can see) undocumented
> and not fully specified.

And not communicated very far, either.

> Barriers only make sense inside drive firmware.

I disagree. e.g. Barriers have to be handled by the block layer to
prevent reordering of I/O in the request queues as well. The
block layer is responsible for ensuring barrier I/Os, as
indicated by the filesystem, act as real barriers.

> Trying to emulate it
> in the md layer doesn't make any sense as the filesystem is in a much
> better position to do any emulation required.

You're saying that the emulation of block layer functionality is the
responsibility of layers above the block layer. Why is this not
considered a layering violation?

> > > This is particularly important for md/raid1 as it is quite possible
> > > that barriers will be supported at first, but after a failure and
> > > different device on a different controller could be swapped in that
> > > does not support barriers.
> > 
> > I/O errors are not the way this should be handled. What happens if
> > the opposite happens? A drive that needs barriers is used as a
> > replacement on a filesystem that has barriers disabled because they
> > weren't needed? Now a crash can result in filesystem corruption, but
> > the filesystem has not been able to warn the admin that this
> > situation occurred. 
> 
> There should never be a possibility of filesystem corruption.
> If the a barrier request fails, the filesystem should:
>   wait for any dependant request to complete
>   call blkdev_issue_flush
>   schedule the write of the 'barrier' block
>   call blkdev_issue_flush again.

IOWs, the filesystem has to use block device calls to emulate a block device
barrier I/O. Why can't the block layer, on reception of a barrier write
and detecting that barriers are no longer supported by the underlying
device (i.e. in MD), do:

	wait for all queued I/Os to complete
	call blkdev_issue_flush
	schedule the write of the 'barrier' block
	call blkdev_issue_flush again.

And not involve the filesystem at all? i.e. why should the filesystem
have to do this?

> My understand is that that sequence is as safe as a barrier, but maybe
> not as fast.

Yes, and my understanding is that the block device is perfectly capable
of implementing this just as safely as the filesystem.

> The patch looks at least believable.  As you can imagine it is awkward
> to test thoroughly.

As well as being pretty much impossible to test reliably with an
automated testing framework. Hence so ongoing test coverage will
approach zero.....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

  reply	other threads:[~2007-03-25  4:18 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-03-23  1:26 XFS and write barriers Neil Brown
2007-03-23  5:30 ` David Chinner
2007-03-23  7:49   ` Neil Brown
2007-03-25  4:17     ` David Chinner [this message]
2007-03-25 23:21       ` Neil Brown
2007-03-26  3:14         ` David Chinner
2007-03-26  4:27           ` Neil Brown
2007-03-26  9:04             ` David Chinner
2007-03-29 14:56               ` Martin Steigerwald
2007-03-29 15:18                 ` David Chinner
2007-03-29 16:49                   ` Martin Steigerwald
2007-03-23  9:50   ` Christoph Hellwig
2007-03-25  3:51     ` David Chinner
2007-03-25 23:58       ` Neil Brown
2007-03-26  1:11     ` Neil Brown
2007-03-23  6:20 ` Timothy Shimmin
2007-03-23  8:00   ` Neil Brown
2007-03-25  3:19     ` David Chinner
2007-03-26  0:01       ` Neil Brown
2007-03-26  3:58         ` David Chinner
2007-03-27  3:58       ` Timothy Shimmin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070325041755.GJ32602149@melbourne.sgi.com \
    --to=dgc@sgi.com \
    --cc=hch@infradead.org \
    --cc=neilb@suse.de \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox