From: David Chinner <dgc@sgi.com>
To: Neil Brown <neilb@suse.de>
Cc: David Chinner <dgc@sgi.com>, xfs@oss.sgi.com, hch@infradead.org
Subject: Re: XFS and write barriers.
Date: Sun, 25 Mar 2007 15:17:55 +1100
Message-ID: <20070325041755.GJ32602149@melbourne.sgi.com>
In-Reply-To: <17923.34462.210758.852042@notabene.brown>
On Fri, Mar 23, 2007 at 06:49:50PM +1100, Neil Brown wrote:
> On Friday March 23, dgc@sgi.com wrote:
> > On Fri, Mar 23, 2007 at 12:26:31PM +1100, Neil Brown wrote:
> > > Secondly, if a barrier write fails due to EOPNOTSUPP, it should be
> > > retried without the barrier (after possibly waiting for dependent
> > > requests to complete). This is what other filesystems do, but I
> > > cannot find the code in xfs which does this.
> >
> > XFS doesn't handle this - I was unaware that the barrier status of the
> > underlying block device could change....
> >
> > OOC, when did this behaviour get introduced?
>
> Probably when md/raid1 started supporting barriers....
>
> The problem is that this interface is (as far as I can see) undocumented
> and not fully specified.
And not communicated very far, either.
> Barriers only make sense inside drive firmware.
I disagree. Barriers also have to be handled by the block layer, e.g. to
prevent reordering of I/O in the request queues. The block layer is
responsible for ensuring that barrier I/Os, as indicated by the
filesystem, act as real barriers.
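To make the request-queue point concrete, here is a rough toy model. The names (struct req, elevator_sort()) are hypothetical and are not the kernel's struct request or elevator code. Requests are sorted by sector for seek efficiency, but only within the runs between barriers. Nothing is ever moved across a barrier, and that guarantee has to come from the block layer regardless of what the drive firmware does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified request; not the kernel's struct request. */
struct req {
    unsigned long sector;
    bool barrier;
};

/* qsort comparator: ascending by sector number. */
static int cmp_sector(const void *a, const void *b)
{
    const struct req *ra = a;
    const struct req *rb = b;

    if (ra->sector < rb->sector)
        return -1;
    return ra->sector > rb->sector;
}

/*
 * Toy elevator: sort requests by sector to reduce seeking, but only
 * within each run between barrier requests. A barrier never moves, and
 * no request is moved from one side of a barrier to the other.
 */
static void elevator_sort(struct req *q, size_t n)
{
    size_t start = 0;

    assert(q != NULL || n == 0);
    for (size_t i = 0; i <= n; i++) {
        if (i == n || q[i].barrier) {
            if (i > start)
                qsort(q + start, i - start, sizeof(*q), cmp_sector);
            start = i + 1;
        }
    }
}
```

With a queue of sectors 100, 50, [barrier 10], 30, 20, this yields 50, 100, [barrier 10], 20, 30. The requests on each side of the barrier are reordered, but none of them crosses it.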
> Trying to emulate it
> in the md layer doesn't make any sense as the filesystem is in a much
> better position to do any emulation required.
You're saying that the emulation of block layer functionality is the
responsibility of layers above the block layer. Why is this not
considered a layering violation?
> > > This is particularly important for md/raid1 as it is quite possible
> > > that barriers will be supported at first, but after a failure a
> > > different device on a different controller could be swapped in that
> > > does not support barriers.
> >
> > I/O errors are not the way this should be handled. What happens if
> > the opposite happens? A drive that needs barriers is used as a
> > replacement on a filesystem that has barriers disabled because they
> > weren't needed? Now a crash can result in filesystem corruption, but
> > the filesystem has not been able to warn the admin that this
> > situation occurred.
>
> There should never be a possibility of filesystem corruption.
> If a barrier request fails, the filesystem should:
> wait for any dependent request to complete
> call blkdev_issue_flush
> schedule the write of the 'barrier' block
> call blkdev_issue_flush again.
IOWs, the filesystem has to use block device calls to emulate a block device
barrier I/O. Why can't the block layer, on reception of a barrier write
and detecting that barriers are no longer supported by the underlying
device (i.e. in MD), do:
wait for all queued I/Os to complete
call blkdev_issue_flush
schedule the write of the 'barrier' block
call blkdev_issue_flush again.
And not involve the filesystem at all? i.e. why should the filesystem
have to do this?
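Here is the same sequence placed inside a mirroring block device instead. Again this is a toy model with hypothetical names (struct member, struct mirror, mirror_write()) and is not how md is actually structured. The caller issues a barrier write and gets success back whether or not every member device can do barriers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror member; a model, not md's real data structures. */
struct member {
    bool supports_barriers;
    int inflight;        /* queued I/Os not yet completed */
    int flushes;         /* cache flushes issued */
    int writes;          /* writes completed */
    int barrier_writes;  /* of which were passed down as barriers */
};

struct mirror {
    struct member *m;
    size_t n;
};

static bool all_support_barriers(const struct mirror *r)
{
    for (size_t i = 0; i < r->n; i++)
        if (!r->m[i].supports_barriers)
            return false;
    return true;
}

/*
 * Write one block to every member. If a barrier was requested and some
 * member cannot do one, emulate it here so the filesystem never sees
 * -EOPNOTSUPP. Writes always succeed in this model.
 */
static int mirror_write(struct mirror *r, bool barrier)
{
    size_t i;

    assert(r != NULL);
    if (!barrier || all_support_barriers(r)) {
        for (i = 0; i < r->n; i++) {
            r->m[i].writes++;
            if (barrier)
                r->m[i].barrier_writes++;
        }
        return 0;
    }
    for (i = 0; i < r->n; i++)
        r->m[i].inflight = 0;   /* 1. wait for all queued I/Os */
    for (i = 0; i < r->n; i++)
        r->m[i].flushes++;      /* 2. flush every member's cache */
    for (i = 0; i < r->n; i++)
        r->m[i].writes++;       /* 3. plain (non-barrier) write */
    for (i = 0; i < r->n; i++)
        r->m[i].flushes++;      /* 4. flush again */
    return 0;
}
```

The emulation cost is the same as doing it in the filesystem, but it lives in one place instead of being duplicated in every filesystem above.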
> My understanding is that that sequence is as safe as a barrier, but maybe
> not as fast.
Yes, and my understanding is that the block device is perfectly capable
of implementing this just as safely as the filesystem.
> The patch looks at least believable. As you can imagine it is awkward
> to test thoroughly.
As well as being pretty much impossible to test reliably with an
automated testing framework. Hence ongoing test coverage will
approach zero.....
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group