From: Jens Axboe <axboe@suse.de>
To: Daniel Phillips <phillips@arcor.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] ide write barrier support
Date: Mon, 20 Oct 2003 21:56:32 +0200
Message-ID: <20031020195632.GA1128@suse.de>
In-Reply-To: <200310201910.48837.phillips@arcor.de>
On Mon, Oct 20 2003, Daniel Phillips wrote:
> Hi Jens,
>
> On Monday 13 October 2003 16:08, Jens Axboe wrote:
> > Forward ported and tested today (with the dummy ext3 patch included),
> > works for me. Some todo's left, but I thought I'd send it out to gauge
> > interest.
>
> This is highly interesting of course, but is it suitable for
> submission during the stability freeze? There is no correctness issue
> so long as no filesystem in mainline sets the BIO_RW_BARRIER bit,
> which appears to be the case. Therefore this is really a performance
> patch that introduces a new internal API.
I think I've said this to you three times now; I don't think you
understand what I mean by 'correctness issue'. There IS a correctness
issue, in that drives ship with write back caching enabled. The fs
assumes that once wait_on_buffer() returns, the data is on disk. That is
false, and it can remain false for a good number of seconds.
So let me state again that this is NOT a performance patch. I've never
considered it a performance patch, there are no performance gains with
the patch I posted. It is purely a data integrity issue. I don't know
how I can state this any clearer than I already have...
There are possibilities for performance gains in the _future_, that's
just an added _future_ bonus.
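To make the integrity problem concrete, here is a minimal sketch of a drive with a volatile write cache. It is a toy model, not kernel code: struct toy_drive and the toy_* helpers are hypothetical names invented for illustration. It only shows that a write can be reported complete while the data is still absent from the media.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_SECTORS 8

/* Hypothetical toy drive, not a kernel structure. */
struct toy_drive {
    int  media[TOY_SECTORS];  /* survives power loss */
    int  cache[TOY_SECTORS];  /* volatile write cache */
    bool dirty[TOY_SECTORS];  /* cached but not yet on media */
    bool write_back;          /* true: writes complete from cache */
};

/* Returns true when the drive reports the write as complete. */
static bool toy_write(struct toy_drive *d, size_t sector, int value)
{
    if (sector >= TOY_SECTORS)
        return false;
    if (d->write_back) {
        /* Completion is signalled, but only the cache holds the data. */
        d->cache[sector] = value;
        d->dirty[sector] = true;
    } else {
        d->media[sector] = value;
    }
    return true;
}

/* Cache flush: push every dirty sector to the media. */
static void toy_flush(struct toy_drive *d)
{
    for (size_t i = 0; i < TOY_SECTORS; i++) {
        if (d->dirty[i]) {
            d->media[i] = d->cache[i];
            d->dirty[i] = false;
        }
    }
}

/* Power loss: whatever was only in the cache is gone. */
static void toy_power_loss(struct toy_drive *d)
{
    for (size_t i = 0; i < TOY_SECTORS; i++)
        d->dirty[i] = false;
}
```

In this model, a completed toy_write() followed by toy_power_loss() leaves the media unchanged when write_back is set; only a toy_flush() in between makes the data durable.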
> It seems to me there are a few unresolved issues with the barrier API. It
I agree.
> needs to be clearly stated that only write barriers are supported, not read
> or read/write barriers, if that is in fact the intention. Assuming it is,
> then BIOs with read barriers need to be failed.
Read barriers can be just as easily supported, I still think that the
notion of read/write barriers is something you are inventing that I
don't see any practical use for. So I won't expand on that at all. From
my point of view, a read barrier is simply an io scheduler barrier. The
drive/driver never sees that bit. But it is 100% expressible with the
current logic.
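As an illustration of how a direction bit and a barrier bit combine into a barrier type, a minimal sketch follows. The flag names and values are hypothetical (they are not the kernel's bit definitions), and the rule that only write barriers reach the driver is taken from the description above.

```c
#include <stdbool.h>

/* Hypothetical flag bits for illustration only. */
enum toy_flags {
    TOY_WRITE   = 1u << 0,  /* direction: set = write, clear = read */
    TOY_BARRIER = 1u << 1   /* request is an ordering barrier */
};

enum toy_kind {
    TOY_PLAIN_READ,
    TOY_PLAIN_WRITE,
    TOY_READ_BARRIER,
    TOY_WRITE_BARRIER
};

/* The barrier type falls out of the two bits taken together. */
static enum toy_kind toy_classify(unsigned flags)
{
    bool write   = (flags & TOY_WRITE) != 0;
    bool barrier = (flags & TOY_BARRIER) != 0;

    if (barrier)
        return write ? TOY_WRITE_BARRIER : TOY_READ_BARRIER;
    return write ? TOY_PLAIN_WRITE : TOY_PLAIN_READ;
}

/*
 * A read barrier only constrains reordering in the io scheduler, so in
 * this sketch the barrier bit is passed down to the driver for writes only.
 */
static bool toy_driver_sees_barrier(unsigned flags)
{
    return toy_classify(flags) == TOY_WRITE_BARRIER;
}
```

With two bits there are exactly four combinations, which is why a combined read/write barrier has no encoding in this scheme.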
> The current BIO API provides no way to express a rw barrier, only read
> barriers and write barriers (the combination of direction bit and
> barrier bit indicates the barrier type). This is minor, but how
> nice it would be if the API was either orthogonal or there was a clear
> explanation of why RW barriers never make sense. And if they don't,
> why read barriers do make sense. Another possible wart is that the
> API doesn't allow for a read barrier carried by a write BIO or a write
> barrier carried by a read BIO. From a practical point of view the
> only immediate use we have for barriers is to accelerate journal
> writes and everything else comes under the heading of R&D. It would
> help if the code clearly reflected that modest goal.
Please come up with at least pseudo-rational examples for why this would
ever be needed, I refuse to design APIs based on loose whims or ideas.
The API is "designed" for the practical use of today and what I assumed
would be useful within reason, that's as far as I think it makes sense
to go. To bend the API for a doctored example such as 'rw barrier' is
stupid imho.
> The BIO barrier scheme doesn't mesh properly with your proposed
> QUEUE_ORDERED_* scheme. It seems to me that what you want is just
> QUEUE_ORDERED_NONE and QUEUE_ORDERED_WRITE. Is there any case where
> the distinction between a tag based implementation and a flush
> matters to high level code?
The difference comes from the early reiser implementation in 2.4, I'm
sure Chris can expand on that. I think it's long gone though, and it's
just an oversight on my part that the ORDERED_TAG is still there. It
will go.
> Also, the blk_queue_ordered function isn't a sufficient interface to
> enable the functionality at a high level, a filesystem also needs a
> way to know whether barriers are supported or not, short of just
> submitting a barrier request and seeing if it fails.
Why? Sometimes the only reliable way to detect whether you can support
barrier writes or not is to issue one. So I can't really help you there.
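A minimal sketch of that "issue one and see" approach, from the filesystem side. Everything here is hypothetical: the toy_* names are invented, and it is an assumption of this sketch that an unsupported barrier is reported with -EOPNOTSUPP. The idea is to try a barrier write, and on that specific error remember the result and fall back to plain writes.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model: counts what it was asked to do. */
struct toy_dev {
    bool supports_barrier;
    int  barrier_writes;
    int  plain_writes;
};

/* Assumption: an unsupported barrier fails with -EOPNOTSUPP. */
static int toy_submit(struct toy_dev *dev, bool barrier)
{
    if (barrier) {
        if (!dev->supports_barrier)
            return -EOPNOTSUPP;
        dev->barrier_writes++;
        return 0;
    }
    dev->plain_writes++;
    return 0;
}

struct toy_fs {
    struct toy_dev *dev;
    bool use_barriers;  /* starts true, cleared on first failed probe */
};

/* Commit once: probe with a barrier, fall back to a plain write. */
static int toy_commit(struct toy_fs *fs)
{
    if (fs->use_barriers) {
        int err = toy_submit(fs->dev, true);
        if (err != -EOPNOTSUPP)
            return err;            /* success, or an unrelated error */
        fs->use_barriers = false;  /* do not probe again */
    }
    return toy_submit(fs->dev, false);
}
```

After the fallback the filesystem is back to having no ordering guarantee from the device, so a real implementation would need some other means of ordering its journal writes; the sketch does not model that.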
> The high level interface needs to be able to handle stacked devices,
> i.e., device mapper, but not just device mapper. Barriers have to be
> supported by all the devices in the stack, not just the top or bottom
> one. I don't have a concrete suggestion on what the interface should
> be just now.
I completely agree. And I'm very open to patches correcting that issue,
thanks.
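To spell out the stacking requirement: a barrier is only honoured if every device in the stack honours it. A minimal sketch follows, using a hypothetical struct toy_node tree where a stacking device such as md or dm points at its lower devices. This is not an existing kernel interface, just the rule expressed in code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node in a device stack (top = md/dm, leaves = disks). */
struct toy_node {
    bool ordered;                        /* this layer handles barriers */
    const struct toy_node *const *lower; /* devices stacked underneath */
    size_t nlower;                       /* 0 for a physical disk */
};

/* True only if this layer and every layer below it handle barriers. */
static bool toy_stack_ordered(const struct toy_node *n)
{
    if (!n->ordered)
        return false;
    for (size_t i = 0; i < n->nlower; i++) {
        if (!toy_stack_ordered(n->lower[i]))
            return false;
    }
    return true;
}
```

A single disk without barrier support anywhere under a mapping is enough to make the whole stack report false, which is the part that makes the md/dm case non-trivial.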
> The point of this is, there still remain a number of open issues with
> this patch, no doubt more than just the ones I touched on. Though it
> is clearly headed in the right direction, I'd suggest holding off
> during the stability freeze and taking the needed time to get it
> right.
You touched on one valid point, the md/dm issue. That goes doubly for the
2.4 version (that we don't need to care about any more). And I agree with
you there, it needs to be done. And feel free to knock yourself out.
It's not a trivial issue.
--
Jens Axboe