From: Jens Axboe <axboe@suse.de>
To: Daniel Phillips <phillips@arcor.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] ide write barrier support
Date: Mon, 20 Oct 2003 21:56:32 +0200
Message-ID: <20031020195632.GA1128@suse.de>
In-Reply-To: <200310201910.48837.phillips@arcor.de>
On Mon, Oct 20 2003, Daniel Phillips wrote:
> Hi Jens,
>
> On Monday 13 October 2003 16:08, Jens Axboe wrote:
> > Forward ported and tested today (with the dummy ext3 patch included),
> > works for me. Some todo's left, but I thought I'd send it out to gauge
> > interest.
>
> This is highly interesting of course, but is it suitable for
> submission during the stability freeze? There is no correctness issue
> so long as no filesystem in mainline sets the BIO_RW_BARRIER bit,
> which appears to be the case. Therefore this is really a performance
> patch that introduces a new internal API.
I think I've said this to you three times now: I don't think you understand
what I mean by 'correctness issue'. There IS a correctness issue, in
that drives ship with write back caching enabled. The fs assumes that
once wait_on_buffer() returns, the data is on disk. That is false, and it
can remain false for quite a number of seconds.
So let me state again that this is NOT a performance patch. I've never
considered it a performance patch, there are no performance gains with
the patch I posted. It is purely a data integrity issue. I don't know
how I can state this any clearer than I already have...
There are possibilities for performance gains in the _future_, that's
just an added _future_ bonus.
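To make the integrity problem concrete, here is a minimal userspace sketch of a drive with a volatile write-back cache. It is a toy model under stated assumptions, not kernel or driver code: the `model_drive` structure and the `drive_*` functions are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SECTORS 8

/* Hypothetical model of a drive with a volatile write-back cache. */
struct model_drive {
    char platter[SECTORS];   /* what survives a power cut */
    char cache[SECTORS];     /* volatile write-back cache */
    bool dirty[SECTORS];     /* cached but not yet on the platter */
};

static void drive_init(struct model_drive *d)
{
    memset(d, 0, sizeof(*d));
}

/* The write "completes" as soon as the data reaches the cache. */
static void drive_write(struct model_drive *d, int sector, char data)
{
    assert(sector >= 0 && sector < SECTORS);
    d->cache[sector] = data;
    d->dirty[sector] = true;
}

/* Only a cache flush moves the data to stable storage. */
static void drive_flush(struct model_drive *d)
{
    for (int i = 0; i < SECTORS; i++) {
        if (d->dirty[i]) {
            d->platter[i] = d->cache[i];
            d->dirty[i] = false;
        }
    }
}

/* Power loss discards everything that is still only in the cache. */
static void drive_power_cut(struct model_drive *d)
{
    memset(d->cache, 0, sizeof(d->cache));
    memset(d->dirty, 0, sizeof(d->dirty));
}
```

In this model, a `drive_write()` followed by `drive_power_cut()` leaves the platter unchanged even though the write had already been reported complete; only a `drive_flush()` in between makes the data survive.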
> It seems to me there are a few unresolved issues with the barrier API. It
I agree.
> needs to be clearly stated that only write barriers are supported, not read
> or read/write barriers, if that is in fact the intention. Assuming it is,
> then BIOs with read barriers need to be failed.
Read barriers can be supported just as easily. I still think that the
notion of read/write barriers is something you are inventing that I
don't see any practical use for, so I won't expand on that at all. From
my point of view, a read barrier is simply an io scheduler barrier. The
drive/driver never sees that bit. But it is 100% expressible with the
current logic.
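The split described above can be sketched as two predicates over a direction bit and a barrier bit. This is an illustration only: the flag names and values below are hypothetical and do not reproduce the real bio flag layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the real bio flag layout is not reproduced here. */
enum {
    MODEL_WRITE   = 1u << 0,  /* direction: set = write, clear = read */
    MODEL_BARRIER = 1u << 1,  /* requests must not be reordered around this one */
};

/* Every barrier, read or write, constrains the io scheduler. */
static bool scheduler_must_order(unsigned int flags)
{
    return (flags & MODEL_BARRIER) != 0;
}

/* Only a write barrier is passed down to the drive/driver. */
static bool driver_sees_barrier(unsigned int flags)
{
    return (flags & MODEL_BARRIER) && (flags & MODEL_WRITE);
}
```

With these definitions a read barrier (`MODEL_BARRIER` alone) orders the scheduler queue but is invisible to the driver, while `MODEL_BARRIER | MODEL_WRITE` is seen by both.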
> The current BIO API provides no way to express a rw barrier, only read
> barriers and write barriers (the combination of direction bit and
> barrier bit indicates the barrier type). This is minor, but how
> nice it would be if the API was either orthogonal or there was a clear
> explanation of why RW barriers never make sense. And if they don't,
> why read barriers do make sense. Another possible wart is that the
> API doesn't allow for a read barrier carried by a write BIO or a write
> barrier carried by a read BIO. From a practical point of view the
> only immediate use we have for barriers is to accelerate journal
> writes and everything else comes under the heading of R&D. It would
> help if the code clearly reflected that modest goal.
Please come up with at least pseudo-rational examples for why this would
ever be needed, I refuse to design APIs based on loose whims or ideas.
The API is "designed" for the practical use of today and what I assumed
would be useful within reason, that's as far as I think it makes sense
to go. To bend the API for a doctored example such as 'rw barrier' is
stupid imho.
> The BIO barrier scheme doesn't mesh properly with your proposed
> QUEUE_ORDERED_* scheme. It seems to me that what you want is just
> QUEUE_ORDERED_NONE and QUEUE_ORDERED_WRITE. Is there any case where
> the distinction between a tag based implementation versus a flush
> matters to high level code?
The difference comes from the early reiser implementation in 2.4, I'm
sure Chris can expand on that. I think it's long gone though, and it's
just an oversight on my part that the ORDERED_TAG is still there. It
will go.
> Also, the blk_queue_ordered function isn't a sufficient interface to
> enable the functionality at a high level, a filesystem also needs a
> way to know whether barriers are supported or not, short of just
> submitting a barrier request and seeing if it fails.
Why? Sometimes the only reliable way to detect whether you can support
barrier writes or not is to issue one. So I can't really help you there.
> The high level interface needs to be able to handle stacked devices,
> i.e., device mapper, but not just device mapper. Barriers have to be
> supported by all the devices in the stack, not just the top or bottom
> one. I don't have a concrete suggestion on what the interface should
> be just now.
I completely agree. And I'm very open to patches correcting that issue,
thanks.
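The requirement that every device in the stack supports barriers can be sketched as a recursive check. This is only a model of the condition, not a proposed interface: `model_bdev` and `stack_supports_barrier()` are hypothetical names, and real md/dm stacking is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LOWER 4

/* Hypothetical stacked block device (think md/dm on top of real disks). */
struct model_bdev {
    bool ordered;                              /* this layer can honor barriers */
    size_t nr_lower;                           /* 0 for a physical device */
    const struct model_bdev *lower[MAX_LOWER]; /* devices underneath */
};

/* A barrier is only usable if this layer and everything below it honor it. */
static bool stack_supports_barrier(const struct model_bdev *dev)
{
    assert(dev != NULL && dev->nr_lower <= MAX_LOWER);
    if (!dev->ordered)
        return false;
    for (size_t i = 0; i < dev->nr_lower; i++) {
        if (!stack_supports_barrier(dev->lower[i]))
            return false;
    }
    return true;
}
```

A single lower device without barrier support is enough to make the check fail for the whole stack, regardless of what the top-level device claims.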
> The point of this is, there still remain a number of open issues with
> this patch, no doubt more than just the ones I touched on. Though it
> is clearly headed in the right direction, I'd suggest holding off
> during the stability freeze and taking the needed time to get it
> right.
You touched on 1 valid point, the md/dm issue. That goes doubly for the
2.4 version (that we don't need to care more about). And I agree with
you there, it needs to be done. And feel free to knock yourself out.
It's not a trivial issue.
--
Jens Axboe