linux-kernel.vger.kernel.org archive mirror
From: Jens Axboe <axboe@suse.de>
To: Daniel Phillips <phillips@arcor.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] ide write barrier support
Date: Mon, 20 Oct 2003 21:56:32 +0200
Message-ID: <20031020195632.GA1128@suse.de>
In-Reply-To: <200310201910.48837.phillips@arcor.de>

On Mon, Oct 20 2003, Daniel Phillips wrote:
> Hi Jens,
> 
> On Monday 13 October 2003 16:08, Jens Axboe wrote:
> > Forward ported and tested today (with the dummy ext3 patch included),
> > works for me. Some todo's left, but I thought I'd send it out to gauge
> > interest.
> 
> This is highly interesting of course, but is it suitable for
> submission during the stability freeze?  There is no correctness issue
> so long as no filesystem in mainline sets the BIO_RW_BARRIER bit,
> which appears to be the case.  Therefore this is really a performance
> patch that introduces a new internal API.

I've said this to you 3 times now, I think, and I don't think you
understand what I mean by 'correctness issue'. There IS a correctness
issue, in that drives ship with write back caching enabled. The fs
assumes that once wait_on_buffer() returns, the data is on disk. That is
false, and it can remain false for quite a number of seconds.
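
To make the integrity problem concrete, here is a minimal toy model in
Python. It is not kernel code: the Drive class, its methods and the
sector numbers are all hypothetical, and the only thing it assumes is
what is stated above, namely that the drive acknowledges a write once it
sits in its volatile cache.

```python
class Drive:
    """Toy drive with a volatile write-back cache (hypothetical model)."""

    def __init__(self):
        self.cache = {}    # acknowledged writes that are still volatile
        self.platter = {}  # data that has reached stable media

    def write(self, sector, data):
        # The drive acknowledges the write as soon as it is cached.
        self.cache[sector] = data
        return True

    def flush(self):
        # Cache flush: everything acknowledged so far becomes durable.
        self.platter.update(self.cache)
        self.cache.clear()

    def power_loss(self):
        # Whatever was only in the volatile cache is gone.
        self.cache.clear()


def commit_without_flush(drive):
    drive.write(100, "journal block")
    drive.write(101, "commit record")  # acknowledged, fs believes it is on disk
    drive.power_loss()
    return drive.platter.get(101)


def commit_with_flush(drive):
    drive.write(100, "journal block")
    drive.write(101, "commit record")
    drive.flush()                      # force the cache out before trusting it
    drive.power_loss()
    return drive.platter.get(101)
```

In this model commit_without_flush() loses the commit record even though
every write was acknowledged, while commit_with_flush() keeps it.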

So let me state again that this is NOT a performance patch. I've never
considered it a performance patch, there are no performance gains with
the patch I posted. It is purely a data integrity issue. I don't know
how I can state this any clearer than I already have...

There are possibilities for performance gains in the _future_, that's
just an added _future_ bonus.

> It seems to me there are a few unresolved issues with the barrier API.  It 

I agree.

> needs to be clearly stated that only write barriers are supported, not read 
> or read/write barriers, if that is in fact the intention.  Assuming it is, 
> then BIOs with read barriers need to be failed.

Read barriers can be supported just as easily. I still think that the
notion of read/write barriers is something you are inventing that I
don't see any practical use for, so I won't expand on that at all. From
my point of view, a read barrier is simply an io scheduler barrier. The
drive/driver never sees that bit. But it is 100% expressible with the
current logic.
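
As a rough illustration of what an "io scheduler barrier" means, the
Python sketch below sorts requests by sector but never moves a request
across a barrier. The Request type and the schedule() function are
hypothetical and are not taken from the patch; a real io scheduler does
far more than sort by sector.

```python
from dataclasses import dataclass


@dataclass
class Request:
    sector: int
    write: bool = False    # direction bit
    barrier: bool = False  # barrier bit


def schedule(requests):
    """Sort by sector, but never reorder a request across a barrier."""
    out, batch = [], []
    for rq in requests:
        if rq.barrier:
            # Everything queued before the barrier is dispatched first,
            # then the barrier itself, and a new batch starts after it.
            out.extend(sorted(batch, key=lambda r: r.sector))
            out.append(rq)
            batch = []
        else:
            batch.append(rq)
    out.extend(sorted(batch, key=lambda r: r.sector))
    return out
```

The direction bit plays no role in the ordering here, which is why a
read barrier in this sketch is handled entirely inside the scheduler.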

> The current BIO API provides no way to express a rw barrier, only read
> barriers and write barriers (the combination of direction bit and
> barrier bit indicates the barrier type).  This is minor, but how
> nice it would be if the API was either orthogonal or there was a clear
> explanation of why RW barriers never make sense.  And if they don't,
> why read barriers do make sense.  Another possible wart is that the
> API doesn't allow for a read barrier carried by a write BIO or a write
> barrier carried by a read BIO.  From a practical point of view the
> only immediate use we have for barriers is to accelerate journal
> writes and everything else comes under the heading of R&D.  It would
> help if the code clearly reflected that modest goal.

Please come up with at least pseudo-rational examples for why this would
ever be needed, I refuse to design APIs based on loose whims or ideas.
The API is "designed" for the practical use of today and what I assumed
would be useful within reason, that's as far as I think it makes sense
to go. To bend the API for a doctored example such as 'rw barrier' is
stupid imho.

> The BIO barrier scheme doesn't mesh properly with your proposed
> QUEUE_ORDERED_* scheme.  It seems to me that what you want is just
> QUEUE_ORDERED_NONE and QUEUE_ORDERED_WRITE.  Is there any case where
> the distinction between a tag based implementation versus a flush
> matters to high level code?

The difference comes from the early reiser implementation in 2.4, I'm
sure Chris can expand on that. I think it's long gone though, and it's
just an oversight on my part that the ORDERED_TAG is still there. It
will go.

> Also, the blk_queue_ordered function isn't a sufficient interface to
> enable the functionality at a high level, a filesystem also needs a
> way to know whether barriers are supported or not, short of just
> submitting a barrier request and seeing if it fails.

Why? Sometimes the only reliable way to detect whether you can support
barrier writes or not is to issue one. So I can't really help you there.
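
A filesystem can cope with that by probing: issue the barrier write
optimistically and remember the failure. The Python sketch below is only
an illustration. The Device and Journal classes are hypothetical, and
using EOPNOTSUPP as the failure code is an assumption made for the
example, not something stated in this mail.

```python
import errno


class Device:
    """Hypothetical block device that may or may not accept barriers."""

    def __init__(self, supports_barrier):
        self.supports_barrier = supports_barrier
        self.log = []

    def submit_write(self, data, barrier=False):
        if barrier and not self.supports_barrier:
            # Assumed error code for "barrier writes not supported".
            raise OSError(errno.EOPNOTSUPP, "barrier writes not supported")
        self.log.append(("barrier" if barrier else "write", data))


class Journal:
    def __init__(self, dev):
        self.dev = dev
        self.use_barriers = True  # optimistic until a barrier write fails

    def commit(self, data):
        if self.use_barriers:
            try:
                self.dev.submit_write(data, barrier=True)
                return
            except OSError as e:
                if e.errno != errno.EOPNOTSUPP:
                    raise
                # Remember the result so later commits skip the probe.
                self.use_barriers = False
        # Fallback path: plain, unordered write.
        self.dev.submit_write(data)
```

After the first failed barrier write the journal stops trying, so the
cost of the probe is paid once per device.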

> The high level interface needs to be able to handle stacked devices,
> i.e., device mapper, but not just device mapper.  Barriers have to be
> supported by all the devices in the stack, not just the top or bottom
> one.  I don't have a concrete suggestion on what the interface should
> be just now.

I completely agree. And I'm very open to patches correcting that issue,
thanks.
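
For illustration only, the requirement that every device in the stack
supports barriers can be written as a recursive check. No such interface
is proposed in this thread; the BlockDevice class and its fields are
hypothetical, and the sketch ignores the hard part, which is actually
ordering writes across the component devices.

```python
class BlockDevice:
    """Hypothetical device node in an md/dm style stack."""

    def __init__(self, name, ordered, children=()):
        self.name = name
        self.ordered = ordered          # this layer can honour barriers
        self.children = list(children)  # underlying devices, if any

    def supports_barriers(self):
        # A stacked device can only promise barriers if it and every
        # device below it can.
        return self.ordered and all(
            child.supports_barriers() for child in self.children
        )


hda = BlockDevice("hda", ordered=True)
hdc = BlockDevice("hdc", ordered=False)  # one member without barrier support
mirror = BlockDevice("md0", ordered=True, children=[hda, hdc])
volume = BlockDevice("dm-0", ordered=True, children=[mirror])
```

Here volume.supports_barriers() is false because one leaf cannot order
writes, even though every layer above it could pass a barrier down.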

> The point of this is, there still remain a number of open issues with
> this patch, no doubt more than just the ones I touched on.  Though it
> is clearly headed in the right direction, I'd suggest holding off
> during the stability freeze and taking the needed time to get it
> right.

You touched on 1 valid point, the md/dm issue. That goes doubly for the
2.4 version (that we don't need to care more about). And I agree with
you there, it needs to be done. And feel free to knock yourself out.
It's not a trivial issue.

-- 
Jens Axboe

