linux-fsdevel.vger.kernel.org archive mirror
* Questions on block drivers, REQ_FLUSH and REQ_FUA
@ 2011-05-24 21:29 Alex Bligh
  2011-05-24 22:32 ` Vivek Goyal
  0 siblings, 1 reply; 11+ messages in thread
From: Alex Bligh @ 2011-05-24 21:29 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Alex Bligh

I am writing (well, designing at the moment) a block driver where there is
a very high variance in time to write the underlying blocks. I have some
questions about the interface to block drivers which I would be really
grateful if someone could answer. I believe they are not answered by
existing documentation and I am happy to write up and contribute something
for Documentation/ in return for answers.

Q1: This may seem pretty basic, but I presume I am allowed to answer
requests submitted to my block driver in an order other than the order
in which they are submitted. IE can I do this:

        Receive        Reply
        =======        =====
        READ1
        READ2
                       REPLY READ2
                       REPLY READ1

My understanding is "yes". If the answer is "no", I am overcomplicating
things and most of the rest of this is irrelevant.

Q2: Will I ever get a sequence where I receive a read for a block for which
I have already received, but not yet responded to, a write? If so, I presume
I have to ensure I do not send back "old data", but that reordering is
still acceptable, i.e. this is OK, but different values for replies
are not:

        Receive        Reply
        =======        =====
        WRITE1 blkX=A
        READ1 blkX
        WRITE2 blkX=B
        READ2 blkX
                       REPLY READ2=B
                       REPLY READ1=A
                       REPLY WRITE1
                       REPLY WRITE2

Q3: Apparently there are no longer concepts of barriers, just REQ_FLUSH
and REQ_FUA. REQ_FLUSH guarantees all "completed" I/O requests are written
to disk prior to that BIO starting. However, what about non-completed I/O
requests? For instance, is the following legitimate:

        Receive        Send to disk         Reply
        =======        ============         =====
        WRITE1
        WRITE2
                                            WRITE2 (cached)
        FLUSH+WRITE3
                       WRITE2
                       WRITE3
                                            WRITE3
        WRITE4
                       WRITE4
                                            WRITE4
                       WRITE1
                                            WRITE1

Here WRITE1 was not 'completed', and thus by the text of
Documentation/writeback_cache_control.txt, need not be written to disk
before starting WRITE3 (which had REQ_FLUSH attached).

> The REQ_FLUSH flag can be OR ed into the r/w flags of a bio submitted from
> the filesystem and will make sure the volatile cache of the storage device
> has been flushed before the actual I/O operation is started.  This
> explicitly guarantees that previously completed write requests are on
> non-volatile storage before the flagged bio starts.

I presume this is illegal and is a documentation issue.

Q4. Can I reorder write requests forwards across flushes? IE, can I do
this:

        Receive        Send to disk         Reply
        =======        ============         =====
        WRITE1
                                            WRITE2 (cached)
        WRITE2
                                            WRITE2 (cached)
        FLUSH+WRITE3
        WRITE4
                       WRITE4
                                            WRITE4
                       WRITE2
                       WRITE3
                                            WRITE3

Again this does not appear to be illegal, as the FLUSH operation is
not defined as a barrier, meaning it should in theory be possible
to handle (and write to disk) requests received after the
FLUSH request before the FLUSH request finishes, provided that the
commands received before the FLUSH request itself complete before
the FLUSH request is replied to. I really don't know what the answer
is to this one. It makes a big difference to me as I can write multiple
blocks in parallel, and would really rather not slow up future write
requests until everything is flushed unless I need to.
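
To make the bookkeeping concrete, here is a rough sketch of the state I
imagine keeping if the answer is "yes". The names are invented, and it
assumes writes become durable in the order I complete them, which my
driver would have to arrange (or replace the sequence numbers with a
per-request set):

    /* Rough sketch, invented names.  "Completed" means I have told
     * the block layer the write is done; "durable" means it is
     * actually on the media.  Assumes writes become durable in
     * completion order. */
    #include <linux/spinlock.h>
    #include <linux/types.h>

    struct mydrv_state {
            spinlock_t lock;
            u64 last_completed_seq; /* newest write completed to kernel */
            u64 durable_seq;        /* all writes <= this are on media  */
    };

    /* On receiving a FLUSH: snapshot the completion horizon.  Writes
     * completed after this point need not be waited for. */
    static u64 mydrv_flush_target(struct mydrv_state *s)
    {
            u64 target;

            spin_lock(&s->lock);
            target = s->last_completed_seq;
            spin_unlock(&s->lock);
            return target;
    }

    /* The FLUSH can be completed as soon as durable_seq >= target;
     * writes received after the FLUSH keep flowing meanwhile. */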

Any assistance gratefully received!

-- 
Alex Bligh



* Re: Questions on block drivers, REQ_FLUSH and REQ_FUA
  2011-05-24 21:29 Questions on block drivers, REQ_FLUSH and REQ_FUA Alex Bligh
@ 2011-05-24 22:32 ` Vivek Goyal
  2011-05-24 22:37   ` Vivek Goyal
                     ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Vivek Goyal @ 2011-05-24 22:32 UTC (permalink / raw)
  To: Alex Bligh; +Cc: linux-fsdevel, Tejun Heo

On Tue, May 24, 2011 at 10:29:09PM +0100, Alex Bligh wrote:

[..]
> Q3: Apparently there are no longer concepts of barriers, just REQ_FLUSH
> and REQ_FUA. REQ_FLUSH guarantees all "completed" I/O requests are written
> to disk prior to that BIO starting. However, what about non-completed I/O
> requests? For instance, is the following legitimate:
> 
>        Receive        Send to disk         Reply
>        =======        ============         =====
>        WRITE1
>        WRITE2
>                                            WRITE2 (cached)
>        FLUSH+WRITE3
>                       WRITE2
>                       WRITE3
>                                            WRITE3
>        WRITE4
>                       WRITE4
>                                            WRITE4
>                       WRITE1
>                                            WRITE1
> 
> Here WRITE1 was not 'completed', and thus by the text of
> Documentation/writeback_cache_control.txt, need not be written to disk
> before starting WRITE3 (which had REQ_FLUSH attached).
> 
> >The REQ_FLUSH flag can be OR ed into the r/w flags of a bio submitted from
> >the filesystem and will make sure the volatile cache of the storage device
> >has been flushed before the actual I/O operation is started.  This
> >explicitly guarantees that previously completed write requests are on
> >non-volatile storage before the flagged bio starts.
> 
> I presume this is illegal and is a documentation issue.

I know very little about flush semantics but will still try to answer
two of your questions.

CCing Tejun.

Tejun, please correct me if I got this wrong.

I think the documentation is fine. It specifically talks about completed
requests, i.e. the requests which have been sent to the drive (and may be
in the controller's cache).

So in the above example, if the driver holds back WRITE1 and never
signals completion of the request, then I think it is fine to complete
the WRITE3+FLUSH ahead of WRITE1.

I think an issue will arise only if you signaled that WRITE1 had completed
but cached it in the driver (as you seem to be indicating) and never sent
it to the drive, and then received a WRITE3 + FLUSH request. In that case
you will have to make sure that by the time WRITE3 + FLUSH completion is
signaled, WRITE1 is on the disk.
 
> 
> Q4. Can I reorder write requests forwards across flushes? IE, can I do
> this:
> 
>        Receive        Send to disk         Reply
>        =======        ============         =====
>        WRITE1
>                                            WRITE2 (cached)
>        WRITE2
>                                            WRITE2 (cached)
>        FLUSH+WRITE3
>        WRITE4
>                       WRITE4
>                                            WRITE4
>                       WRITE2
>                       WRITE3
>                                            WRITE3
> 
> Again this does not appear to be illegal, as the FLUSH operation is
> not defined as a barrier, meaning it should in theory be possible
> to handle (and write to disk) requests received after the
> FLUSH request before the FLUSH request finishes, provided that the
> commands received before the FLUSH request itself complete before
> the FLUSH request is replied to. I really don't know what the answer
> is to this one. It makes a big difference to me as I can write multiple
> blocks in parallel, and would really rather not slow up future write
> requests until everything is flushed unless I need to.

IIUC, you are right. You can finish WRITE4 before completing FLUSH+WRITE3
here.

We just need to make sure that any request completed by the driver
is on disk by the time FLUSH+WRITE3 completes.

Thanks
Vivek


* Re: Questions on block drivers, REQ_FLUSH and REQ_FUA
  2011-05-24 22:32 ` Vivek Goyal
@ 2011-05-24 22:37   ` Vivek Goyal
  2011-05-25  8:06   ` Alex Bligh
  2011-05-25  8:59   ` Tejun Heo
  2 siblings, 0 replies; 11+ messages in thread
From: Vivek Goyal @ 2011-05-24 22:37 UTC (permalink / raw)
  To: Alex Bligh; +Cc: linux-fsdevel, Tejun Heo

On Tue, May 24, 2011 at 06:32:20PM -0400, Vivek Goyal wrote:
> On Tue, May 24, 2011 at 10:29:09PM +0100, Alex Bligh wrote:
> 
> [..]
> > Q3: Apparently there are no longer concepts of barriers, just REQ_FLUSH
> > and REQ_FUA. REQ_FLUSH guarantees all "completed" I/O requests are written
> > to disk prior to that BIO starting. However, what about non-completed I/O
> > requests? For instance, is the following legitimate:
> > 
> >        Receive        Send to disk         Reply
> >        =======        ============         =====
> >        WRITE1
> >        WRITE2
> >                                            WRITE2 (cached)
> >        FLUSH+WRITE3
> >                       WRITE2
> >                       WRITE3
> >                                            WRITE3
> >        WRITE4
> >                       WRITE4
> >                                            WRITE4
> >                       WRITE1
> >                                            WRITE1
> > 
> > Here WRITE1 was not 'completed', and thus by the text of
> > Documentation/writeback_cache_control.txt, need not be written to disk
> > before starting WRITE3 (which had REQ_FLUSH attached).
> > 
> > >The REQ_FLUSH flag can be OR ed into the r/w flags of a bio submitted from
> > >the filesystem and will make sure the volatile cache of the storage device
> > >has been flushed before the actual I/O operation is started.  This
> > >explicitly guarantees that previously completed write requests are on
> > >non-volatile storage before the flagged bio starts.

Are you writing a bio-based driver? For a request-based driver, the
request queue should break the FLUSH + WRITE3 request down into two parts:
issue the FLUSH first and, when that completes, issue WRITE3.
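
Roughly, for a request based driver that decomposition means the driver
only ever sees an empty FLUSH request. A sketch (mydrv_* names are
invented, data path and error handling omitted, API as of this writing):

    #include <linux/blkdev.h>

    struct mydrv_dev {
            struct request_queue *queue;
            spinlock_t lock;
    };

    static void mydrv_queue_cache_flush(struct request *rq); /* invented */
    static void mydrv_queue_rw(struct request *rq);          /* invented */

    static void mydrv_request_fn(struct request_queue *q)
    {
            struct request *rq;

            while ((rq = blk_fetch_request(q)) != NULL) {
                    if (rq->cmd_flags & REQ_FLUSH)
                            /* Complete only once every write we have
                             * already completed is on the media. */
                            mydrv_queue_cache_flush(rq);
                    else
                            mydrv_queue_rw(rq); /* normal read/write */
            }
    }

    static int mydrv_init_queue(struct mydrv_dev *dev)
    {
            dev->queue = blk_init_queue(mydrv_request_fn, &dev->lock);
            if (!dev->queue)
                    return -ENOMEM;
            /* Declare a volatile cache so we receive REQ_FLUSH. */
            blk_queue_flush(dev->queue, REQ_FLUSH);
            return 0;
    }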

Thanks
Vivek


* Re: Questions on block drivers, REQ_FLUSH and REQ_FUA
  2011-05-24 22:32 ` Vivek Goyal
  2011-05-24 22:37   ` Vivek Goyal
@ 2011-05-25  8:06   ` Alex Bligh
  2011-05-25  8:59   ` Tejun Heo
  2 siblings, 0 replies; 11+ messages in thread
From: Alex Bligh @ 2011-05-25  8:06 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: linux-fsdevel, Tejun Heo, Alex Bligh



--On 24 May 2011 18:32:20 -0400 Vivek Goyal <vgoyal@redhat.com> wrote:

> On Tue, May 24, 2011 at 10:29:09PM +0100, Alex Bligh wrote:
>
> [..]
>> Q3: Apparently there are no longer concepts of barriers, just REQ_FLUSH
>> and REQ_FUA. REQ_FLUSH guarantees all "completed" I/O requests are
>> written to disk prior to that BIO starting. However, what about
>> non-completed I/O requests? For instance, is the following legitimate:

[see diagram duplicated below, snipped to save space]

>> Here WRITE1 was not 'completed', and thus by the text of
>> Documentation/writeback_cache_control.txt, need not be written to disk
>> before starting WRITE3 (which had REQ_FLUSH attached).
...
>> I presume this is illegal and is a documentation issue.
>
> I know very little about flush semantics but still try to answer two
> of your questions.
>
> I think the documentation is fine. It specifically talks about completed
> requests, i.e. the requests which have been sent to the drive (and may be
> in the controller's cache).
>
> So in the above example, if the driver holds back WRITE1 and never
> signals completion of the request, then I think it is fine to complete
> the WRITE3+FLUSH ahead of WRITE1.
>
> I think an issue will arise only if you signaled that WRITE1 had completed
> but cached it in the driver (as you seem to be indicating) and never sent
> it to the drive, and then received a WRITE3 + FLUSH request. In that case
> you will have to make sure that by the time WRITE3 + FLUSH completion is
> signaled, WRITE1 is on the disk.

That conforms to the documentation, but the reason why I think that
reading is unlikely to be intended is that, from the kernel's point of
view, there is no difference in effect between what I suggested:

       Receive        Send to disk         Reply
       =======        ============         =====
       WRITE1
       WRITE2
                                           WRITE2 (cached)
       FLUSH+WRITE3
                      WRITE2
                      WRITE3
                                           WRITE3
       WRITE4
                      WRITE4
                                           WRITE4
                      WRITE1
                                           WRITE1

and what the kernel is trying to avoid:

       Receive        Send to disk         Reply
       =======        ============         =====
       WRITE1 (processed write1, send to writeback cache, do not reply)
       WRITE2
                                           WRITE2 (cached)
       FLUSH+WRITE3
                      WRITE2
                      WRITE3
                                           WRITE3
       WRITE4
                      WRITE4
                                           WRITE4
                      WRITE1
                                           WRITE1

IE I can't see how a strict reading of the specification gains the
kernel anything.

> IIUC, you are right. You can finish WRITE4 before completing FLUSH+WRITE3
> here.
>
> We just need to make sure that any request completed by the driver
> is on disk by the time FLUSH+WRITE3 completes.

OK, that's less surprising as the kernel still gains something.

> Are you writing a bio-based driver? For a request-based driver, the
> request queue should break the FLUSH + WRITE3 request down into two parts:
> issue the FLUSH first and, when that completes, issue WRITE3.

Currently it's request-based (in fact the kernel bit of it is based on nbd
at the moment), though I could rewrite it to make it bio-based.

The characteristics I have are: large variance in time to complete a given
operation, desirability of ordering of requests by block number (i.e. the
elevator is useful to me), large operations disproportionately cheaper
than small ones, and huge benefits from parallelising requests (i.e. I
can write many many many blocks in parallel).

If a request-based driver is a bad structure, I could relatively easily
rewrite it (it mostly lives in userland at the moment, and the kernel bit
is quite small). We'd get a bio-based nbd out of it too for free (I have
no idea whether that would be an advantage, though I note loop has gone
make_request_function based).

-- 
Alex Bligh


* Re: Questions on block drivers, REQ_FLUSH and REQ_FUA
  2011-05-24 22:32 ` Vivek Goyal
  2011-05-24 22:37   ` Vivek Goyal
  2011-05-25  8:06   ` Alex Bligh
@ 2011-05-25  8:59   ` Tejun Heo
  2011-05-25 15:54     ` Alex Bligh
  2 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2011-05-25  8:59 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Alex Bligh, linux-fsdevel

Hey, Vivek.

On Tue, May 24, 2011 at 06:32:20PM -0400, Vivek Goyal wrote:
> I think the documentation is fine. It specifically talks about completed
> requests, i.e. the requests which have been sent to the drive (and may be
> in the controller's cache).
> 
> So in the above example, if the driver holds back WRITE1 and never
> signals completion of the request, then I think it is fine to complete
> the WRITE3+FLUSH ahead of WRITE1.

Yeap, that's correct.  Ordering between flush and other writes is now
completely the responsibility of filesystems.  The block layer just
doesn't care.

> I think an issue will arise only if you signaled that WRITE1 had completed
> but cached it in the driver (as you seem to be indicating) and never sent
> it to the drive, and then received a WRITE3 + FLUSH request. In that case
> you will have to make sure that by the time WRITE3 + FLUSH completion is
> signaled, WRITE1 is on the disk.

A FLUSH command means "flush out all data from writes up to this
point".  If a driver has indicated completion of a write and then
received a FLUSH, the data from the write should be written to disk.

> > Again this does not appear to be illegal, as the FLUSH operation is
> > not defined as a barrier, meaning it should in theory be possible
> > to handle (and write to disk) requests received after the
> > FLUSH request before the FLUSH request finishes, provided that the
> > commands received before the FLUSH request itself complete before
> > the FLUSH request is replied to. I really don't know what the answer
> > is to this one. It makes a big difference to me as I can write multiple
> > blocks in parallel, and would really rather not slow up future write
> > requests until everything is flushed unless I need to.
> 
> IIUC, you are right. You can finish WRITE4 before completing FLUSH+WRITE3
> here.
> 
> We just need to make sure that any request completed by the driver
> is on disk by the time FLUSH+WRITE3 completes.

Yeap, again, the block layer just doesn't care, and the only thing a
block driver should pay attention to regarding FLUSH is implementing the
FLUSH command properly.

Thanks.

-- 
tejun


* Re: Questions on block drivers, REQ_FLUSH and REQ_FUA
  2011-05-25  8:59   ` Tejun Heo
@ 2011-05-25 15:54     ` Alex Bligh
  2011-05-25 16:43       ` Tejun Heo
  0 siblings, 1 reply; 11+ messages in thread
From: Alex Bligh @ 2011-05-25 15:54 UTC (permalink / raw)
  To: Tejun Heo, Vivek Goyal; +Cc: linux-fsdevel, Alex Bligh

Tejun,

--On 25 May 2011 10:59:50 +0200 Tejun Heo <tj@kernel.org> wrote:

> Yeap, that's correct.  Ordering between flush and other writes is now
> completely the responsibility of filesystems.  The block layer just
> doesn't care.
...
> A FLUSH command means "flush out all data from writes up to this
> point".  If a driver has indicated completion of a write and then
> received a FLUSH, the data from the write should be written to disk.

So to be clear

a) If I do not complete a write command, I may avoid writing it to disk
   indefinitely (despite completing subsequently received FLUSH
   commands). The only writes that I am obliged to flush to disk
   are those that I've actually told the block layer are complete.

b) If I receive a flush command, and prior to completing that flush
   command, I receive subsequent write commands, I may execute
   (and, if I like, write to disk) write commands received AFTER that
   flush command. I presume if the subsequent write commands write to
   blocks that I am meant to be flushing, I can just forget about
   the blocks I am meant to be flushing (because they would be
   overwritten) provided *something* overwrites what was there before.


If my understanding is correct, then for future readers of the archive
(perhaps I should put this list in Documentation/ ?) the semantics are
something like:

1. Block drivers may handle requests received in any order, and may
   issue completions in any order, subject only to the rules below.

2. If a read covering a given block X is received after one or more writes
   for that block, then irrespective of the order in which the read
   and write(s) are handled/completed, the read shall return the
   value written by the immediately preceding write to that block.

   Therefore whilst the following is legal...

        Driver sends                        Driver replies

        WRITE BLOCK 1 = X
                                            WRITE BLOCK 1 COMPLETED 

        .... time passes ...
        READ BLOCK 1
        WRITE BLOCK 1 = Y
                                            WRITE BLOCK 1 COMPLETED
                                            READ BLOCK 1 COMPLETED

   ...the read from block 1 should return X and not Y, even if it was
   handled by the driver after the write.

3. If a flush request is received, then before completing it (and,
   in the case of a make_request_function driver, before initiating
   any attached write), the driver MUST have written to non-volatile
   storage any writes which were COMPLETED prior to the reception
   of the flush. This does not affect any writes received, but
   not completed, prior to the flush, nor does it prevent a block driver
   from completing subsequently issued writes before completion of the
   flush. IE the flush does not act as a barrier, it merely ensures that
   on completion of the flush, non-volatile storage contains either the
   blocks written to prior to the flush or blocks written to by commands
   issued subsequent to the flush but completed prior to it.

4. Requests marked FUA should be written to non-volatile storage prior
   to completion, but impose no restrictions on ordering.
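
In case it helps, rules 3 and 4 would come out roughly as follows in a
make_request_function driver. This is only a sketch of my understanding:
the mydrv_* helpers are imaginary, and the flags and signature are as of
the current kernel:

    #include <linux/blkdev.h>

    static void mydrv_wait_completed_writes_durable(void); /* imaginary */
    static void mydrv_write_through(struct bio *bio);      /* imaginary */
    static void mydrv_queue_io(struct bio *bio);           /* imaginary */

    static int mydrv_make_request(struct request_queue *q, struct bio *bio)
    {
            if (bio->bi_rw & REQ_FLUSH)
                    /* Rule 3: everything COMPLETED before this point
                     * must reach non-volatile storage first. */
                    mydrv_wait_completed_writes_durable();

            if (bio_data_dir(bio) == WRITE && (bio->bi_rw & REQ_FUA))
                    /* Rule 4: the write itself must be durable before
                     * its completion is signalled. */
                    mydrv_write_through(bio);
            else
                    mydrv_queue_io(bio);

            return 0;
    }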

-- 
Alex Bligh


* Re: Questions on block drivers, REQ_FLUSH and REQ_FUA
  2011-05-25 15:54     ` Alex Bligh
@ 2011-05-25 16:43       ` Tejun Heo
  2011-05-25 17:43         ` Alex Bligh
  2011-05-25 19:15         ` Vivek Goyal
  0 siblings, 2 replies; 11+ messages in thread
From: Tejun Heo @ 2011-05-25 16:43 UTC (permalink / raw)
  To: Alex Bligh; +Cc: Vivek Goyal, linux-fsdevel

Hello, Alex.

On Wed, May 25, 2011 at 5:54 PM, Alex Bligh <alex@alex.org.uk> wrote:
> a) If I do not complete a write command, I may avoid writing it to disk
>  indefinitely (despite completing subsequently received FLUSH
>  commands). The only writes that I am obliged to flush to disk
>  are those that I've actually told the block layer are complete.

Yes, the driver doesn't have any ordering responsibility w.r.t. FLUSH for
writes which it hasn't declared finished yet.

> b) If I receive a flush command, and prior to completing that flush
>  command, I receive subsequent write commands, I may execute
>  (and, if I like, write to disk) write commands received AFTER that
>  flush command. I presume if the subsequent write commands write to
>  blocks that I am meant to be flushing, I can just forget about
>  the blocks I am meant to be flushing (because they would be
>  overwritten) provided *something* overwrites what was there before.

The first half is correct.  The latter half may be correct if there's
no intervening write but _please_ don't do that.  If there's something
to be optimized there, it should be done in upper layers.  It's
playing with fire.

> If my understanding is correct, then for future readers of the archive
> (perhaps I should put this list in Documentation/ ?) the semantics are
> something like:
>
> 1. Block drivers may handle requests received in any order, and may
>  issue completions in any order, subject only to the rules below.
>
> 2. If a read covering a given block X is received after one or more writes
>  for that block, then irrespective of the order in which the read
>  and write(s) are handled/completed, the read shall return the
>  value written by the immediately preceding write to that block.
>
>  Therefore whilst the following is legal...
>
>       Driver sends                        Driver replies
>
>       WRITE BLOCK 1 = X
>                                           WRITE BLOCK 1 COMPLETED
>       .... time passes ...
>       READ BLOCK 1
>       WRITE BLOCK 1 = Y
>                                           WRITE BLOCK 1 COMPLETED
>                                           READ BLOCK 1 COMPLETED
>
>  ...the read from block 1 should return X and not Y, even if it was
>  handled by the driver after the write.

This is usually synchronized in the upper layer, and AFAIK filesystems
don't issue overlapping reads and writes simultaneously (right?), so
in the above case I don't think READ BLOCK 1 returning Y would be
illegal.  There are no ordering constraints between them anyway, and the
block layer would happily reorder the second write in front of the
read.

> 3. If a flush request is received, then before completing it (and,
>  in the case of a make_request_function driver, before initiating
>  any attached write), the driver MUST have written to non-volatile
>  storage any writes which were COMPLETED prior to the reception
>  of the flush. This does not affect any writes received, but
>  not completed, prior to the flush, nor does it prevent a block driver
>  from completing subsequently issued writes before completion of the
>  flush. IE the flush does not act as a barrier, it merely ensures that
>  on completion of the flush, non-volatile storage contains either the
>  blocks written to prior to the flush or blocks written to by commands
>  issued subsequent to the flush but completed prior to it.
>
> 4. Requests marked FUA should be written to non-volatile storage prior
>  to completion, but impose no restrictions on ordering.

Hmm... For bio drivers, REQ_FLUSH and REQ_FUA are best explained
together.  The following are legal combinations.

* No write data, REQ_FLUSH - doesn't have any ordering constraint
other than the inherent FLUSH requirement (previously completed WRITEs
should be on the media on FLUSH completion).

* Write data, REQ_FLUSH - FLUSH must be completed before write data is
issued.  ie. write data must not be written to the media before all
previous writes are on the media.

* Write data, REQ_FUA - Write should be completed before FLUSH is
issued - ie. the write data should be on platter along with previously
completed writes on bio completion.

* Write data, REQ_FLUSH | REQ_FUA - Write data must not be written to
the media before all previous writes are on the media && the write
data must be on the media on bio completion.  This is usually
sequenced as FLUSH write FLUSH.

Request-based drivers only see REQ_FLUSH w/o write data, and the only
rule they have to follow is that all writes they completed prior to
receiving the FLUSH must be on the media on completion of the FLUSH;
being smart about it might not be a good idea.
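
For illustration only, if the only primitive available is a full cache
flush, the combinations above can be sequenced like the following sketch
(invented helper names, synchronous for clarity; a real driver would
pipeline this):

    /* Sketch: sequencing the combinations above when the only
     * primitive is a full cache flush. */
    static void mydrv_flush_cache(void);       /* invented primitive */
    static void mydrv_write(struct bio *bio);  /* invented primitive */

    static void mydrv_handle_bio(struct bio *bio)
    {
            if (bio->bi_rw & REQ_FLUSH)
                    mydrv_flush_cache();            /* preflush */

            if (bio_has_data(bio)) {
                    mydrv_write(bio);
                    if (bio->bi_rw & REQ_FUA)
                            mydrv_flush_cache();    /* postflush: FUA */
            }

            bio_endio(bio, 0);
    }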

Thanks.

-- 
tejun


* Re: Questions on block drivers, REQ_FLUSH and REQ_FUA
  2011-05-25 16:43       ` Tejun Heo
@ 2011-05-25 17:43         ` Alex Bligh
  2011-05-25 19:10           ` Vivek Goyal
  2011-05-25 19:15         ` Vivek Goyal
  1 sibling, 1 reply; 11+ messages in thread
From: Alex Bligh @ 2011-05-25 17:43 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Vivek Goyal, linux-fsdevel, Alex Bligh

Tejun,

--On 25 May 2011 18:43:14 +0200 Tejun Heo <tj@kernel.org> wrote:

>> b) If I receive a flush command, and prior to completing that flush
>>  command, I receive subsequent write commands, I may execute
>>  (and, if I like, write to disk) write commands received AFTER that
>>  flush command. I presume if the subsequent write commands write to
>>  blocks that I am meant to be flushing, I can just forget about
>>  the blocks I am meant to be flushing (because they would be
>>  overwritten) provided *something* overwrites what was there before.
>
> The first half is correct.  The latter half may be correct if there's
> no intervening write but _please_ don't do that.  If there's something
> to be optimized there, it should be done in upper layers.  It's
> playing with fire.

Thanks. My specific interest in playing with this particular fire is
that my driver can only write very large collections of blocks together
to the underlying media, and the write, whilst fast in bandwidth terms,
is high latency. So, if I receive a flush request, I either have
to block handling all writes to any block in the same collection as
anything that has been written to since the last flush (which effectively
means blocking all writes to the disk), or keep processing writes
whilst I deal with the flush. On a small disk with frequent flushes,
the flush might take several tens of milliseconds. On a large disk
with infrequent flushes I might be looking at several seconds, and I
am worried about upper layers timing out.

> Hmm... For bio drivers, REQ_FLUSH and REQ_FUA are best explained
> together.  The following are legal combinations.
>
> * No write data, REQ_FLUSH - doesn't have any ordering constraint
> other than the inherent FLUSH requirement (previously completed WRITEs
> should be on the media on FLUSH completion).
>
> * Write data, REQ_FLUSH - FLUSH must be completed before write data is
> issued.  ie. write data must not be written to the media before all
> previous writes are on the media.
>
> * Write data, REQ_FUA - Write should be completed before FLUSH is
> issued - ie. the write data should be on platter along with previously
> completed writes on bio completion.
>
> * Write data, REQ_FLUSH | REQ_FUA - Write data must not be written to
> the media before all previous writes are on the media && the write
> data must be on the media on bio completion.  This is usually
> sequenced as FLUSH write FLUSH.
>
> Request-based drivers only see REQ_FLUSH w/o write data, and the only
> rule they have to follow is that all writes they completed prior to
> receiving the FLUSH must be on the media on completion of the FLUSH;
> being smart about it might not be a good idea.

Thanks once again. My driver is currently request-based. Is there any
significant advantage in converting it to be bio-based?

-- 
Alex Bligh


* Re: Questions on block drivers, REQ_FLUSH and REQ_FUA
  2011-05-25 17:43         ` Alex Bligh
@ 2011-05-25 19:10           ` Vivek Goyal
  2011-05-25 19:58             ` Alex Bligh
  0 siblings, 1 reply; 11+ messages in thread
From: Vivek Goyal @ 2011-05-25 19:10 UTC (permalink / raw)
  To: Alex Bligh; +Cc: Tejun Heo, linux-fsdevel

On Wed, May 25, 2011 at 06:43:01PM +0100, Alex Bligh wrote:

[..]
> Thanks once again. My driver is currently request-based. Is there any
> significant advantage in converting it to be bio-based?

Some people try to avoid the overhead of the elevator if the device is
very fast and go for a bio-based driver. Or they implement both modes and
let the user choose one.
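
In outline the two hookups look like this (a fragment only, current API,
mydrv_* names invented):

    struct request_queue *q;

    /* Request based: requests pass through the elevator first. */
    q = blk_init_queue(mydrv_request_fn, &mydrv_lock);

    /* bio based: bios go straight to the driver, no elevator. */
    q = blk_alloc_queue(GFP_KERNEL);
    if (q)
            blk_queue_make_request(q, mydrv_make_request);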

Thanks
Vivek



* Re: Questions on block drivers, REQ_FLUSH and REQ_FUA
  2011-05-25 16:43       ` Tejun Heo
  2011-05-25 17:43         ` Alex Bligh
@ 2011-05-25 19:15         ` Vivek Goyal
  1 sibling, 0 replies; 11+ messages in thread
From: Vivek Goyal @ 2011-05-25 19:15 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Alex Bligh, linux-fsdevel

On Wed, May 25, 2011 at 06:43:14PM +0200, Tejun Heo wrote:

[..]
> 
> * Write data, REQ_FUA - Write should be completed before FLUSH is
> issued - ie. the write data should be on platter along with previously
> completed writes on bio completion.

Tejun,

I am confused by the following statement.

"along with previously completed writes on bio completion"

I thought REQ_FUA just guarantees that the bio/req carrying the REQ_FUA
flag is on the disk platter. It does not guarantee anything about other
requests which completed before this bio/req and might still be in the
volatile cache of the device.

Thanks
Vivek


* Re: Questions on block drivers, REQ_FLUSH and REQ_FUA
  2011-05-25 19:10           ` Vivek Goyal
@ 2011-05-25 19:58             ` Alex Bligh
  0 siblings, 0 replies; 11+ messages in thread
From: Alex Bligh @ 2011-05-25 19:58 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Tejun Heo, linux-fsdevel, Alex Bligh

Vivek,

--On 25 May 2011 15:10:24 -0400 Vivek Goyal <vgoyal@redhat.com> wrote:

>> Thanks once again. My driver is currently request-based. Is there any
>> significant advantage in converting it to be bio-based?
>
> Some people try to avoid the overhead of the elevator if the device is
> very fast and go for a bio-based driver. Or they implement both modes and
> let the user choose one.

Ah, so bio means no elevator. I'd missed that through looking only
at the request stuff. Right now the elevator is a reasonable approximation
of what I need, but it's good to know I could do that too if I am
feeling masochistic. Thanks.

-- 
Alex Bligh

