linux-fsdevel.vger.kernel.org archive mirror
* Questions on block drivers, REQ_FLUSH and REQ_FUA
@ 2011-05-24 21:29 Alex Bligh
From: Alex Bligh @ 2011-05-24 21:29 UTC
  To: linux-fsdevel; +Cc: Alex Bligh

I am writing (well, designing at the moment) a block driver where there is
a very high variance in time to write the underlying blocks. I have some
questions about the interface to block drivers which I would be really
grateful if someone could answer. I believe they are not answered by
existing documentation and I am happy to write up and contribute something
for Documentation/ in return for answers.

Q1: This may seem pretty basic, but I presume I am allowed to answer
requests submitted to my block driver in an order other than the order
in which they are submitted. That is, can I do this:

        Receive        Reply
        =======        =====
        READ1
        READ2
                       REPLY READ2
                       REPLY READ1

My understanding is "yes". If the answer is "no", I am overcomplicating
things and most of the rest of this is irrelevant.
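
The reordering asked about in Q1 can be pictured with a toy user-space model. This is only a sketch of the pattern in the table above, not kernel code, and it does not settle whether the block layer permits it. `ToyDevice`, `submit` and `drain` are hypothetical names invented for the illustration; the assumption is that each request carries a tag and that completions are matched by tag rather than by submission order.

```python
import heapq


class ToyDevice:
    """Toy model (hypothetical): requests complete in order of service time."""

    def __init__(self):
        # Min-heap of (service_time, tag); the shortest request finishes first.
        self._inflight = []

    def submit(self, tag, service_time):
        heapq.heappush(self._inflight, (service_time, tag))

    def drain(self):
        """Return tags in the order their completions would be reported."""
        order = []
        while self._inflight:
            _, tag = heapq.heappop(self._inflight)
            order.append(tag)
        return order


dev = ToyDevice()
dev.submit("READ1", service_time=5.0)  # slow backing block
dev.submit("READ2", service_time=1.0)  # fast backing block
completion_order = dev.drain()
```

Here READ2 is reported before READ1 even though it was submitted second, which is exactly the sequence in the table.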

Q2: Will I ever get a sequence where I receive a read for a block for
which I have already received, but not yet responded to, a write? If so,
I presume I have to ensure I do not send back "old data", but that
reordering the replies is still acceptable, i.e. this is OK, but
different values for the replies are not:

        Receive        Reply
        =======        =====
        WRITE1 blkX=A
        READ1 blkX
        WRITE2 blkX=B
        READ2 blkX
                       REPLY READ2=B
                       REPLY READ1=A
                       REPLY WRITE1
                       REPLY WRITE2
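
One way a driver could keep the read values in Q2 consistent while still reordering replies is to resolve each read against the newest write submitted before it. The following is a minimal sketch under that assumption; `PendingOverlay` and its methods are hypothetical names, and the sequence-number scheme is an illustration rather than anything the block layer prescribes.

```python
class PendingOverlay:
    """Toy model (hypothetical): reads see the newest write submitted so far."""

    def __init__(self):
        self.disk = {}      # block -> data that has reached the backing store
        self.disk_seq = {}  # block -> sequence number of the data in self.disk
        self.pending = {}   # block -> [(seq, data)] submitted but not yet written
        self.seq = 0

    def submit_write(self, blk, data):
        self.seq += 1
        self.pending.setdefault(blk, []).append((self.seq, data))
        return self.seq

    def read(self, blk):
        # Start from what is on disk, then prefer any newer pending write.
        newest_seq = self.disk_seq.get(blk, 0)
        data = self.disk.get(blk)
        for seq, pending_data in self.pending.get(blk, []):
            if seq > newest_seq:
                newest_seq, data = seq, pending_data
        return data

    def complete_write(self, blk, seq):
        writes = self.pending[blk]
        data = next(d for s, d in writes if s == seq)
        self.pending[blk] = [(s, d) for s, d in writes if s != seq]
        # An older write finishing late must not clobber newer data.
        if seq > self.disk_seq.get(blk, 0):
            self.disk[blk] = data
            self.disk_seq[blk] = seq


dev = PendingOverlay()
w1 = dev.submit_write("X", "A")
read1 = dev.read("X")        # value fixed at submission time
w2 = dev.submit_write("X", "B")
read2 = dev.read("X")
dev.complete_write("X", w2)  # writes finish out of order
dev.complete_write("X", w1)
final = dev.disk["X"]
```

Because each read's value is fixed when the read is received, the replies can be delivered in any order (READ2=B before READ1=A, as in the table) without returning old data.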

Q3: Apparently there are no longer concepts of barriers, just REQ_FLUSH
and REQ_FUA. REQ_FLUSH guarantees all "completed" I/O requests are written
to disk prior to that BIO starting. However, what about non-completed I/O
requests? For instance, is the following legitimate:

        Receive        Send to disk         Reply
        =======        ============         =====
        WRITE1
        WRITE2
                                            WRITE2 (cached)
        FLUSH+WRITE3
                       WRITE2
                       WRITE3
                                            WRITE3
        WRITE4
                       WRITE4
                                            WRITE4
                       WRITE1
                                            WRITE1

Here WRITE1 was not 'completed', and thus by the text of
Documentation/writeback_cache_control.txt, need not be written to disk
before starting WRITE3 (which had REQ_FLUSH attached).

> The REQ_FLUSH flag can be OR ed into the r/w flags of a bio submitted from
> the filesystem and will make sure the volatile cache of the storage device
> has been flushed before the actual I/O operation is started.  This
> explicitly guarantees that previously completed write requests are on
> non-volatile storage before the flagged bio starts.

I presume the sequence above is in fact illegal and that the wording is
a documentation issue.
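
The two possible readings of the quoted text can be made concrete with a small trace checker. This is a sketch only: `flush_ok` and the event tuples are hypothetical, and the code does not say which reading the kernel actually enforces. With `covers="completed"` the flush covers only writes already replied to when the flush was received (the literal reading); with `covers="received"` it covers every write received before the flush.

```python
def flush_ok(trace, flush_req, covers="completed"):
    """Check that every covered write reached disk before flush_req started."""
    pos = {event: i for i, event in enumerate(trace)}
    flush_received = pos[("recv", flush_req)]
    flush_started = pos[("disk", flush_req)]
    marker = "reply" if covers == "completed" else "recv"
    covered = [
        name
        for (kind, name), i in pos.items()
        if kind == marker and i < flush_received
    ]
    # A covered write that never reaches disk counts as a violation.
    return all(
        pos.get(("disk", name), len(trace)) < flush_started for name in covered
    )


# The Q3 table, row by row; WRITE3 is the request carrying REQ_FLUSH.
q3_trace = [
    ("recv", "WRITE1"),
    ("recv", "WRITE2"),
    ("reply", "WRITE2"),
    ("recv", "WRITE3"),
    ("disk", "WRITE2"),
    ("disk", "WRITE3"),
    ("reply", "WRITE3"),
    ("recv", "WRITE4"),
    ("disk", "WRITE4"),
    ("reply", "WRITE4"),
    ("disk", "WRITE1"),
    ("reply", "WRITE1"),
]

literal = flush_ok(q3_trace, "WRITE3", covers="completed")
strict = flush_ok(q3_trace, "WRITE3", covers="received")
```

The trace passes under the literal reading and fails under the stricter one, which is the gap the question is about.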

Q4: Can I reorder write requests forwards across flushes? I.e., can I do
this:

        Receive        Send to disk         Reply
        =======        ============         =====
        WRITE1
                                            WRITE2 (cached)
        WRITE2
                                            WRITE2 (cached)
        FLUSH+WRITE3
        WRITE4
                       WRITE4
                                            WRITE4
                       WRITE2
                       WRITE3
                                            WRITE3

Again, this does not appear to be illegal, as the FLUSH operation is
not defined as a barrier. In theory it should therefore be possible
to handle (and write to disk) requests received after the FLUSH
request before the FLUSH request finishes, provided that the commands
received before the FLUSH request complete before the FLUSH request
is replied to. I really don't know what the answer is to this one.
It makes a big difference to me, as I can write multiple blocks in
parallel and would really rather not hold up later write requests
until everything is flushed unless I need to.
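
If the answer to Q4 turned out to be "yes", the bookkeeping could look roughly like the sketch below: on receiving a flush, snapshot the writes received so far that are not yet on disk, let later writes proceed, and release the flush reply once the snapshot is empty. `FlushTracker` and its methods are hypothetical, the flush request's own data is ignored for brevity, and the usage assumes WRITE1 eventually reaches disk, a step the table above does not show.

```python
class FlushTracker:
    """Toy model (hypothetical): a flush waits only for earlier writes."""

    def __init__(self):
        self.not_durable = set()  # writes received but not yet on disk
        self.waiting = {}         # flush name -> writes it still waits for

    def recv_write(self, name):
        self.not_durable.add(name)

    def recv_flush(self, name):
        """Return True if the flush can be replied to immediately."""
        deps = set(self.not_durable)
        if not deps:
            return True
        self.waiting[name] = deps
        return False

    def on_disk(self, name):
        """Record a durable write; return flushes that are now releasable."""
        self.not_durable.discard(name)
        for deps in self.waiting.values():
            deps.discard(name)
        ready = sorted(f for f, deps in self.waiting.items() if not deps)
        for f in ready:
            del self.waiting[f]
        return ready


t = FlushTracker()
t.recv_write("WRITE1")
t.recv_write("WRITE2")
immediate = t.recv_flush("FLUSH+WRITE3")
t.recv_write("WRITE4")             # received after the flush
after_w4 = t.on_disk("WRITE4")     # later write overtakes the flush
after_w2 = t.on_disk("WRITE2")
after_w1 = t.on_disk("WRITE1")     # last earlier write lands; flush released
```

In this model WRITE4 is never held back by the flush; only the flush reply waits, and only for WRITE1 and WRITE2.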

Any assistance gratefully received!

-- 
Alex Bligh


