From: Alex Bligh <alex@alex.org.uk>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: linux-fsdevel@vger.kernel.org, Tejun Heo <tj@kernel.org>,
Alex Bligh <alex@alex.org.uk>
Subject: Re: Questions on block drivers, REQ_FLUSH and REQ_FUA
Date: Wed, 25 May 2011 09:06:20 +0100
Message-ID: <891FAD455BB4C52A3ADABD2D@nimrod.local>
In-Reply-To: <20110524223220.GA379@redhat.com>
--On 24 May 2011 18:32:20 -0400 Vivek Goyal <vgoyal@redhat.com> wrote:
> On Tue, May 24, 2011 at 10:29:09PM +0100, Alex Bligh wrote:
>
> [..]
>> Q3: Apparently there are no longer concepts of barriers, just REQ_FLUSH
>> and REQ_FUA. REQ_FLUSH guarantees all "completed" I/O requests are
>> written to disk prior to that BIO starting. However, what about
>> non-completed I/O requests? For instance, is the following legitimate:
[see diagram duplicated below, snipped to save space]
>> Here WRITE1 was not 'completed', and thus by the text of
>> Documentation/writeback_cache_control.txt, need not be written to disk
>> before starting WRITE3 (which had REQ_FLUSH attached).
...
>> I presume this is illegal and is a documentation issue.
>
> I know very little about flush semantics but still try to answer two
> of your questions.
>
> I think documentation is fine. It specifically talks about completed
> requests. The requests which have been sent to drive (and may be in
> controller's cache).
>
> So in above example, if driver holds back WRITE1 and never signals
> the completion of request, then I think it is fine to complete
> the WRITE3+FLUSH ahead of WRITE1.
>
> I think issue will arise only if you signaled that WRITE1 has completed
> and cached it in driver (as you seem to be indicating) and never sent to the
> drive and then you received WRITE3 + FLUSH requests. In that case you
> shall have to make sure that by the time WRITE3 + FLUSH completion is
> signaled, WRITE1 is on the disk.
That conforms to the documentation, but the reason why I think it
is unlikely is that from the kernel's point of view, there is
no difference in effect between what I suggested:

Receive         Send to disk    Reply
=======         ============    =====
WRITE1
WRITE2
                                WRITE2 (cached)
FLUSH+WRITE3
                WRITE2
                WRITE3
                                WRITE3
WRITE4
                WRITE4
                                WRITE4
                WRITE1
                                WRITE1

and what the kernel is trying to avoid:

Receive         Send to disk    Reply
=======         ============    =====
WRITE1          (processed write1, send to writeback cache, do not reply)
WRITE2
                                WRITE2 (cached)
FLUSH+WRITE3
                WRITE2
                WRITE3
                                WRITE3
WRITE4
                WRITE4
                                WRITE4
                WRITE1
                                WRITE1

I.e., I can't see how a strict reading of the specification gains the
kernel anything.
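
The comparison between the two diagrams can be sketched as a small trace
model. This is only an illustration, not kernel code: the event names, the
"cache" column, and the helpers kernel_view and flush_violations are
hypothetical, and the BROKEN trace is an invented counter-example. The rule
checked is the one from the quoted reply: any write whose completion was
signalled before the FLUSH completes must be on disk by then.

```python
from typing import List, Tuple

# (column, request): column is "recv", "cache", "disk" or "reply".
Event = Tuple[str, str]


def kernel_view(trace: List[Event]) -> List[Event]:
    """Keep only what the kernel can observe: requests sent and replies received."""
    return [e for e in trace if e[0] in ("recv", "reply")]


def flush_violations(trace: List[Event], flush_reply: str) -> List[str]:
    """Writes replied to before the flush reply that were not yet on disk."""
    on_disk = set()
    replied = []
    for column, request in trace:
        if column == "disk":
            on_disk.add(request)
        elif column == "reply":
            if request == flush_reply:
                return [r for r in replied if r not in on_disk]
            replied.append(request)
    return []  # the flush was never completed in this trace


# First diagram: WRITE1 is held back and completed last.
SUGGESTED = [
    ("recv", "WRITE1"),
    ("recv", "WRITE2"),
    ("reply", "WRITE2"),        # completed from cache
    ("recv", "FLUSH+WRITE3"),
    ("disk", "WRITE2"),
    ("disk", "WRITE3"),
    ("reply", "WRITE3"),        # completion of FLUSH+WRITE3
    ("recv", "WRITE4"),
    ("disk", "WRITE4"),
    ("reply", "WRITE4"),
    ("disk", "WRITE1"),
    ("reply", "WRITE1"),
]

# Second diagram: same, but WRITE1 goes into the writeback cache on receipt.
AVOIDED = SUGGESTED[:1] + [("cache", "WRITE1")] + SUGGESTED[1:]

# Invented counter-example: WRITE1 is completed early but written after the flush.
BROKEN = [
    ("recv", "WRITE1"),
    ("reply", "WRITE1"),
    ("recv", "FLUSH+WRITE3"),
    ("disk", "WRITE3"),
    ("reply", "WRITE3"),
    ("disk", "WRITE1"),
]

if __name__ == "__main__":
    print(kernel_view(SUGGESTED) == kernel_view(AVOIDED))
    print(flush_violations(SUGGESTED, "WRITE3"))
    print(flush_violations(BROKEN, "WRITE3"))
```

Under this model both diagrams look identical from the kernel's side and
neither breaks the rule, while the invented BROKEN trace does.
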
> IIUC, you are right. You can finish WRITE4 before completing FLUSH+WRITE3
> here.
>
> We just need to make sure that any request completed by the driver
> is on disk by the time FLUSH+WRITE3 completes.
OK, that's less surprising as the kernel still gains something.
> Are you writing a bio based driver? For a request based driver request
> queue should break down FLUSH + WRITE3 request in two parts. Issue FLUSH
> first and when that completes, issue WRITE3.
Currently it's request-based (in fact the kernel bit of it is based on nbd
at the moment), though I could rewrite to make it bio based.
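
The split described in the quoted text for request-based drivers (issue the
FLUSH first and, once it completes, issue the write) can be sketched as
below. This is a toy model under stated assumptions, not the block layer's
implementation: the Request class, its fields, and split_flush are
hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    name: str
    flush: bool = False     # models REQ_FLUSH being set on the request
    has_data: bool = True   # False models an empty flush with no payload


def split_flush(req: Request) -> List[str]:
    """Return the ordered steps the driver would see for one request.

    Each step is assumed to be issued only after the previous one completes.
    """
    steps = []
    if req.flush:
        steps.append("FLUSH")                # flush the cache first
    if req.has_data:
        steps.append("WRITE " + req.name)    # then issue the data write
    return steps


if __name__ == "__main__":
    print(split_flush(Request("WRITE3", flush=True)))
    print(split_flush(Request("WRITE4")))
```

In this sketch the driver never sees a combined FLUSH+WRITE3; it sees a
flush step followed by an ordinary write.
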
The characteristics I have are: large variance in time to complete a given
operation, desirability of ordering of requests by block number (i.e. the
elevator is useful to me), large operations very disproportionately cheaper
than small ones, and parallelisation of requests giving huge benefits (i.e. I
can write many many many blocks in parallel).
If a request-based driver is a bad structure, I could relatively easily
rewrite it (it mostly lives in userland at the moment, and the kernel bit is
quite small). We'd get a bio-based nbd out of it too for free (I have no
idea whether that would be an advantage, though I note loop has gone
make_request_function based).
--
Alex Bligh