From: Tejun Heo <tj@kernel.org>
To: Alex Bligh <alex@alex.org.uk>
Cc: Vivek Goyal <vgoyal@redhat.com>, linux-fsdevel@vger.kernel.org
Subject: Re: Questions on block drivers, REQ_FLUSH and REQ_FUA
Date: Wed, 25 May 2011 18:43:14 +0200
Message-ID: <BANLkTimmhx7TB70f6g5Xoh_i51-t7Q50fQ@mail.gmail.com>
In-Reply-To: <C6F644CACD0532C1D67DF2C1@Ximines.local>
Hello, Alex.
On Wed, May 25, 2011 at 5:54 PM, Alex Bligh <alex@alex.org.uk> wrote:
> a) If I do not complete a write command, I may avoid writing it to disk
> indefinitely (despite completing subsequently received FLUSH
> commands). The only flushes to disk that I am obliged to flush
> are those that I've actually told the block layer that I have done.
Yes, the driver doesn't have any ordering responsibility w.r.t. FLUSH
for writes which it hasn't declared finished yet.
> b) If I receive a flush command, and prior to completing that flush
> command, I receive subsequent write commands, I may execute
> (and, if I like, write, to disk) write commands received AFTER that
> flush command. I presume if the subsequent write commands write to
> blocks that I am meant to be flushing, I can just forget about
> the blocks I am meant to be flushing (because they would be
> overwritten) provided *something* overwrote what was there before.
The first half is correct. The latter half may be correct if there's
no intervening write but _please_ don't do that. If there's something
to be optimized there, it should be done in upper layers. It's
playing with fire.
> If my understanding is correct, then for future readers of the archive
> (perhaps I should put this list in Documentation/ ?) the semantics are
> something like:
>
> 1. Block drivers may handle requests received in any order, and may
> issue completions in any order, subject only to the rules below.
>
> 2. If a read covering a given block X is received after one or more writes
> for that block, then irrespective of the order in which the read
> and write(s) are handled/completed, the read shall return the
> value written by the immediately preceding write to that block.
>
> Therefore whilst the following is legal...
>
>   Driver sends              Driver replies
>
>   WRITE BLOCK 1 = X
>                             WRITE BLOCK 1 COMPLETED
>   .... time passes ...
>   READ BLOCK 1
>   WRITE BLOCK 1 = Y
>                             WRITE BLOCK 1 COMPLETED
>                             READ BLOCK 1 COMPLETED
>
> ...the read from block 1 should return X and not Y, even if it was
> handled by the driver after the write.
This is usually synchronized in the upper layer, and AFAIK filesystems
don't issue overlapping reads and writes simultaneously (right?). In
the above case I don't think READ BLOCK 1 returning Y would be
illegal. There are no ordering constraints between them anyway, and
the block layer would happily reorder the second write in front of the
read.
> 3. If a flush request is received, then before completing it (and,
> in the case of a make_request_function driver, before initiating
> any attached write), the driver MUST have written to non-volatile
> storage any writes which were COMPLETED prior to the reception
> of the flush. This does not affect any writes received, but
> not completed, prior to the flush, nor does it prevent a block driver
> from completing subsequently issued writes before completion of the
> flush. IE the flush does not act as a barrier, it merely ensures that
> on completion of the flush non-volatile storage contains either the
> blocks written to prior to the flush or blocks written to in commands
> issued subsequent to the flush, but completed prior to it.
>
> 4. Requests marked FUA should be written to non-volatile storage prior
> to completion, but impose no restrictions on ordering.
Hmm... For bio drivers, REQ_FLUSH and REQ_FUA are best explained
together. The following are legal combinations.
* No write data, REQ_FLUSH - doesn't have any ordering constraint
other than the inherent FLUSH requirement (previously completed WRITEs
should be on the media on FLUSH completion).
* Write data, REQ_FLUSH - FLUSH must be completed before write data is
issued, i.e. write data must not be written to the media before all
previous writes are on the media.
* Write data, REQ_FUA - Write should be completed before FLUSH is
issued, i.e. the write data should be on the platter along with
previously completed writes on bio completion.
* Write data, REQ_FLUSH | REQ_FUA - Write data must not be written to
the media before all previous writes are on the media && the write
data must be on the media on bio completion. This is usually
sequenced as FLUSH write FLUSH.
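To make the four combinations above concrete, here is a minimal
userspace sketch in C. It is not kernel code: the SK_* flag bits, the
step names and sk_sequence() are all made up for illustration, and it
assumes a device without native FUA, so FUA is emulated by a FLUSH
issued after the write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch; NOT the kernel's REQ_* values. */
enum { SK_FLUSH = 1u << 0, SK_FUA = 1u << 1 };

/* The individual operations a flagged bio can be broken into. */
enum sk_step { SK_STEP_PREFLUSH, SK_STEP_WRITE, SK_STEP_POSTFLUSH };

/*
 * Fill steps[] with the order of operations for one bio and return the
 * number of steps, or -1 for a combination that is not on the list
 * (FUA without write data).
 */
int sk_sequence(unsigned int flags, bool has_data, enum sk_step steps[3])
{
    int n = 0;

    assert(steps != NULL);

    if ((flags & SK_FUA) && !has_data)
        return -1;                      /* FUA only makes sense with data */

    if (flags & SK_FLUSH)
        steps[n++] = SK_STEP_PREFLUSH;  /* earlier completed writes first */
    if (has_data)
        steps[n++] = SK_STEP_WRITE;     /* then the bio's own data */
    if (flags & SK_FUA)
        steps[n++] = SK_STEP_POSTFLUSH; /* assumed FUA emulation */

    return n;
}
```

With SK_FLUSH | SK_FUA and write data this yields preflush, write,
postflush, i.e. the FLUSH write FLUSH sequence mentioned above.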
Request-based drivers only see REQ_FLUSH without write data, and the
only rule they have to follow is that all writes they completed prior
to receiving FLUSH must be on the media on completion of FLUSH. Being
smart about it might not be a good idea.
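The rule for request-based drivers can be modelled with a toy device
that has a volatile write cache. This is only a userspace sketch under
my own assumptions: struct sk_dev, sk_complete_write() and sk_flush()
are hypothetical names, and a real driver would talk to hardware
rather than copy between arrays. Writes that have not been completed
simply never reach the cache here, so FLUSH owes them nothing.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_NBLOCKS 8

/* Hypothetical toy device: a volatile cache in front of the media. */
struct sk_dev {
    int  cache[SK_NBLOCKS];  /* volatile write cache */
    bool dirty[SK_NBLOCKS];  /* completed, but not yet on the media */
    int  media[SK_NBLOCKS];  /* non-volatile storage */
};

/* Complete a write: the data is only in the volatile cache afterwards. */
void sk_complete_write(struct sk_dev *d, int blk, int val)
{
    assert(blk >= 0 && blk < SK_NBLOCKS);
    d->cache[blk] = val;
    d->dirty[blk] = true;
}

/*
 * Handle FLUSH the boring way: push every completed write to the media,
 * with no attempt to skip blocks. Returns the number of blocks written;
 * the flush may be completed once this returns.
 */
int sk_flush(struct sk_dev *d)
{
    int n = 0;

    for (int i = 0; i < SK_NBLOCKS; i++) {
        if (d->dirty[i]) {
            d->media[i] = d->cache[i];
            d->dirty[i] = false;
            n++;
        }
    }
    return n;
}
```

Flushing everything that is dirty is deliberately unclever, in line
with the advice above not to be smart about it.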
Thanks.
--
tejun