From: Eric Blake <eblake@redhat.com>
To: Alex Bligh <alex@alex.org.uk>
Cc: "nbd-general@lists.sourceforge.net"
<nbd-general@lists.sourceforge.net>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
Date: Thu, 31 Mar 2016 13:54:17 -0600
Message-ID: <56FD8069.7020101@redhat.com>
In-Reply-To: <64B326DA-CDF4-4537-B38A-46E7B57C319C@alex.org.uk>

On 03/31/2016 01:41 PM, Alex Bligh wrote:
>
> On 31 Mar 2016, at 20:33, Eric Blake <eblake@redhat.com> wrote:
>
>> Qemu's nbd-client is setting NBD_CMD_FLAG_FUA during a flush command,
>> but the official NBD protocol documentation doesn't describe this as
>> valid (it merely states that flush must not have a reply until all
>> acknowledged writes have hit permanent storage). Does this flag make
>> sense (in which case, what semantics would it add? We would then need
>> to fix the NBD docs as well as relax the reference implementation to
>> allow the flag), or is it a bug in qemu (in which case the recent
>> tightening of NBD to reject unsupported flags with EINVAL will trip
>> up qemu)?
>
> As the original author of that particular mess, I can say the intent
> was that they should reflect exactly the Linux kernel's semantics for
> FLUSH and FUA, not only in terms of whether they can be used together,
> but also in exactly what they mean.

Oh, and I also just found that qemu's nbd-server tries to honor FUA on
read, even though the protocol doesn't document that as valid either.

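Both cases concern a single bit in the request header. A minimal sketch of how that header is laid out, written in Python for brevity: the field order and the constants (NBD_REQUEST_MAGIC 0x25609513, NBD_CMD_READ 0, NBD_CMD_WRITE 1, NBD_CMD_FLUSH 3, NBD_CMD_FLAG_FUA as bit 0 of the command flags) follow the NBD protocol document as I understand it, while encode_request() and decode_request() are hypothetical helpers, not qemu's or the reference server's API. Some implementations treat flags and type as one 32-bit field with the flags in its upper half; the bytes on the wire are the same.

```python
import struct

# Constants from the NBD protocol document.
NBD_REQUEST_MAGIC = 0x25609513
NBD_CMD_READ = 0
NBD_CMD_WRITE = 1
NBD_CMD_FLUSH = 3
NBD_CMD_FLAG_FUA = 1 << 0

# magic (32), command flags (16), type (16), handle (64), offset (64),
# length (32); all big-endian, 28 bytes in total.
_REQUEST = struct.Struct(">IHHQQI")


def encode_request(cmd_type, flags=0, handle=0, offset=0, length=0):
    """Pack one request header (hypothetical helper)."""
    return _REQUEST.pack(NBD_REQUEST_MAGIC, flags, cmd_type,
                         handle, offset, length)


def decode_request(data):
    """Unpack one request header, checking the magic (hypothetical helper)."""
    magic, flags, cmd_type, handle, offset, length = _REQUEST.unpack(data)
    if magic != NBD_REQUEST_MAGIC:
        raise ValueError("bad request magic")
    return {"flags": flags, "type": cmd_type, "handle": handle,
            "offset": offset, "length": length}


# A flush with FUA, as qemu's client sends it: no payload, FUA bit set.
flush_fua = encode_request(NBD_CMD_FLUSH, flags=NBD_CMD_FLAG_FUA, handle=1)
print(flush_fua.hex())
```

A server deciding whether to accept the request only has to look at the flags field together with the type field; nothing else in the header distinguishes flush+FUA from a plain flush.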
>
> This turned out to be an easier way of describing the operations
> than describing them semantically. This was particularly true of
> FLUSH, where I couldn't get an entirely consistent answer as to what
> it required of in-flight requests: specifically, whether all requests
> in flight when the flush was issued had to be written to disk before
> answering, or all requests in flight up to the time of replying. I
> believe the former.
>
> FUA just requires that particular request to be persisted to
> disk, and does not require other requests to be persisted to disk

As written, NBD says that FUA requires the current write operation to
land on disk (but says nothing about any other writes, whether or not
those writes have already received a reply). And for flush, NBD only
requires that all writes whose replies have already been _sent_ to the
client must land on disk, which can certainly be a smaller set of write
requests than _all_ writes issued prior to that point in time. So maybe
flush+FUA is a valid thing to support, meaning that ALL in-flight writes
must land, whether or not a reply has been sent to the client, for an
even stronger barrier?

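The difference between the two readings of flush can be modelled as a set computation. This is a toy model only: ModelServer and its methods are hypothetical, and it assumes a server may acknowledge a write before that write is durable. Plain flush covers the acknowledged writes (the minimum the spec text asks for); the flush+FUA variant floated above covers every received write.

```python
from dataclasses import dataclass, field


@dataclass
class Write:
    handle: int
    replied: bool = False   # reply already sent to the client
    durable: bool = False   # known to be on permanent storage


@dataclass
class ModelServer:
    """Toy model of which writes a flush must cover (hypothetical)."""
    writes: dict = field(default_factory=dict)

    def receive_write(self, handle):
        self.writes[handle] = Write(handle)

    def reply_write(self, handle):
        # An early reply: acknowledged, but not necessarily durable.
        self.writes[handle].replied = True

    def must_persist_before_flush_reply(self, fua=False):
        """Handles that must be made durable before replying to FLUSH.

        fua=False: acknowledged writes only (spec text as written).
        fua=True:  every received write (the proposed stronger barrier).
        """
        return sorted(h for h, w in self.writes.items()
                      if not w.durable and (fua or w.replied))

    def flush(self, fua=False):
        covered = self.must_persist_before_flush_reply(fua)
        for h in covered:
            self.writes[h].durable = True
        return covered


s = ModelServer()
for h in (1, 2, 3):
    s.receive_write(h)
s.reply_write(1)            # only write 1 has been acknowledged so far
print(s.must_persist_before_flush_reply())          # [1]
print(s.must_persist_before_flush_reply(fua=True))  # [1, 2, 3]
```

Under the plain reading, writes 2 and 3 may still be volatile when the flush reply goes out; under the stronger reading they may not.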
> So in answer to your question, my understanding is that FLUSH requires
> (some subset of) otherwise potentially non-persisted requests to
> be persisted to disk. In that sense it implies FUA. It is permitted
> to set FUA (as it is permitted, I believe, in the Linux block layer)
> but it will make no difference.
>
> I once thought FUA on read should bypass any local read cache, though
> that is not part of the spec currently.

In qemu, read+FUA just triggers blk_co_flush() prior to reading, but
that's the same function it calls for write+FUA. And for flush (whether
or not FUA was specified), qemu still calls blk_co_flush(). So from
qemu's perspective, FUA is synonymous with "finish ALL pending
transactions", which is stronger than what the NBD protocol requires.
(There is nothing wrong with an implementation doing more work than
required, although it may be less efficient.) Alas, that means I can't
use qemu's behavior as a good reference for how to improve the NBD spec.
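A toy model of the qemu behavior just described, assuming a backend whose only durability primitive is a whole-device flush. CachedDisk and handle_request() are hypothetical stand-ins, not qemu code, and placing the flush after the write for write+FUA is my assumption about ordering.

```python
NBD_CMD_READ, NBD_CMD_WRITE, NBD_CMD_FLUSH = 0, 1, 3
NBD_CMD_FLAG_FUA = 1 << 0


class CachedDisk:
    """Toy block backend with a volatile write cache (hypothetical)."""

    def __init__(self):
        self.cache = {}   # offset -> data, not yet durable
        self.disk = {}    # offset -> data, durable
        self.flushes = 0

    def write(self, offset, data):
        self.cache[offset] = data

    def read(self, offset):
        return self.cache.get(offset, self.disk.get(offset))

    def flush(self):
        # Stand-in for a whole-device flush such as blk_co_flush():
        # everything cached becomes durable, not just one request.
        self.disk.update(self.cache)
        self.cache.clear()
        self.flushes += 1


def handle_request(dev, cmd, flags=0, offset=0, data=None):
    """Dispatch modelled on the qemu behavior described above."""
    fua = bool(flags & NBD_CMD_FLAG_FUA)
    if cmd == NBD_CMD_READ:
        if fua:
            dev.flush()          # flush before reading
        return dev.read(offset)
    if cmd == NBD_CMD_WRITE:
        dev.write(offset, data)
        if fua:
            dev.flush()          # the same flush, after the write
        return None
    if cmd == NBD_CMD_FLUSH:
        dev.flush()              # FUA or not, the effect is identical
        return None
    raise ValueError("unsupported command")


dev = CachedDisk()
handle_request(dev, NBD_CMD_WRITE, offset=0, data=b"a")
handle_request(dev, NBD_CMD_WRITE, flags=NBD_CMD_FLAG_FUA, offset=512, data=b"b")
print(sorted(dev.disk))   # [0, 512]
```

Because the flush is device-wide, the FUA write at offset 512 also makes the unrelated, unflagged write at offset 0 durable: correct, but more work than FUA on a single write strictly needs.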

Meanwhile, it sounds like FUA is valid on read, write, AND flush
(because the kernel supports all three), even if we aren't quite sure
what to document about those flags. That means qemu is correct, and
the NBD protocol has a bug. Since you contributed the FUA flag, is that
something you can try to improve?

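If the protocol is relaxed along these lines, the server-side check only changes in which commands accept the flag. A sketch of such a check: check_flags() and the two allowed-sets are hypothetical, and the strict set encodes only what this thread treats as documented (FUA on write), not a verified reading of every spec revision.

```python
import errno

NBD_CMD_READ, NBD_CMD_WRITE, NBD_CMD_FLUSH = 0, 1, 3
NBD_CMD_FLAG_FUA = 1 << 0
KNOWN_FLAGS = NBD_CMD_FLAG_FUA

# Commands that may carry FUA. STRICT: the spec text as discussed here
# (write only). RELAXED: the proposal to also accept read and flush.
FUA_ALLOWED_STRICT = frozenset({NBD_CMD_WRITE})
FUA_ALLOWED_RELAXED = frozenset({NBD_CMD_READ, NBD_CMD_WRITE, NBD_CMD_FLUSH})


def check_flags(cmd, flags, fua_allowed):
    """Return 0 if the flags are acceptable, else errno.EINVAL (hypothetical)."""
    if flags & ~KNOWN_FLAGS:
        return errno.EINVAL      # unknown flag bits are always rejected
    if flags & NBD_CMD_FLAG_FUA and cmd not in fua_allowed:
        return errno.EINVAL      # FUA on a command that doesn't allow it
    return 0


# qemu's flush+FUA: rejected under the strict reading, accepted if relaxed.
print(check_flags(NBD_CMD_FLUSH, NBD_CMD_FLAG_FUA, FUA_ALLOWED_STRICT) == errno.EINVAL)  # True
print(check_flags(NBD_CMD_FLUSH, NBD_CMD_FLAG_FUA, FUA_ALLOWED_RELAXED))  # 0
```

Keeping the unknown-bits test separate from the per-command FUA test means relaxing the protocol touches only the allowed-set, not the rejection of genuinely undefined flags.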
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org