From: Wouter Verhelst <w@uter.be>
To: Eric Blake <eblake@redhat.com>
Cc: nbd-general@lists.sourceforge.net, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [Nbd] [PATCH 3/1] doc: Propose Structured Replies extension
Date: Tue, 29 Mar 2016 20:51:57 +0200 [thread overview]
Message-ID: <20160329185157.GC12469@grep.be> (raw)
In-Reply-To: <56FAC823.8070206@redhat.com>
On Tue, Mar 29, 2016 at 12:23:31PM -0600, Eric Blake wrote:
> On 03/29/2016 11:53 AM, Wouter Verhelst wrote:
> > Hi Eric,
> >
> > Having read this in more detail now:
> >
> > On Mon, Mar 28, 2016 at 09:56:36PM -0600, Eric Blake wrote:
> >> + The server MUST ensure that each read chunk lies within the original
> >> + offset and length of the original client request, MUST NOT send read
> >> + chunks that would cover the same offset more than once, and MUST
> >> + send at least one byte of data in addition to the offset field of
> >> + each read chunk. The server MAY send read chunks out of order, and
> >> + may interleave other responses between read replies. The server
> >> + MUST NOT set the error field of a read chunk; if an error occurs, it
> >> + MAY immediately end the sequence of structured response messages,
> >> + MUST send the error in the concluding normal response, and SHOULD
> >> + keep the connection open. The final non-structured response MUST
> >> + set an error unless the sum of data sent by all read chunks totals
> >> + the original client length request.
> >
> > I'm thinking it would probably be a good idea to have the concluding
> > response (if the error field is nonzero) have an offset too; the server
> > could use that to specify where, exactly, the error occurred (so that a
> > client which sent a very large read request doesn't have to go through a
> > binary search or some such to figure out where the read error happened)
> >
> > i.e.,
> >
> > C: read X bytes at offset Y
> > S: (X bytes)
> > S: (error, offset Z)
>
> Here, I'm assuming that you mean X > Z.
Yes, obviously.
> Unfortunately, I chose the design of 0 or more structured replies
> followed by a normal reply, so that the normal reply is a reliable
> indicator that the read is complete (whether successful or not); and the
> whole goal of the extension is to avoid sending any data payload on a
> normal reply. I'm not sure how to send the offset in the normal reply
> without violating the premise that a normal reply has no payload.
Oh. I thought you meant for the concluding message to also be a
structured reply with the length field be zero, but you mean for it to
be a non-structured reply message? If so, you should clarify that a bit
more (this wasn't clear to me)...
[...]
> But what we could do is allow for the server to send a structured reply
> data chunk of zero bytes, with the offset in question, as the offset
> where an error occurred, prior to then sending the normal reply with the
> final error indicator. I guess that also means that if we don't have
> the DF command flag set, the server could then report multiple failed
> reads interspersed among larger successful clusters, when trying to
> recover as much of the failing disk as possible, if each failure is
> reported via a separate structured read of zero bytes. Hmm, that also
> means that we have to be careful on the wording - if we allow a
> structured reply with 0 data bytes to report an error, after already
> sending a larger reply with partially valid bytes, then that means that
> a client will receive more than one read chunk visiting the same offset,
> so we'd have to make the wording permit that.
>
> > client now has Z-1 bytes of valid data (with the rest being garbage,
> > plus a read error)
> >
> > The alternative (in the above) would be that the client has 0 bytes of
> > valid data, and would have to issue another read request to figure out
> > which parts of the data are valid.
>
> So if I'm understanding you, you are trying to state that the server may
> report the header for X bytes, then fail partway through those X bytes;
> it must still send X bytes, but can then report how many are valid (that
> is, a client must assume that 0 of the X bytes received are valid
> _unless_ the server also reported where it failed).
Yes.
> But I was envisioning the opposite: the server must NOT send X bytes
> unless it knows they are valid; if it encounters a read error at Z,
> then it sends a structured read of Z-1 bytes before the final normal
> message that reports overall failure. The client then assumes that
> all X bytes received are valid.
The problem with that approach is that it makes it impossible for a
server to use a sendfile()-like system call, where you don't know that
there's a read error until start sending out data to the client (which
implies that you must've already sent out the header).
> But I also documented that the client MAY, but not MUST, abort the read
> at the first error; so the idea of being able to report multiple errors
> and/or send headers prior to learning whether there are read errors
> means that your interpretation is probably safer than mine.
I didn't mean to imply that. I do think that aborting the read at the
first error is probably a good idea. If the error occurs because the
disk is dying, having the server go ahead and try to read more data
anyway is probably not a very good idea (unless instructed to do so by
the user, i.e., client).
> I guess it will help to have actual v2 wording in front of us to further
> fine-tune the wording.
Certainly :-)
> >> + The client SHOULD immediately close the connection if it detects
> >> + that the server has sent an offset more than once (whether or not
> >> + the overlapping data claimed to have the same contents), or if
> >> + receives the concluding normal reply without an error set but
> >> + without all bytes covered by read chunk(s). A future extension may
> >
> > I would reword this to...
> >
> > The client MAY immediately close the connection if it detects that
> > [...]. The server MUST NOT send an offset more than once.
> >
> >> + add a command flag that would allow the server to skip read chunks
> >> + for portions of the file that read as all zeroes.
> >
> > Not sure if that part is necessary or helpful, really.
>
> I envision such an extension in parallel to (or as part of) the proposed
> NBD_CMD_GET_LBA_STATUS (or whatever we name it) - it is slightly more
> efficient to skip reads of holes with a single read command flag than it
> is to first read status to determine where holes are and only then issue
> reads for the non-hole regions.
Sure.
> But I can also buy your argument that such language belongs in the
> extension for sparse reads, and doesn't need to be present in the
> extension for structured reads.
Right, that was my point, mainly.
--
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
people in the world who think they really understand all of its rules,
and pretty much all of them are just lying to themselves too.
-- #debian-devel, OFTC, 2016-02-12
next prev parent reply other threads:[~2016-03-29 18:52 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-03-28 13:59 [Qemu-devel] [PATCH] doc: Mention proper use of handle Eric Blake
2016-03-29 3:56 ` [Qemu-devel] [PATCH 2/1] doc: More details on flag negotiation Eric Blake
2016-03-29 3:56 ` [Qemu-devel] [PATCH 3/1] doc: Propose Structured Replies extension Eric Blake
2016-03-29 7:33 ` [Qemu-devel] [Nbd] " Wouter Verhelst
2016-03-29 8:24 ` Alex Bligh
2016-03-29 14:21 ` Eric Blake
2016-03-29 14:37 ` Alex Bligh
2016-03-29 15:12 ` Eric Blake
2016-03-29 16:37 ` Wouter Verhelst
2016-03-29 17:34 ` Alex Bligh
2016-03-29 17:45 ` Eric Blake
2016-03-29 18:03 ` Wouter Verhelst
2016-03-29 18:07 ` Eric Blake
2016-03-29 18:19 ` Wouter Verhelst
2016-03-29 18:25 ` Eric Blake
2016-03-29 18:09 ` Alex Bligh
2016-03-29 17:53 ` Wouter Verhelst
2016-03-29 18:23 ` Eric Blake
2016-03-29 18:51 ` Wouter Verhelst [this message]
2016-03-29 19:06 ` Wouter Verhelst
2016-03-29 19:39 ` Alex Bligh
2016-03-29 20:00 ` Eric Blake
2016-03-29 20:18 ` Alex Bligh
2016-03-29 20:44 ` Alex Bligh
2016-03-29 21:05 ` Wouter Verhelst
2016-03-29 22:05 ` Alex Bligh
2016-03-29 22:45 ` Wouter Verhelst
2016-03-29 22:53 ` Alex Bligh
2016-03-29 7:11 ` [Qemu-devel] [PATCH] doc: Mention proper use of handle Wouter Verhelst
2016-03-29 13:59 ` Eric Blake
2016-03-29 23:00 ` [Qemu-devel] [PATCH v2 0/3] NBD Structured Read Eric Blake
2016-03-29 23:00 ` [Qemu-devel] [PATCH v2 1/3] NBD proto: add "Command flags" section Eric Blake
2016-03-29 23:00 ` [Qemu-devel] [PATCH v2 2/3] doc: Mention proper use of handle Eric Blake
2016-03-29 23:01 ` [Qemu-devel] [PATCH v2 3/3] doc: Propose Structured Read extension Eric Blake
2016-03-29 23:29 ` Eric Blake
2016-03-30 6:50 ` Alex Bligh
2016-03-30 17:45 ` Eric Blake
2016-03-30 19:51 ` [Qemu-devel] [Nbd] " Wouter Verhelst
2016-03-30 20:54 ` Eric Blake
2016-03-30 21:26 ` Wouter Verhelst
2016-03-30 22:48 ` [Qemu-devel] " Alex Bligh
2016-03-30 20:44 ` [Qemu-devel] [Nbd] " Wouter Verhelst
2016-03-30 8:09 ` [Qemu-devel] [Nbd] [PATCH v2 0/3] NBD Structured Read Wouter Verhelst
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160329185157.GC12469@grep.be \
--to=w@uter.be \
--cc=eblake@redhat.com \
--cc=nbd-general@lists.sourceforge.net \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).