Re: [Qemu-devel] [Nbd] [PATCH 3/1] doc: Propose Structured Replies extension

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Wouter Verhelst <w@uter.be>
To: Eric Blake <eblake@redhat.com>
Cc: nbd-general@lists.sourceforge.net, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [Nbd] [PATCH 3/1] doc: Propose Structured Replies extension
Date: Tue, 29 Mar 2016 20:51:57 +0200	[thread overview]
Message-ID: <20160329185157.GC12469@grep.be> (raw)
In-Reply-To: <56FAC823.8070206@redhat.com>

On Tue, Mar 29, 2016 at 12:23:31PM -0600, Eric Blake wrote:
> On 03/29/2016 11:53 AM, Wouter Verhelst wrote:
> > Hi Eric,
> > 
> > Having read this in more detail now:
> > 
> > On Mon, Mar 28, 2016 at 09:56:36PM -0600, Eric Blake wrote:
> >> +  The server MUST ensure that each read chunk lies within the original
> >> +  offset and length of the original client request, MUST NOT send read
> >> +  chunks that would cover the same offset more than once, and MUST
> >> +  send at least one byte of data in addition to the offset field of
> >> +  each read chunk.  The server MAY send read chunks out of order, and
> >> +  may interleave other responses between read replies.  The server
> >> +  MUST NOT set the error field of a read chunk; if an error occurs, it
> >> +  MAY immediately end the sequence of structured response messages,
> >> +  MUST send the error in the concluding normal response, and SHOULD
> >> +  keep the connection open.  The final non-structured response MUST
> >> +  set an error unless the sum of data sent by all read chunks totals
> >> +  the original client length request.
> > 
> > I'm thinking it would probably be a good idea to have the concluding
> > response (if the error field is nonzero) have an offset too; the server
> > could use that to specify where, exactly, the error occurred (so that a
> > client which sent a very large read request doesn't have to go through a
> > binary search or some such to figure out where the read error happened)
> > 
> > i.e.,
> > 
> > C: read X bytes at offset Y
> > S: (X bytes)
> > S: (error, offset Z)
> 
> Here, I'm assuming that you mean X > Z.

Yes, obviously.

> Unfortunately, I chose the design of 0 or more structured replies
> followed by a normal reply, so that the normal reply is a reliable
> indicator that the read is complete (whether successful or not); and the
> whole goal of the extension is to avoid sending any data payload on a
> normal reply.  I'm not sure how to send the offset in the normal reply
> without violating the premise that a normal reply has no payload.

Oh. I thought you meant for the concluding message to also be a
structured reply with the length field be zero, but you mean for it to
be a non-structured reply message? If so, you should clarify that a bit
more (this wasn't clear to me)...

[...]
> But what we could do is allow for the server to send a structured reply
> data chunk of zero bytes, with the offset in question, as the offset
> where an error occurred, prior to then sending the normal reply with the
> final error indicator.  I guess that also means that if we don't have
> the DF command flag set, the server could then report multiple failed
> reads interspersed among larger successful clusters, when trying to
> recover as much of the failing disk as possible, if each failure is
> reported via a separate structured read of zero bytes.  Hmm, that also
> means that we have to be careful on the wording - if we allow a
> structured reply with 0 data bytes to report an error, after already
> sending a larger reply with partially valid bytes, then that means that
> a client will receive more than one read chunk visiting the same offset,
> so we'd have to make the wording permit that.
> 
> > client now has Z-1 bytes of valid data (with the rest being garbage,
> > plus a read error)
> > 
> > The alternative (in the above) would be that the client has 0 bytes of
> > valid data, and would have to issue another read request to figure out
> > which parts of the data are valid.
> 
> So if I'm understanding you, you are trying to state that the server may
> report the header for X bytes, then fail partway through those X bytes;
> it must still send X bytes, but can then report how many are valid (that
> is, a client must assume that 0 of the X bytes received are valid
> _unless_ the server also reported where it failed).

Yes.

> But I was envisioning the opposite: the server must NOT send X bytes
> unless it knows they are valid; if it encounters a read error at Z,
> then it sends a structured read of Z-1 bytes before the final normal
> message that reports overall failure.  The client then assumes that
> all X bytes received are valid.

The problem with that approach is that it makes it impossible for a
server to use a sendfile()-like system call, where you don't know that
there's a read error until start sending out data to the client (which
implies that you must've already sent out the header).

> But I also documented that the client MAY, but not MUST, abort the read
> at the first error; so the idea of being able to report multiple errors
> and/or send headers prior to learning whether there are read errors
> means that your interpretation is probably safer than mine.

I didn't mean to imply that. I do think that aborting the read at the
first error is probably a good idea. If the error occurs because the
disk is dying, having the server go ahead and try to read more data
anyway is probably not a very good idea (unless instructed to do so by
the user, i.e., client).

> I guess it will help to have actual v2 wording in front of us to further
> fine-tune the wording.

Certainly :-)

> >> +  The client SHOULD immediately close the connection if it detects
> >> +  that the server has sent an offset more than once (whether or not
> >> +  the overlapping data claimed to have the same contents), or if
> >> +  receives the concluding normal reply without an error set but
> >> +  without all bytes covered by read chunk(s). A future extension may
> > 
> > I would reword this to...
> > 
> > The client MAY immediately close the connection if it detects that
> > [...]. The server MUST NOT send an offset more than once.
> > 
> >> +  add a command flag that would allow the server to skip read chunks
> >> +  for portions of the file that read as all zeroes.
> > 
> > Not sure if that part is necessary or helpful, really.
> 
> I envision such an extension in parallel to (or as part of) the proposed
> NBD_CMD_GET_LBA_STATUS (or whatever we name it) - it is slightly more
> efficient to skip reads of holes with a single read command flag than it
> is to first read status to determine where holes are and only then issue
> reads for the non-hole regions.

Sure.

> But I can also buy your argument that such language belongs in the
> extension for sparse reads, and doesn't need to be present in the
> extension for structured reads.

Right, that was my point, mainly.

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
       people in the world who think they really understand all of its rules,
       and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12

next prev parent reply	other threads:[~2016-03-29 18:52 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-28 13:59 [Qemu-devel] [PATCH] doc: Mention proper use of handle Eric Blake
2016-03-29  3:56 ` [Qemu-devel] [PATCH 2/1] doc: More details on flag negotiation Eric Blake
2016-03-29  3:56 ` [Qemu-devel] [PATCH 3/1] doc: Propose Structured Replies extension Eric Blake
2016-03-29  7:33   ` [Qemu-devel] [Nbd] " Wouter Verhelst
2016-03-29  8:24   ` Alex Bligh
2016-03-29 14:21     ` Eric Blake
2016-03-29 14:37       ` Alex Bligh
2016-03-29 15:12         ` Eric Blake
2016-03-29 16:37           ` Wouter Verhelst
2016-03-29 17:34           ` Alex Bligh
2016-03-29 17:45             ` Eric Blake
2016-03-29 18:03               ` Wouter Verhelst
2016-03-29 18:07                 ` Eric Blake
2016-03-29 18:19                   ` Wouter Verhelst
2016-03-29 18:25                     ` Eric Blake
2016-03-29 18:09                 ` Alex Bligh
2016-03-29 17:53   ` Wouter Verhelst
2016-03-29 18:23     ` Eric Blake
2016-03-29 18:51       ` Wouter Verhelst [this message]
2016-03-29 19:06         ` Wouter Verhelst
2016-03-29 19:39         ` Alex Bligh
2016-03-29 20:00           ` Eric Blake
2016-03-29 20:18             ` Alex Bligh
2016-03-29 20:44             ` Alex Bligh
2016-03-29 21:05               ` Wouter Verhelst
2016-03-29 22:05                 ` Alex Bligh
2016-03-29 22:45                   ` Wouter Verhelst
2016-03-29 22:53                     ` Alex Bligh
2016-03-29  7:11 ` [Qemu-devel] [PATCH] doc: Mention proper use of handle Wouter Verhelst
2016-03-29 13:59   ` Eric Blake
2016-03-29 23:00 ` [Qemu-devel] [PATCH v2 0/3] NBD Structured Read Eric Blake
2016-03-29 23:00   ` [Qemu-devel] [PATCH v2 1/3] NBD proto: add "Command flags" section Eric Blake
2016-03-29 23:00   ` [Qemu-devel] [PATCH v2 2/3] doc: Mention proper use of handle Eric Blake
2016-03-29 23:01   ` [Qemu-devel] [PATCH v2 3/3] doc: Propose Structured Read extension Eric Blake
2016-03-29 23:29     ` Eric Blake
2016-03-30  6:50     ` Alex Bligh
2016-03-30 17:45       ` Eric Blake
2016-03-30 19:51         ` [Qemu-devel] [Nbd] " Wouter Verhelst
2016-03-30 20:54           ` Eric Blake
2016-03-30 21:26             ` Wouter Verhelst
2016-03-30 22:48         ` [Qemu-devel] " Alex Bligh
2016-03-30 20:44     ` [Qemu-devel] [Nbd] " Wouter Verhelst
2016-03-30  8:09   ` [Qemu-devel] [Nbd] [PATCH v2 0/3] NBD Structured Read Wouter Verhelst

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160329185157.GC12469@grep.be \
    --to=w@uter.be \
    --cc=eblake@redhat.com \
    --cc=nbd-general@lists.sourceforge.net \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).