Re: [Qemu-devel] [Nbd] [PATCH 3/1] doc: Propose Structured Replies extension

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Wouter Verhelst <w@uter.be>
To: Eric Blake <eblake@redhat.com>
Cc: nbd-general@lists.sourceforge.net, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [Nbd] [PATCH 3/1] doc: Propose Structured Replies extension
Date: Tue, 29 Mar 2016 20:51:57 +0200	[thread overview]
Message-ID: <20160329185157.GC12469@grep.be> (raw)
In-Reply-To: <56FAC823.8070206@redhat.com>

On Tue, Mar 29, 2016 at 12:23:31PM -0600, Eric Blake wrote:
> On 03/29/2016 11:53 AM, Wouter Verhelst wrote:
> > Hi Eric,
> > 
> > Having read this in more detail now:
> > 
> > On Mon, Mar 28, 2016 at 09:56:36PM -0600, Eric Blake wrote:
> >> +  The server MUST ensure that each read chunk lies within the original
> >> +  offset and length of the original client request, MUST NOT send read
> >> +  chunks that would cover the same offset more than once, and MUST
> >> +  send at least one byte of data in addition to the offset field of
> >> +  each read chunk.  The server MAY send read chunks out of order, and
> >> +  may interleave other responses between read replies.  The server
> >> +  MUST NOT set the error field of a read chunk; if an error occurs, it
> >> +  MAY immediately end the sequence of structured response messages,
> >> +  MUST send the error in the concluding normal response, and SHOULD
> >> +  keep the connection open.  The final non-structured response MUST
> >> +  set an error unless the sum of data sent by all read chunks totals
> >> +  the original client length request.
> > 
> > I'm thinking it would probably be a good idea to have the concluding
> > response (if the error field is nonzero) have an offset too; the server
> > could use that to specify where, exactly, the error occurred (so that a
> > client which sent a very large read request doesn't have to go through a
> > binary search or some such to figure out where the read error happened)
> > 
> > i.e.,
> > 
> > C: read X bytes at offset Y
> > S: (X bytes)
> > S: (error, offset Z)
> 
> Here, I'm assuming that you mean X > Z.

Yes, obviously.

> Unfortunately, I chose the design of 0 or more structured replies
> followed by a normal reply, so that the normal reply is a reliable
> indicator that the read is complete (whether successful or not); and the
> whole goal of the extension is to avoid sending any data payload on a
> normal reply.  I'm not sure how to send the offset in the normal reply
> without violating the premise that a normal reply has no payload.

Oh. I thought you meant for the concluding message to also be a
structured reply with the length field be zero, but you mean for it to
be a non-structured reply message? If so, you should clarify that a bit
more (this wasn't clear to me)...

[...]
> But what we could do is allow for the server to send a structured reply
> data chunk of zero bytes, with the offset in question, as the offset
> where an error occurred, prior to then sending the normal reply with the
> final error indicator.  I guess that also means that if we don't have
> the DF command flag set, the server could then report multiple failed
> reads interspersed among larger successful clusters, when trying to
> recover as much of the failing disk as possible, if each failure is
> reported via a separate structured read of zero bytes.  Hmm, that also
> means that we have to be careful on the wording - if we allow a
> structured reply with 0 data bytes to report an error, after already
> sending a larger reply with partially valid bytes, then that means that
> a client will receive more than one read chunk visiting the same offset,
> so we'd have to make the wording permit that.
> 
> > client now has Z-1 bytes of valid data (with the rest being garbage,
> > plus a read error)
> > 
> > The alternative (in the above) would be that the client has 0 bytes of
> > valid data, and would have to issue another read request to figure out
> > which parts of the data are valid.
> 
> So if I'm understanding you, you are trying to state that the server may
> report the header for X bytes, then fail partway through those X bytes;
> it must still send X bytes, but can then report how many are valid (that
> is, a client must assume that 0 of the X bytes received are valid
> _unless_ the server also reported where it failed).

Yes.

> But I was envisioning the opposite: the server must NOT send X bytes
> unless it knows they are valid; if it encounters a read error at Z,
> then it sends a structured read of Z-1 bytes before the final normal
> message that reports overall failure.  The client then assumes that
> all X bytes received are valid.

The problem with that approach is that it makes it impossible for a
server to use a sendfile()-like system call, where you don't know that
there's a read error until start sending out data to the client (which
implies that you must've already sent out the header).

> But I also documented that the client MAY, but not MUST, abort the read
> at the first error; so the idea of being able to report multiple errors
> and/or send headers prior to learning whether there are read errors
> means that your interpretation is probably safer than mine.

I didn't mean to imply that. I do think that aborting the read at the
first error is probably a good idea. If the error occurs because the
disk is dying, having the server go ahead and try to read more data
anyway is probably not a very good idea (unless instructed to do so by
the user, i.e., client).

> I guess it will help to have actual v2 wording in front of us to further
> fine-tune the wording.

Certainly :-)

> >> +  The client SHOULD immediately close the connection if it detects
> >> +  that the server has sent an offset more than once (whether or not
> >> +  the overlapping data claimed to have the same contents), or if
> >> +  receives the concluding normal reply without an error set but
> >> +  without all bytes covered by read chunk(s). A future extension may
> > 
> > I would reword this to...
> > 
> > The client MAY immediately close the connection if it detects that
> > [...]. The server MUST NOT send an offset more than once.
> > 
> >> +  add a command flag that would allow the server to skip read chunks
> >> +  for portions of the file that read as all zeroes.
> > 
> > Not sure if that part is necessary or helpful, really.
> 
> I envision such an extension in parallel to (or as part of) the proposed
> NBD_CMD_GET_LBA_STATUS (or whatever we name it) - it is slightly more
> efficient to skip reads of holes with a single read command flag than it
> is to first read status to determine where holes are and only then issue
> reads for the non-hole regions.

Sure.

> But I can also buy your argument that such language belongs in the
> extension for sparse reads, and doesn't need to be present in the
> extension for structured reads.

Right, that was my point, mainly.

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
       people in the world who think they really understand all of its rules,
       and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12

next prev parent reply	other threads:[~2016-03-29 18:52 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-28 13:59 [Qemu-devel] [PATCH] doc: Mention proper use of handle Eric Blake
2016-03-29  3:56 ` [Qemu-devel] [PATCH 2/1] doc: More details on flag negotiation Eric Blake
2016-03-29  3:56 ` [Qemu-devel] [PATCH 3/1] doc: Propose Structured Replies extension Eric Blake
2016-03-29  7:33   ` [Qemu-devel] [Nbd] " Wouter Verhelst
2016-03-29  8:24   ` Alex Bligh
2016-03-29 14:21     ` Eric Blake
2016-03-29 14:37       ` Alex Bligh
2016-03-29 15:12         ` Eric Blake
2016-03-29 16:37           ` Wouter Verhelst
2016-03-29 17:34           ` Alex Bligh
2016-03-29 17:45             ` Eric Blake
2016-03-29 18:03               ` Wouter Verhelst
2016-03-29 18:07                 ` Eric Blake
2016-03-29 18:19                   ` Wouter Verhelst
2016-03-29 18:25                     ` Eric Blake
2016-03-29 18:09                 ` Alex Bligh
2016-03-29 17:53   ` Wouter Verhelst
2016-03-29 18:23     ` Eric Blake
2016-03-29 18:51       ` Wouter Verhelst [this message]
2016-03-29 19:06         ` Wouter Verhelst
2016-03-29 19:39         ` Alex Bligh
2016-03-29 20:00           ` Eric Blake
2016-03-29 20:18             ` Alex Bligh
2016-03-29 20:44             ` Alex Bligh
2016-03-29 21:05               ` Wouter Verhelst
2016-03-29 22:05                 ` Alex Bligh
2016-03-29 22:45                   ` Wouter Verhelst
2016-03-29 22:53                     ` Alex Bligh
2016-03-29  7:11 ` [Qemu-devel] [PATCH] doc: Mention proper use of handle Wouter Verhelst
2016-03-29 13:59   ` Eric Blake
2016-03-29 23:00 ` [Qemu-devel] [PATCH v2 0/3] NBD Structured Read Eric Blake
2016-03-29 23:00   ` [Qemu-devel] [PATCH v2 1/3] NBD proto: add "Command flags" section Eric Blake
2016-03-29 23:00   ` [Qemu-devel] [PATCH v2 2/3] doc: Mention proper use of handle Eric Blake
2016-03-29 23:01   ` [Qemu-devel] [PATCH v2 3/3] doc: Propose Structured Read extension Eric Blake
2016-03-29 23:29     ` Eric Blake
2016-03-30  6:50     ` Alex Bligh
2016-03-30 17:45       ` Eric Blake
2016-03-30 19:51         ` [Qemu-devel] [Nbd] " Wouter Verhelst
2016-03-30 20:54           ` Eric Blake
2016-03-30 21:26             ` Wouter Verhelst
2016-03-30 22:48         ` [Qemu-devel] " Alex Bligh
2016-03-30 20:44     ` [Qemu-devel] [Nbd] " Wouter Verhelst
2016-03-30  8:09   ` [Qemu-devel] [Nbd] [PATCH v2 0/3] NBD Structured Read Wouter Verhelst

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160329185157.GC12469@grep.be \
    --to=w@uter.be \
    --cc=eblake@redhat.com \
    --cc=nbd-general@lists.sourceforge.net \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.