qemu-devel.nongnu.org archive mirror
From: Eric Blake <eblake@redhat.com>
To: Wouter Verhelst <w@uter.be>
Cc: nbd-general@lists.sourceforge.net, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [Nbd] [PATCH 3/1] doc: Propose Structured Replies extension
Date: Tue, 29 Mar 2016 12:23:31 -0600
Message-ID: <56FAC823.8070206@redhat.com>
In-Reply-To: <20160329175319.GA8628@grep.be>

On 03/29/2016 11:53 AM, Wouter Verhelst wrote:
> Hi Eric,
> 
> Having read this in more detail now:
> 
> On Mon, Mar 28, 2016 at 09:56:36PM -0600, Eric Blake wrote:
>> +  The server MUST ensure that each read chunk lies within the original
>> +  offset and length of the original client request, MUST NOT send read
>> +  chunks that would cover the same offset more than once, and MUST
>> +  send at least one byte of data in addition to the offset field of
>> +  each read chunk.  The server MAY send read chunks out of order, and
>> +  may interleave other responses between read replies.  The server
>> +  MUST NOT set the error field of a read chunk; if an error occurs, it
>> +  MAY immediately end the sequence of structured response messages,
>> +  MUST send the error in the concluding normal response, and SHOULD
>> +  keep the connection open.  The final non-structured response MUST
>> +  set an error unless the sum of data sent by all read chunks totals
>> +  the original client length request.
> 
> I'm thinking it would probably be a good idea to have the concluding
> response (if the error field is nonzero) have an offset too; the server
> could use that to specify where, exactly, the error occurred (so that a
> client which sent a very large read request doesn't have to go through a
> binary search or some such to figure out where the read error happened)
> 
> i.e.,
> 
> C: read X bytes at offset Y
> S: (X bytes)
> S: (error, offset Z)

Here, I'm assuming that you mean X > Z.

Unfortunately, I chose the design of 0 or more structured replies
followed by a normal reply, so that the normal reply is a reliable
indicator that the read is complete (whether successful or not); and the
whole goal of the extension is to avoid sending any data payload on a
normal reply.  I'm not sure how to send the offset in the normal reply
without violating the premise that a normal reply has no payload.
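
To make that framing concrete, here is a minimal client-side sketch in
Python.  It is only an illustration: the ReadChunk and NormalReply
classes and the collect_read() helper are hypothetical names that model
the replies abstractly and say nothing about the wire format.  It
assumes zero or more read chunks arrive, in any order, before exactly
one payload-free normal reply.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple, Union


@dataclass
class ReadChunk:
    """Hypothetical model of one structured reply: an offset plus data."""
    offset: int
    data: bytes


@dataclass
class NormalReply:
    """Hypothetical model of the concluding normal reply: error only, no payload."""
    error: int


def collect_read(req_offset: int, req_length: int,
                 replies: Iterable[Union[ReadChunk, NormalReply]]
                 ) -> Tuple[int, bytes, int]:
    """Consume replies until the normal reply; return (error, buffer, bytes covered)."""
    buf = bytearray(req_length)
    covered = 0
    for reply in replies:
        if isinstance(reply, NormalReply):
            # The normal reply is the reliable end-of-read marker.
            if reply.error == 0 and covered != req_length:
                raise ValueError("success reported without full coverage")
            return reply.error, bytes(buf), covered
        start = reply.offset - req_offset
        end = start + len(reply.data)
        if start < 0 or end > req_length:
            raise ValueError("read chunk outside the requested range")
        buf[start:end] = reply.data
        covered += len(reply.data)
    raise ValueError("stream ended before the normal reply")
```

The point of the sketch is that the normal reply carries only an error
code, so there is no obvious place in it for an error offset.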

But what we could do is allow for the server to send a structured reply
data chunk of zero bytes, with the offset in question, as the offset
where an error occurred, prior to then sending the normal reply with the
final error indicator.  I guess that also means that if we don't have
the DF command flag set, the server could then report multiple failed
reads interspersed among larger successful clusters, when trying to
recover as much of the failing disk as possible, if each failure is
reported via a separate structured read of zero bytes.  Hmm, that also
means that we have to be careful with the wording: if we allow a
structured reply with 0 data bytes to report an error after the server
has already sent a larger reply with partially valid bytes, then a
client will receive more than one read chunk visiting the same offset,
so the wording would have to permit that.
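
As a rough sketch of what a client might do with such markers, the
Python below separates data chunks from zero-byte chunks and flags every
data chunk that an error offset falls inside.  The tuple representation
and the classify_chunks() name are assumptions made for illustration,
and nothing here settles how far past the reported offset the bad
region extends.

```python
from typing import Dict, List, Tuple

# Assumed representation: (absolute offset, data); empty data marks an error.
Chunk = Tuple[int, bytes]
Range = Tuple[int, int]  # half-open [start, end)


def classify_chunks(chunks: List[Chunk]) -> Tuple[List[Range], Dict[Range, List[int]]]:
    """Return (clean ranges, {suspect range: error offsets inside it})."""
    data_ranges = [(off, off + len(d)) for off, d in chunks if d]
    error_offsets = sorted(off for off, d in chunks if not d)
    clean: List[Range] = []
    suspect: Dict[Range, List[int]] = {}
    for start, end in data_ranges:
        # A zero-byte chunk inside a data chunk revisits that chunk's offsets.
        hits = [e for e in error_offsets if start <= e < end]
        if hits:
            suspect[(start, end)] = hits
        else:
            clean.append((start, end))
    return clean, suspect
```

For example, data chunks covering [0, 8), [8, 16) and [16, 20) plus a
zero-byte chunk at offset 10 leave the first and last ranges clean and
mark [8, 16) as suspect.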

> client now has Z-1 bytes of valid data (with the rest being garbage,
> plus a read error)
> 
> The alternative (in the above) would be that the client has 0 bytes of
> valid data, and would have to issue another read request to figure out
> which parts of the data are valid.

So if I'm understanding you, you are trying to state that the server may
report the header for X bytes, then fail partway through those X bytes;
it must still send X bytes, but can then report how many are valid (that
is, a client must assume that 0 of the X bytes received are valid
_unless_ the server also reported where it failed).  But I was
envisioning the opposite: the server must NOT send X bytes unless it
knows they are valid; if it encounters a read error at Z, then it sends
a structured read of Z-1 bytes before the final normal message that
reports overall failure.  The client then assumes that all bytes it
received are valid.
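
A small Python sketch of the behavior I was envisioning, under stated
assumptions: the server verifies each block before sending it, stops at
the first failure, and ends with a normal reply carrying the error.  The
serve_read() generator, the in-memory "disk", the set of bad byte
offsets, the block size, and the EIO value are all illustrative
stand-ins rather than anything from the proposal.

```python
from typing import Iterator, Set, Tuple

EIO = 5  # illustrative error code


def serve_read(disk: bytes, bad: Set[int], offset: int, length: int,
               block: int = 4) -> Iterator[Tuple]:
    """Yield ('chunk', offset, data) per verified block, then ('done', error)."""
    pos = offset
    end = offset + length
    while pos < end:
        n = min(block, end - pos)
        # Only send data that is known to be good; stop at the first bad block.
        if any(i in bad for i in range(pos, pos + n)):
            yield ("done", EIO)
            return
        yield ("chunk", pos, disk[pos:pos + n])
        pos += n
    yield ("done", 0)
```

With this model, every byte the client receives in a chunk is valid, and
the amount of data received before the failing normal reply tells it
roughly where the error occurred (to block granularity in this sketch).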

But I also documented that the client MAY, but not MUST, abort the read
at the first error; so the idea of being able to report multiple errors
and/or send headers prior to learning whether there are read errors
means that your interpretation is probably safer than mine.

I guess it will help to have actual v2 wording in front of us so that we
can fine-tune it further.

> 
>> +  The client SHOULD immediately close the connection if it detects
>> +  that the server has sent an offset more than once (whether or not
>> +  the overlapping data claimed to have the same contents), or if
>> +  it receives the concluding normal reply without an error set but
>> +  without all bytes covered by read chunk(s). A future extension may
> 
> I would reword this to...
> 
> The client MAY immediately close the connection if it detects that
> [...]. The server MUST NOT send an offset more than once.
> 
>> +  add a command flag that would allow the server to skip read chunks
>> +  for portions of the file that read as all zeroes.
> 
> Not sure if that part is necessary or helpful, really.

I envision such an extension in parallel to (or as part of) the proposed
NBD_CMD_GET_LBA_STATUS (or whatever we name it) - it is slightly more
efficient to skip reads of holes with a single read command flag than it
is to first read status to determine where holes are and only then issue
reads for the non-hole regions.  But I can also buy your argument that
such language belongs in the extension for sparse reads, and doesn't
need to be present in the extension for structured reads.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

