[PATCH v7 00/12] NBD 64-bit extensions for qemu

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v7 00/12] NBD 64-bit extensions for qemu
@ 2023-09-25 19:22 Eric Blake
  2023-09-25 19:22 ` [PATCH v7 01/12] nbd/server: Support a request payload Eric Blake
                   ` (11 more replies)
  0 siblings, 12 replies; 30+ messages in thread
From: Eric Blake @ 2023-09-25 19:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: libguestfs, qemu-block, vsementsov

v6 was here:
https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg05231.html

Since then:
 - patches v6 1-5 included in pull request
 - patch v6 6 logic improved, now v7 patch 1
 - rebased to master

Still needing review:
 - patch 1,6,7,11,12

Eric Blake (12):
  nbd/server: Support a request payload
  nbd/server: Prepare to receive extended header requests
  nbd/server: Prepare to send extended header replies
  nbd/server: Support 64-bit block status
  nbd/server: Enable initial support for extended headers
  nbd/client: Plumb errp through nbd_receive_replies
  nbd/client: Initial support for extended headers
  nbd/client: Accept 64-bit block status chunks
  nbd/client: Request extended headers during negotiation
  nbd/server: Refactor list of negotiated meta contexts
  nbd/server: Prepare for per-request filtering of BLOCK_STATUS
  nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS

 docs/interop/nbd.txt                          |   1 +
 include/block/nbd.h                           |   5 +-
 nbd/nbd-internal.h                            |   5 +-
 block/nbd.c                                   |  67 ++-
 nbd/client-connection.c                       |   2 +-
 nbd/client.c                                  | 124 ++++--
 nbd/server.c                                  | 418 ++++++++++++++----
 qemu-nbd.c                                    |   4 +
 block/trace-events                            |   1 +
 nbd/trace-events                              |   5 +-
 tests/qemu-iotests/223.out                    |  18 +-
 tests/qemu-iotests/233.out                    |   4 +
 tests/qemu-iotests/241.out                    |   3 +
 tests/qemu-iotests/307.out                    |  15 +-
 .../tests/nbd-qemu-allocation.out             |   3 +-
 15 files changed, 516 insertions(+), 159 deletions(-)

-- 
2.41.0



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v7 01/12] nbd/server: Support a request payload
  2023-09-25 19:22 [PATCH v7 00/12] NBD 64-bit extensions for qemu Eric Blake
@ 2023-09-25 19:22 ` Eric Blake
  2023-09-27  8:55   ` Vladimir Sementsov-Ogievskiy
                     ` (2 more replies)
  2023-09-25 19:22 ` [PATCH v7 02/12] nbd/server: Prepare to receive extended header requests Eric Blake
                   ` (10 subsequent siblings)
  11 siblings, 3 replies; 30+ messages in thread
From: Eric Blake @ 2023-09-25 19:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: libguestfs, qemu-block, vsementsov

Upcoming additions to support NBD 64-bit effect lengths allow for the
possibility to distinguish between payload length (capped at 32M) and
effect length (64 bits, although we generally assume 63 bits because
of off_t limitations).  Without that extension, only the NBD_CMD_WRITE
request has a payload; but with the extension, it makes sense to allow
at least NBD_CMD_BLOCK_STATUS to have both a payload and effect length
in a future patch (where the payload is a limited-size struct that in
turn gives the real effect length as well as a subset of known ids for
which status is requested).  Other future NBD commands may also have a
request payload, so the 64-bit extension introduces a new
NBD_CMD_FLAG_PAYLOAD_LEN that distinguishes between whether the header
length is a payload length or an effect length, rather than
hard-coding the decision based on the command.

According to the spec, a client should never send a command with a
payload without the negotiation phase proving such extension is
available.  So in the unlikely event the bit is set or cleared
incorrectly, the client is already at fault; if the client then
provides the payload, we can gracefully consume it off the wire and
fail the command with NBD_EINVAL (subsequent checks for magic numbers
ensure we are still in sync), while if the client fails to send
payload we block waiting for it (basically deadlocking our connection
to the bad client, but not negatively impacting our ability to service
other clients, so not a security risk).  Note that we do not support
the payload version of BLOCK_STATUS yet.

Signed-off-by: Eric Blake <eblake@redhat.com>
---

v7: another try at better logic [Vladimir]

v5: retitled from v4 13/24, rewrite on top of previous patch's switch
statement [Vladimir]

v4: less indentation on several 'if's [Vladimir]
---
 nbd/server.c     | 37 +++++++++++++++++++++++++++++++++----
 nbd/trace-events |  1 +
 2 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 7a6f95071f8..1eabcfc908d 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2322,9 +2322,11 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
                                                Error **errp)
 {
     NBDClient *client = req->client;
+    bool extended_with_payload;
     bool check_length = false;
     bool check_rofs = false;
     bool allocate_buffer = false;
+    bool payload_okay = false;
     unsigned payload_len = 0;
     int valid_flags = NBD_CMD_FLAG_FUA;
     int ret;
@@ -2338,6 +2340,13 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,

     trace_nbd_co_receive_request_decode_type(request->cookie, request->type,
                                              nbd_cmd_lookup(request->type));
+    extended_with_payload = client->mode >= NBD_MODE_EXTENDED &&
+        request->flags & NBD_CMD_FLAG_PAYLOAD_LEN;
+    if (extended_with_payload) {
+        payload_len = request->len;
+        check_length = true;
+    }
+
     switch (request->type) {
     case NBD_CMD_DISC:
         /* Special case: we're going to disconnect without a reply,
@@ -2354,6 +2363,15 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
         break;

     case NBD_CMD_WRITE:
+        if (client->mode >= NBD_MODE_EXTENDED) {
+            if (!extended_with_payload) {
+                /* The client is noncompliant. Trace it, but proceed. */
+                trace_nbd_co_receive_ext_payload_compliance(request->from,
+                                                            request->len);
+            }
+            valid_flags |= NBD_CMD_FLAG_PAYLOAD_LEN;
+        }
+        payload_okay = true;
         payload_len = request->len;
         check_length = true;
         allocate_buffer = true;
@@ -2395,6 +2413,14 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
                    request->len, NBD_MAX_BUFFER_SIZE);
         return -EINVAL;
     }
+    if (payload_len && !payload_okay) {
+        /*
+         * For now, we don't support payloads on other commands; but
+         * we can keep the connection alive by ignoring the payload.
+         */
+        assert(request->type != NBD_CMD_WRITE);
+        request->len = 0;
+    }
     if (allocate_buffer) {
         /* READ, WRITE */
         req->data = blk_try_blockalign(client->exp->common.blk,
@@ -2405,10 +2431,13 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
         }
     }
     if (payload_len) {
-        /* WRITE */
-        assert(req->data);
-        ret = nbd_read(client->ioc, req->data, payload_len,
-                       "CMD_WRITE data", errp);
+        if (payload_okay) {
+            assert(req->data);
+            ret = nbd_read(client->ioc, req->data, payload_len,
+                           "CMD_WRITE data", errp);
+        } else {
+            ret = nbd_drop(client->ioc, payload_len, errp);
+        }
         if (ret < 0) {
             return -EIO;
         }
diff --git a/nbd/trace-events b/nbd/trace-events
index f9dccfcfb44..c1a3227613f 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -71,6 +71,7 @@ nbd_co_send_extents(uint64_t cookie, unsigned int extents, uint32_t id, uint64_t
 nbd_co_send_chunk_error(uint64_t cookie, int err, const char *errname, const char *msg) "Send structured error reply: cookie = %" PRIu64 ", error = %d (%s), msg = '%s'"
 nbd_co_receive_request_decode_type(uint64_t cookie, uint16_t type, const char *name) "Decoding type: cookie = %" PRIu64 ", type = %" PRIu16 " (%s)"
 nbd_co_receive_request_payload_received(uint64_t cookie, uint64_t len) "Payload received: cookie = %" PRIu64 ", len = %" PRIu64
+nbd_co_receive_ext_payload_compliance(uint64_t from, uint64_t len) "client sent non-compliant write without payload flag: from=0x%" PRIx64 ", len=0x%" PRIx64
 nbd_co_receive_align_compliance(const char *op, uint64_t from, uint64_t len, uint32_t align) "client sent non-compliant unaligned %s request: from=0x%" PRIx64 ", len=0x%" PRIx64 ", align=0x%" PRIx32
 nbd_trip(void) "Reading request"

-- 
2.41.0



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 02/12] nbd/server: Prepare to receive extended header requests
  2023-09-25 19:22 [PATCH v7 00/12] NBD 64-bit extensions for qemu Eric Blake
  2023-09-25 19:22 ` [PATCH v7 01/12] nbd/server: Support a request payload Eric Blake
@ 2023-09-25 19:22 ` Eric Blake
  2023-09-25 19:22 ` [PATCH v7 03/12] nbd/server: Prepare to send extended header replies Eric Blake
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Eric Blake @ 2023-09-25 19:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: libguestfs, qemu-block, vsementsov

Although extended mode is not yet enabled, once we do turn it on, we
need to accept extended requests for all messages.  Previous patches
have already taken care of supporting 64-bit lengths, now we just need
to read it off the wire.

Note that this implementation will block indefinitely on a buggy
client that sends a non-extended payload (that is, we try to read a
full packet before we ever check the magic number, but a client that
mistakenly sends a simple request after negotiating extended headers
doesn't send us enough bytes), but it's no different from any other
client that stops talking to us partway through a packet and thus not
worth coding around.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---

v6: fix sign extension bug

v5: no change

v4: new patch, split out from v3 9/14
---
 nbd/nbd-internal.h |  5 ++++-
 nbd/server.c       | 43 ++++++++++++++++++++++++++++++-------------
 2 files changed, 34 insertions(+), 14 deletions(-)

diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 133b1d94b50..dfa02f77ee4 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -34,8 +34,11 @@
  * https://github.com/yoe/nbd/blob/master/doc/proto.md
  */

-/* Size of all NBD_OPT_*, without payload */
+/* Size of all compact NBD_CMD_*, without payload */
 #define NBD_REQUEST_SIZE            (4 + 2 + 2 + 8 + 8 + 4)
+/* Size of all extended NBD_CMD_*, without payload */
+#define NBD_EXTENDED_REQUEST_SIZE   (4 + 2 + 2 + 8 + 8 + 8)
+
 /* Size of all NBD_REP_* sent in answer to most NBD_OPT_*, without payload */
 #define NBD_REPLY_SIZE              (4 + 4 + 8)
 /* Size of reply to NBD_OPT_EXPORT_NAME */
diff --git a/nbd/server.c b/nbd/server.c
index 1eabcfc908d..e227e470d41 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1411,11 +1411,13 @@ nbd_read_eof(NBDClient *client, void *buffer, size_t size, Error **errp)
 static int coroutine_fn nbd_receive_request(NBDClient *client, NBDRequest *request,
                                             Error **errp)
 {
-    uint8_t buf[NBD_REQUEST_SIZE];
-    uint32_t magic;
+    uint8_t buf[NBD_EXTENDED_REQUEST_SIZE];
+    uint32_t magic, expect;
     int ret;
+    size_t size = client->mode >= NBD_MODE_EXTENDED ?
+        NBD_EXTENDED_REQUEST_SIZE : NBD_REQUEST_SIZE;

-    ret = nbd_read_eof(client, buf, sizeof(buf), errp);
+    ret = nbd_read_eof(client, buf, size, errp);
     if (ret < 0) {
         return ret;
     }
@@ -1423,13 +1425,21 @@ static int coroutine_fn nbd_receive_request(NBDClient *client, NBDRequest *reque
         return -EIO;
     }

-    /* Request
-       [ 0 ..  3]   magic   (NBD_REQUEST_MAGIC)
-       [ 4 ..  5]   flags   (NBD_CMD_FLAG_FUA, ...)
-       [ 6 ..  7]   type    (NBD_CMD_READ, ...)
-       [ 8 .. 15]   cookie
-       [16 .. 23]   from
-       [24 .. 27]   len
+    /*
+     * Compact request
+     *  [ 0 ..  3]   magic   (NBD_REQUEST_MAGIC)
+     *  [ 4 ..  5]   flags   (NBD_CMD_FLAG_FUA, ...)
+     *  [ 6 ..  7]   type    (NBD_CMD_READ, ...)
+     *  [ 8 .. 15]   cookie
+     *  [16 .. 23]   from
+     *  [24 .. 27]   len
+     * Extended request
+     *  [ 0 ..  3]   magic   (NBD_EXTENDED_REQUEST_MAGIC)
+     *  [ 4 ..  5]   flags   (NBD_CMD_FLAG_FUA, NBD_CMD_FLAG_PAYLOAD_LEN, ...)
+     *  [ 6 ..  7]   type    (NBD_CMD_READ, ...)
+     *  [ 8 .. 15]   cookie
+     *  [16 .. 23]   from
+     *  [24 .. 31]   len
      */

     magic = ldl_be_p(buf);
@@ -1437,13 +1447,20 @@ static int coroutine_fn nbd_receive_request(NBDClient *client, NBDRequest *reque
     request->type   = lduw_be_p(buf + 6);
     request->cookie = ldq_be_p(buf + 8);
     request->from   = ldq_be_p(buf + 16);
-    request->len    = (uint32_t)ldl_be_p(buf + 24); /* widen 32 to 64 bits */
+    if (client->mode >= NBD_MODE_EXTENDED) {
+        request->len = ldq_be_p(buf + 24);
+        expect = NBD_EXTENDED_REQUEST_MAGIC;
+    } else {
+        request->len = (uint32_t)ldl_be_p(buf + 24); /* widen 32 to 64 bits */
+        expect = NBD_REQUEST_MAGIC;
+    }

     trace_nbd_receive_request(magic, request->flags, request->type,
                               request->from, request->len);

-    if (magic != NBD_REQUEST_MAGIC) {
-        error_setg(errp, "invalid magic (got 0x%" PRIx32 ")", magic);
+    if (magic != expect) {
+        error_setg(errp, "invalid magic (got 0x%" PRIx32 ", expected 0x%"
+                   PRIx32 ")", magic, expect);
         return -EINVAL;
     }
     return 0;
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 03/12] nbd/server: Prepare to send extended header replies
  2023-09-25 19:22 [PATCH v7 00/12] NBD 64-bit extensions for qemu Eric Blake
  2023-09-25 19:22 ` [PATCH v7 01/12] nbd/server: Support a request payload Eric Blake
  2023-09-25 19:22 ` [PATCH v7 02/12] nbd/server: Prepare to receive extended header requests Eric Blake
@ 2023-09-25 19:22 ` Eric Blake
  2023-09-25 19:22 ` [PATCH v7 04/12] nbd/server: Support 64-bit block status Eric Blake
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Eric Blake @ 2023-09-25 19:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: libguestfs, qemu-block, vsementsov

Although extended mode is not yet enabled, once we do turn it on, we
need to reply with extended headers to all messages.  Update the low
level entry points necessary so that all other callers automatically
get the right header based on the current mode.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---

v5: s/iov->iov_len/iov[0].iov_len/ [Vladimir], add R-b

v4: new patch, split out from v3 9/14
---
 nbd/server.c | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index e227e470d41..4cd061f9da4 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1938,8 +1938,6 @@ static inline void set_be_chunk(NBDClient *client, struct iovec *iov,
                                 size_t niov, uint16_t flags, uint16_t type,
                                 NBDRequest *request)
 {
-    /* TODO - handle structured vs. extended replies */
-    NBDStructuredReplyChunk *chunk = iov->iov_base;
     size_t i, length = 0;

     for (i = 1; i < niov; i++) {
@@ -1947,12 +1945,26 @@ static inline void set_be_chunk(NBDClient *client, struct iovec *iov,
     }
     assert(length <= NBD_MAX_BUFFER_SIZE + sizeof(NBDStructuredReadData));

-    iov[0].iov_len = sizeof(*chunk);
-    stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC);
-    stw_be_p(&chunk->flags, flags);
-    stw_be_p(&chunk->type, type);
-    stq_be_p(&chunk->cookie, request->cookie);
-    stl_be_p(&chunk->length, length);
+    if (client->mode >= NBD_MODE_EXTENDED) {
+        NBDExtendedReplyChunk *chunk = iov->iov_base;
+
+        iov[0].iov_len = sizeof(*chunk);
+        stl_be_p(&chunk->magic, NBD_EXTENDED_REPLY_MAGIC);
+        stw_be_p(&chunk->flags, flags);
+        stw_be_p(&chunk->type, type);
+        stq_be_p(&chunk->cookie, request->cookie);
+        stq_be_p(&chunk->offset, request->from);
+        stq_be_p(&chunk->length, length);
+    } else {
+        NBDStructuredReplyChunk *chunk = iov->iov_base;
+
+        iov[0].iov_len = sizeof(*chunk);
+        stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC);
+        stw_be_p(&chunk->flags, flags);
+        stw_be_p(&chunk->type, type);
+        stq_be_p(&chunk->cookie, request->cookie);
+        stl_be_p(&chunk->length, length);
+    }
 }

 static int coroutine_fn nbd_co_send_chunk_done(NBDClient *client,
@@ -2509,6 +2521,8 @@ static coroutine_fn int nbd_send_generic_reply(NBDClient *client,
 {
     if (client->mode >= NBD_MODE_STRUCTURED && ret < 0) {
         return nbd_co_send_chunk_error(client, request, -ret, error_msg, errp);
+    } else if (client->mode >= NBD_MODE_EXTENDED) {
+        return nbd_co_send_chunk_done(client, request, errp);
     } else {
         return nbd_co_send_simple_reply(client, request, ret < 0 ? -ret : 0,
                                         NULL, 0, errp);
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 04/12] nbd/server: Support 64-bit block status
  2023-09-25 19:22 [PATCH v7 00/12] NBD 64-bit extensions for qemu Eric Blake
                   ` (2 preceding siblings ...)
  2023-09-25 19:22 ` [PATCH v7 03/12] nbd/server: Prepare to send extended header replies Eric Blake
@ 2023-09-25 19:22 ` Eric Blake
  2023-09-25 19:22 ` [PATCH v7 05/12] nbd/server: Enable initial support for extended headers Eric Blake
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Eric Blake @ 2023-09-25 19:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: libguestfs, qemu-block, vsementsov

The NBD spec states that if the client negotiates extended headers,
the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
the reply does not need more than 32 bits.  As of this patch,
client->mode is still never NBD_MODE_EXTENDED, so the code added here
does not take effect until the next patch enables negotiation.

For now, all metacontexts that we know how to export never populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.

Note that we previously had some interesting size-juggling on call
chains, such as:

nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
  -> bdrv_block_status_above(bytes, &uint64_t num)
  -> nbd_extent_array_add(uint64_t num)
    -> store num in 32-bit length

But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (since the protocol does not allow sending a larger
request without extended headers).  This patch adds some assertions
that ensure we continue to avoid overflowing 32 bits for a narrow
client, while fully utilizing 64-bits all the way through when the
client understands that.  Even in 64-bit math, overflow is not an
issue, because all lengths are coming from the block layer, and we
know that the block layer does not support images larger than off_t
(if lengths were coming from the network, the story would be
different).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---

v5: stronger justification on assertion [Vladimir], add R-b

v4: split conversion to big-endian across two helper functions rather
than in-place union [Vladimir]
---
 nbd/server.c | 108 ++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 82 insertions(+), 26 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 4cd061f9da4..8448167b12a 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2103,20 +2103,24 @@ static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client,
 }

 typedef struct NBDExtentArray {
-    NBDExtent32 *extents;
+    NBDExtent64 *extents;
     unsigned int nb_alloc;
     unsigned int count;
     uint64_t total_length;
+    bool extended;
     bool can_add;
     bool converted_to_be;
 } NBDExtentArray;

-static NBDExtentArray *nbd_extent_array_new(unsigned int nb_alloc)
+static NBDExtentArray *nbd_extent_array_new(unsigned int nb_alloc,
+                                            NBDMode mode)
 {
     NBDExtentArray *ea = g_new0(NBDExtentArray, 1);

+    assert(mode >= NBD_MODE_STRUCTURED);
     ea->nb_alloc = nb_alloc;
-    ea->extents = g_new(NBDExtent32, nb_alloc);
+    ea->extents = g_new(NBDExtent64, nb_alloc);
+    ea->extended = mode >= NBD_MODE_EXTENDED;
     ea->can_add = true;

     return ea;
@@ -2135,15 +2139,36 @@ static void nbd_extent_array_convert_to_be(NBDExtentArray *ea)
     int i;

     assert(!ea->converted_to_be);
+    assert(ea->extended);
     ea->can_add = false;
     ea->converted_to_be = true;

     for (i = 0; i < ea->count; i++) {
-        ea->extents[i].flags = cpu_to_be32(ea->extents[i].flags);
-        ea->extents[i].length = cpu_to_be32(ea->extents[i].length);
+        ea->extents[i].length = cpu_to_be64(ea->extents[i].length);
+        ea->extents[i].flags = cpu_to_be64(ea->extents[i].flags);
     }
 }

+/* Further modifications of the array after conversion are abandoned */
+static NBDExtent32 *nbd_extent_array_convert_to_narrow(NBDExtentArray *ea)
+{
+    int i;
+    NBDExtent32 *extents = g_new(NBDExtent32, ea->count);
+
+    assert(!ea->converted_to_be);
+    assert(!ea->extended);
+    ea->can_add = false;
+    ea->converted_to_be = true;
+
+    for (i = 0; i < ea->count; i++) {
+        assert((ea->extents[i].length | ea->extents[i].flags) <= UINT32_MAX);
+        extents[i].length = cpu_to_be32(ea->extents[i].length);
+        extents[i].flags = cpu_to_be32(ea->extents[i].flags);
+    }
+
+    return extents;
+}
+
 /*
  * Add extent to NBDExtentArray. If extent can't be added (no available space),
  * return -1.
@@ -2154,19 +2179,27 @@ static void nbd_extent_array_convert_to_be(NBDExtentArray *ea)
  * would result in an incorrect range reported to the client)
  */
 static int nbd_extent_array_add(NBDExtentArray *ea,
-                                uint32_t length, uint32_t flags)
+                                uint64_t length, uint32_t flags)
 {
     assert(ea->can_add);

     if (!length) {
         return 0;
     }
+    if (!ea->extended) {
+        assert(length <= UINT32_MAX);
+    }

     /* Extend previous extent if flags are the same */
     if (ea->count > 0 && flags == ea->extents[ea->count - 1].flags) {
-        uint64_t sum = (uint64_t)length + ea->extents[ea->count - 1].length;
+        uint64_t sum = length + ea->extents[ea->count - 1].length;

-        if (sum <= UINT32_MAX) {
+        /*
+         * sum cannot overflow: the block layer bounds image size at
+         * 2^63, and ea->extents[].length comes from the block layer.
+         */
+        assert(sum >= length);
+        if (sum <= UINT32_MAX || ea->extended) {
             ea->extents[ea->count - 1].length = sum;
             ea->total_length += length;
             return 0;
@@ -2179,7 +2212,7 @@ static int nbd_extent_array_add(NBDExtentArray *ea,
     }

     ea->total_length += length;
-    ea->extents[ea->count] = (NBDExtent32) {.length = length, .flags = flags};
+    ea->extents[ea->count] = (NBDExtent64) {.length = length, .flags = flags};
     ea->count++;

     return 0;
@@ -2248,20 +2281,39 @@ nbd_co_send_extents(NBDClient *client, NBDRequest *request, NBDExtentArray *ea,
                     bool last, uint32_t context_id, Error **errp)
 {
     NBDReply hdr;
-    NBDStructuredMeta chunk;
-    struct iovec iov[] = {
-        {.iov_base = &hdr},
-        {.iov_base = &chunk, .iov_len = sizeof(chunk)},
-        {.iov_base = ea->extents, .iov_len = ea->count * sizeof(ea->extents[0])}
-    };
-
-    nbd_extent_array_convert_to_be(ea);
+    NBDStructuredMeta meta;
+    NBDExtendedMeta meta_ext;
+    g_autofree NBDExtent32 *extents = NULL;
+    uint16_t type;
+    struct iovec iov[] = { {.iov_base = &hdr}, {0}, {0} };
+
+    if (client->mode >= NBD_MODE_EXTENDED) {
+        type = NBD_REPLY_TYPE_BLOCK_STATUS_EXT;
+
+        iov[1].iov_base = &meta_ext;
+        iov[1].iov_len = sizeof(meta_ext);
+        stl_be_p(&meta_ext.context_id, context_id);
+        stl_be_p(&meta_ext.count, ea->count);
+
+        nbd_extent_array_convert_to_be(ea);
+        iov[2].iov_base = ea->extents;
+        iov[2].iov_len = ea->count * sizeof(ea->extents[0]);
+    } else {
+        type = NBD_REPLY_TYPE_BLOCK_STATUS;
+
+        iov[1].iov_base = &meta;
+        iov[1].iov_len = sizeof(meta);
+        stl_be_p(&meta.context_id, context_id);
+
+        extents = nbd_extent_array_convert_to_narrow(ea);
+        iov[2].iov_base = extents;
+        iov[2].iov_len = ea->count * sizeof(extents[0]);
+    }

     trace_nbd_co_send_extents(request->cookie, ea->count, context_id,
                               ea->total_length, last);
-    set_be_chunk(client, iov, 3, last ? NBD_REPLY_FLAG_DONE : 0,
-                 NBD_REPLY_TYPE_BLOCK_STATUS, request);
-    stl_be_p(&chunk.context_id, context_id);
+    set_be_chunk(client, iov, 3, last ? NBD_REPLY_FLAG_DONE : 0, type,
+                 request);

     return nbd_co_send_iov(client, iov, 3, errp);
 }
@@ -2270,13 +2322,14 @@ nbd_co_send_extents(NBDClient *client, NBDRequest *request, NBDExtentArray *ea,
 static int
 coroutine_fn nbd_co_send_block_status(NBDClient *client, NBDRequest *request,
                                       BlockBackend *blk, uint64_t offset,
-                                      uint32_t length, bool dont_fragment,
+                                      uint64_t length, bool dont_fragment,
                                       bool last, uint32_t context_id,
                                       Error **errp)
 {
     int ret;
     unsigned int nb_extents = dont_fragment ? 1 : NBD_MAX_BLOCK_STATUS_EXTENTS;
-    g_autoptr(NBDExtentArray) ea = nbd_extent_array_new(nb_extents);
+    g_autoptr(NBDExtentArray) ea =
+        nbd_extent_array_new(nb_extents, client->mode);

     if (context_id == NBD_META_ID_BASE_ALLOCATION) {
         ret = blockstatus_to_extents(blk, offset, length, ea);
@@ -2299,11 +2352,12 @@ static void bitmap_to_extents(BdrvDirtyBitmap *bitmap,
     int64_t start, dirty_start, dirty_count;
     int64_t end = offset + length;
     bool full = false;
+    int64_t bound = es->extended ? INT64_MAX : INT32_MAX;

     bdrv_dirty_bitmap_lock(bitmap);

     for (start = offset;
-         bdrv_dirty_bitmap_next_dirty_area(bitmap, start, end, INT32_MAX,
+         bdrv_dirty_bitmap_next_dirty_area(bitmap, start, end, bound,
                                            &dirty_start, &dirty_count);
          start = dirty_start + dirty_count)
     {
@@ -2327,12 +2381,13 @@ static int coroutine_fn nbd_co_send_bitmap(NBDClient *client,
                                            NBDRequest *request,
                                            BdrvDirtyBitmap *bitmap,
                                            uint64_t offset,
-                                           uint32_t length, bool dont_fragment,
+                                           uint64_t length, bool dont_fragment,
                                            bool last, uint32_t context_id,
                                            Error **errp)
 {
     unsigned int nb_extents = dont_fragment ? 1 : NBD_MAX_BLOCK_STATUS_EXTENTS;
-    g_autoptr(NBDExtentArray) ea = nbd_extent_array_new(nb_extents);
+    g_autoptr(NBDExtentArray) ea =
+        nbd_extent_array_new(nb_extents, client->mode);

     bitmap_to_extents(bitmap, offset, length, ea);

@@ -2668,7 +2723,8 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
             return nbd_send_generic_reply(client, request, -EINVAL,
                                           "need non-zero length", errp);
         }
-        assert(request->len <= UINT32_MAX);
+        assert(client->mode >= NBD_MODE_EXTENDED ||
+               request->len <= UINT32_MAX);
         if (client->export_meta.count) {
             bool dont_fragment = request->flags & NBD_CMD_FLAG_REQ_ONE;
             int contexts_remaining = client->export_meta.count;
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 05/12] nbd/server: Enable initial support for extended headers
  2023-09-25 19:22 [PATCH v7 00/12] NBD 64-bit extensions for qemu Eric Blake
                   ` (3 preceding siblings ...)
  2023-09-25 19:22 ` [PATCH v7 04/12] nbd/server: Support 64-bit block status Eric Blake
@ 2023-09-25 19:22 ` Eric Blake
  2023-09-25 19:22 ` [PATCH v7 06/12] nbd/client: Plumb errp through nbd_receive_replies Eric Blake
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Eric Blake @ 2023-09-25 19:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: libguestfs, qemu-block, vsementsov

Time to start supporting clients that request extended headers.  Now
we can finally reach the code added across several previous patches.

Even though the NBD spec has been altered to allow us to accept
NBD_CMD_READ larger than the max payload size (provided our response
is a hole or broken up over more than one data chunk), we are not
planning to take advantage of that, and continue to cap NBD_CMD_READ
to 32M regardless of header size.

For NBD_CMD_WRITE_ZEROES and NBD_CMD_TRIM, the block layer already
supports 64-bit operations without any effort on our part.  For
NBD_CMD_BLOCK_STATUS, the client's length is a hint, and the previous
patch took care of implementing the required
NBD_REPLY_TYPE_BLOCK_STATUS_EXT.

We do not yet support clients that want to do request payload
filtering of NBD_CMD_BLOCK_STATUS; that will be added in later
patches, but is not essential for qemu as a client since qemu only
requests the single context base:allocation.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---

v5: add R-b, s/8.1/8.2/

v4: split out parts into earlier patches, rebase to earlier changes,
simplify handling of generic replies, retitle (compare to v3 9/14)
---
 docs/interop/nbd.txt |  1 +
 nbd/server.c         | 21 +++++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index f5ca25174a6..9aae5e1f294 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -69,3 +69,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
 NBD_CMD_FLAG_FAST_ZERO
 * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
 * 7.1: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports
+* 8.2: NBD_OPT_EXTENDED_HEADERS
diff --git a/nbd/server.c b/nbd/server.c
index 8448167b12a..b09ee44c159 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -482,6 +482,10 @@ static int nbd_negotiate_handle_export_name(NBDClient *client, bool no_zeroes,
         [10 .. 133]   reserved     (0) [unless no_zeroes]
      */
     trace_nbd_negotiate_handle_export_name();
+    if (client->mode >= NBD_MODE_EXTENDED) {
+        error_setg(errp, "Extended headers already negotiated");
+        return -EINVAL;
+    }
     if (client->optlen > NBD_MAX_STRING_SIZE) {
         error_setg(errp, "Bad length received");
         return -EINVAL;
@@ -1264,6 +1268,10 @@ static int nbd_negotiate_options(NBDClient *client, Error **errp)
             case NBD_OPT_STRUCTURED_REPLY:
                 if (length) {
                     ret = nbd_reject_length(client, false, errp);
+                } else if (client->mode >= NBD_MODE_EXTENDED) {
+                    ret = nbd_negotiate_send_rep_err(
+                        client, NBD_REP_ERR_EXT_HEADER_REQD, errp,
+                        "extended headers already negotiated");
                 } else if (client->mode >= NBD_MODE_STRUCTURED) {
                     ret = nbd_negotiate_send_rep_err(
                         client, NBD_REP_ERR_INVALID, errp,
@@ -1280,6 +1288,19 @@ static int nbd_negotiate_options(NBDClient *client, Error **errp)
                                                  errp);
                 break;

+            case NBD_OPT_EXTENDED_HEADERS:
+                if (length) {
+                    ret = nbd_reject_length(client, false, errp);
+                } else if (client->mode >= NBD_MODE_EXTENDED) {
+                    ret = nbd_negotiate_send_rep_err(
+                        client, NBD_REP_ERR_INVALID, errp,
+                        "extended headers already negotiated");
+                } else {
+                    ret = nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
+                    client->mode = NBD_MODE_EXTENDED;
+                }
+                break;
+
             default:
                 ret = nbd_opt_drop(client, NBD_REP_ERR_UNSUP, errp,
                                    "Unsupported option %" PRIu32 " (%s)",
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 06/12] nbd/client: Plumb errp through nbd_receive_replies
  2023-09-25 19:22 [PATCH v7 00/12] NBD 64-bit extensions for qemu Eric Blake
                   ` (4 preceding siblings ...)
  2023-09-25 19:22 ` [PATCH v7 05/12] nbd/server: Enable initial support for extended headers Eric Blake
@ 2023-09-25 19:22 ` Eric Blake
  2023-09-27 11:29   ` Vladimir Sementsov-Ogievskiy
  2023-09-25 19:22 ` [PATCH v7 07/12] nbd/client: Initial support for extended headers Eric Blake
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: Eric Blake @ 2023-09-25 19:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: libguestfs, qemu-block, vsementsov, Kevin Wolf, Hanna Reitz

Instead of ignoring the low-level error just to refabricate our own
message to pass to the caller, we can just plumb the caller's errp
down to the low level.

Signed-off-by: Eric Blake <eblake@redhat.com>
---

v5: set errp on more failure cases [Vladimir], typo fix

v4: new patch [Vladimir]
---
 block/nbd.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 4a7f37da1c6..22d3cb11ac8 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -416,7 +416,8 @@ static void coroutine_fn GRAPH_RDLOCK nbd_reconnect_attempt(BDRVNBDState *s)
     reconnect_delay_timer_del(s);
 }

-static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t cookie)
+static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t cookie,
+                                            Error **errp)
 {
     int ret;
     uint64_t ind = COOKIE_TO_INDEX(cookie), ind2;
@@ -457,20 +458,25 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t cookie)

         /* We are under mutex and cookie is 0. We have to do the dirty work. */
         assert(s->reply.cookie == 0);
-        ret = nbd_receive_reply(s->bs, s->ioc, &s->reply, NULL);
-        if (ret <= 0) {
-            ret = ret ? ret : -EIO;
+        ret = nbd_receive_reply(s->bs, s->ioc, &s->reply, errp);
+        if (ret == 0) {
+            ret = -EIO;
+            error_setg(errp, "server dropped connection");
+        }
+        if (ret < 0) {
             nbd_channel_error(s, ret);
             return ret;
         }
         if (nbd_reply_is_structured(&s->reply) &&
             s->info.mode < NBD_MODE_STRUCTURED) {
             nbd_channel_error(s, -EINVAL);
+            error_setg(errp, "unexpected structured reply");
             return -EINVAL;
         }
         ind2 = COOKIE_TO_INDEX(s->reply.cookie);
         if (ind2 >= MAX_NBD_REQUESTS || !s->requests[ind2].coroutine) {
             nbd_channel_error(s, -EINVAL);
+            error_setg(errp, "unexpected cookie value");
             return -EINVAL;
         }
         if (s->reply.cookie == cookie) {
@@ -842,9 +848,9 @@ static coroutine_fn int nbd_co_do_receive_one_chunk(
     }
     *request_ret = 0;

-    ret = nbd_receive_replies(s, cookie);
+    ret = nbd_receive_replies(s, cookie, errp);
     if (ret < 0) {
-        error_setg(errp, "Connection closed");
+        error_prepend(errp, "Connection closed: ");
         return -EIO;
     }
     assert(s->ioc);
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 07/12] nbd/client: Initial support for extended headers
  2023-09-25 19:22 [PATCH v7 00/12] NBD 64-bit extensions for qemu Eric Blake
                   ` (5 preceding siblings ...)
  2023-09-25 19:22 ` [PATCH v7 06/12] nbd/client: Plumb errp through nbd_receive_replies Eric Blake
@ 2023-09-25 19:22 ` Eric Blake
  2023-09-29  9:24   ` Vladimir Sementsov-Ogievskiy
  2023-09-25 19:22 ` [PATCH v7 08/12] nbd/client: Accept 64-bit block status chunks Eric Blake
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: Eric Blake @ 2023-09-25 19:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: libguestfs, qemu-block, vsementsov, Kevin Wolf, Hanna Reitz

Update the client code to be able to send an extended request, and
parse an extended header from the server.  Note that since we reject
any structured reply with a too-large payload, we can always normalize
a valid header back into the compact form, so that the caller need not
deal with two branches of a union.  Still, until a later patch lets
the client negotiate extended headers, the code added here should not
be reached.  Note that because of the different magic numbers, it is
just as easy to trace and then tolerate a non-compliant server sending
the wrong header reply as it would be to insist that the server is
compliant.

Signed-off-by: Eric Blake <eblake@redhat.com>
---

v5: fix logic bug on error reporting [Vladimir]

v4: split off errp handling to separate patch [Vladimir], better
function naming [Vladimir]
---
 include/block/nbd.h |   3 +-
 block/nbd.c         |   2 +-
 nbd/client.c        | 104 +++++++++++++++++++++++++++++---------------
 nbd/trace-events    |   3 +-
 4 files changed, 74 insertions(+), 38 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 8a765e78dfb..ba8724f5336 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -389,7 +389,8 @@ int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo *info,
              Error **errp);
 int nbd_send_request(QIOChannel *ioc, NBDRequest *request);
 int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
-                                   NBDReply *reply, Error **errp);
+                                   NBDReply *reply, NBDMode mode,
+                                   Error **errp);
 int nbd_client(int fd);
 int nbd_disconnect(int fd);
 int nbd_errno_to_system_errno(int err);
diff --git a/block/nbd.c b/block/nbd.c
index 22d3cb11ac8..76461430411 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -458,7 +458,7 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t cookie,

         /* We are under mutex and cookie is 0. We have to do the dirty work. */
         assert(s->reply.cookie == 0);
-        ret = nbd_receive_reply(s->bs, s->ioc, &s->reply, errp);
+        ret = nbd_receive_reply(s->bs, s->ioc, &s->reply, s->info.mode, errp);
         if (ret == 0) {
             ret = -EIO;
             error_setg(errp, "server dropped connection");
diff --git a/nbd/client.c b/nbd/client.c
index cecb0f04377..a2f253062aa 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1346,22 +1346,29 @@ int nbd_disconnect(int fd)

 int nbd_send_request(QIOChannel *ioc, NBDRequest *request)
 {
-    uint8_t buf[NBD_REQUEST_SIZE];
+    uint8_t buf[NBD_EXTENDED_REQUEST_SIZE];
+    size_t len;

-    assert(request->mode <= NBD_MODE_STRUCTURED); /* TODO handle extended */
-    assert(request->len <= UINT32_MAX);
     trace_nbd_send_request(request->from, request->len, request->cookie,
                            request->flags, request->type,
                            nbd_cmd_lookup(request->type));

-    stl_be_p(buf, NBD_REQUEST_MAGIC);
     stw_be_p(buf + 4, request->flags);
     stw_be_p(buf + 6, request->type);
     stq_be_p(buf + 8, request->cookie);
     stq_be_p(buf + 16, request->from);
-    stl_be_p(buf + 24, request->len);
+    if (request->mode >= NBD_MODE_EXTENDED) {
+        stl_be_p(buf, NBD_EXTENDED_REQUEST_MAGIC);
+        stq_be_p(buf + 24, request->len);
+        len = NBD_EXTENDED_REQUEST_SIZE;
+    } else {
+        assert(request->len <= UINT32_MAX);
+        stl_be_p(buf, NBD_REQUEST_MAGIC);
+        stl_be_p(buf + 24, request->len);
+        len = NBD_REQUEST_SIZE;
+    }

-    return nbd_write(ioc, buf, sizeof(buf), NULL);
+    return nbd_write(ioc, buf, len, NULL);
 }

 /* nbd_receive_simple_reply
@@ -1388,30 +1395,36 @@ static int nbd_receive_simple_reply(QIOChannel *ioc, NBDSimpleReply *reply,
     return 0;
 }

-/* nbd_receive_structured_reply_chunk
+/* nbd_receive_reply_chunk_header
  * Read structured reply chunk except magic field (which should be already
- * read).
+ * read).  Normalize into the compact form.
  * Payload is not read.
  */
-static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
-                                              NBDStructuredReplyChunk *chunk,
-                                              Error **errp)
+static int nbd_receive_reply_chunk_header(QIOChannel *ioc, NBDReply *chunk,
+                                          Error **errp)
 {
     int ret;
+    size_t len;
+    uint64_t payload_len;

-    assert(chunk->magic == NBD_STRUCTURED_REPLY_MAGIC);
+    if (chunk->magic == NBD_STRUCTURED_REPLY_MAGIC) {
+        len = sizeof(chunk->structured);
+    } else {
+        assert(chunk->magic == NBD_EXTENDED_REPLY_MAGIC);
+        len = sizeof(chunk->extended);
+    }

     ret = nbd_read(ioc, (uint8_t *)chunk + sizeof(chunk->magic),
-                   sizeof(*chunk) - sizeof(chunk->magic), "structured chunk",
+                   len - sizeof(chunk->magic), "structured chunk",
                    errp);
     if (ret < 0) {
         return ret;
     }

-    chunk->flags = be16_to_cpu(chunk->flags);
-    chunk->type = be16_to_cpu(chunk->type);
-    chunk->cookie = be64_to_cpu(chunk->cookie);
-    chunk->length = be32_to_cpu(chunk->length);
+    /* flags, type, and cookie occupy same space between forms */
+    chunk->structured.flags = be16_to_cpu(chunk->structured.flags);
+    chunk->structured.type = be16_to_cpu(chunk->structured.type);
+    chunk->structured.cookie = be64_to_cpu(chunk->structured.cookie);

     /*
      * Because we use BLOCK_STATUS with REQ_ONE, and cap READ requests
@@ -1419,11 +1432,20 @@ static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
      * this.  Even if we stopped using REQ_ONE, sane servers will cap
      * the number of extents they return for block status.
      */
-    if (chunk->length > NBD_MAX_BUFFER_SIZE + sizeof(NBDStructuredReadData)) {
+    if (chunk->magic == NBD_STRUCTURED_REPLY_MAGIC) {
+        payload_len = be32_to_cpu(chunk->structured.length);
+    } else {
+        /* For now, we are ignoring the extended header offset. */
+        payload_len = be64_to_cpu(chunk->extended.length);
+        chunk->magic = NBD_STRUCTURED_REPLY_MAGIC;
+    }
+    if (payload_len > NBD_MAX_BUFFER_SIZE + sizeof(NBDStructuredReadData)) {
         error_setg(errp, "server chunk %" PRIu32 " (%s) payload is too long",
-                   chunk->type, nbd_rep_lookup(chunk->type));
+                   chunk->structured.type,
+                   nbd_rep_lookup(chunk->structured.type));
         return -EINVAL;
     }
+    chunk->structured.length = payload_len;

     return 0;
 }
@@ -1470,19 +1492,21 @@ nbd_read_eof(BlockDriverState *bs, QIOChannel *ioc, void *buffer, size_t size,

 /* nbd_receive_reply
  *
- * Decreases bs->in_flight while waiting for a new reply. This yield is where
- * we wait indefinitely and the coroutine must be able to be safely reentered
- * for nbd_client_attach_aio_context().
+ * Wait for a new reply. If this yields, the coroutine must be able to be
+ * safely reentered for nbd_client_attach_aio_context().  @mode determines
+ * which reply magic we are expecting, although this normalizes the result
+ * so that the caller only has to work with compact headers.
  *
  * Returns 1 on success
- *         0 on eof, when no data was read (errp is not set)
- *         negative errno on failure (errp is set)
+ *         0 on eof, when no data was read
+ *         negative errno on failure
  */
 int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
-                                   NBDReply *reply, Error **errp)
+                                   NBDReply *reply, NBDMode mode, Error **errp)
 {
     int ret;
     const char *type;
+    uint32_t expected;

     ret = nbd_read_eof(bs, ioc, &reply->magic, sizeof(reply->magic), errp);
     if (ret <= 0) {
@@ -1491,34 +1515,44 @@ int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,

     reply->magic = be32_to_cpu(reply->magic);

+    /* Diagnose but accept wrong-width header */
     switch (reply->magic) {
     case NBD_SIMPLE_REPLY_MAGIC:
+        if (mode >= NBD_MODE_EXTENDED) {
+            trace_nbd_receive_wrong_header(reply->magic,
+                                           nbd_mode_lookup(mode));
+        }
         ret = nbd_receive_simple_reply(ioc, &reply->simple, errp);
         if (ret < 0) {
-            break;
+            return ret;
         }
         trace_nbd_receive_simple_reply(reply->simple.error,
                                        nbd_err_lookup(reply->simple.error),
                                        reply->cookie);
         break;
     case NBD_STRUCTURED_REPLY_MAGIC:
-        ret = nbd_receive_structured_reply_chunk(ioc, &reply->structured, errp);
+    case NBD_EXTENDED_REPLY_MAGIC:
+        expected = mode >= NBD_MODE_EXTENDED ? NBD_EXTENDED_REPLY_MAGIC
+            : NBD_STRUCTURED_REPLY_MAGIC;
+        if (reply->magic != expected) {
+            trace_nbd_receive_wrong_header(reply->magic,
+                                           nbd_mode_lookup(mode));
+        }
+        ret = nbd_receive_reply_chunk_header(ioc, reply, errp);
         if (ret < 0) {
-            break;
+            return ret;
         }
         type = nbd_reply_type_lookup(reply->structured.type);
-        trace_nbd_receive_structured_reply_chunk(reply->structured.flags,
-                                                 reply->structured.type, type,
-                                                 reply->structured.cookie,
-                                                 reply->structured.length);
+        trace_nbd_receive_reply_chunk_header(reply->structured.flags,
+                                             reply->structured.type, type,
+                                             reply->structured.cookie,
+                                             reply->structured.length);
         break;
     default:
+        trace_nbd_receive_wrong_header(reply->magic, nbd_mode_lookup(mode));
         error_setg(errp, "invalid magic (got 0x%" PRIx32 ")", reply->magic);
         return -EINVAL;
     }
-    if (ret < 0) {
-        return ret;
-    }

     return 1;
 }
diff --git a/nbd/trace-events b/nbd/trace-events
index c1a3227613f..8f4e20ee9f2 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -33,7 +33,8 @@ nbd_client_clear_queue(void) "Clearing NBD queue"
 nbd_client_clear_socket(void) "Clearing NBD socket"
 nbd_send_request(uint64_t from, uint64_t len, uint64_t cookie, uint16_t flags, uint16_t type, const char *name) "Sending request to server: { .from = %" PRIu64", .len = %" PRIu64 ", .cookie = %" PRIu64 ", .flags = 0x%" PRIx16 ", .type = %" PRIu16 " (%s) }"
 nbd_receive_simple_reply(int32_t error, const char *errname, uint64_t cookie) "Got simple reply: { .error = %" PRId32 " (%s), cookie = %" PRIu64" }"
-nbd_receive_structured_reply_chunk(uint16_t flags, uint16_t type, const char *name, uint64_t cookie, uint32_t length) "Got structured reply chunk: { flags = 0x%" PRIx16 ", type = %d (%s), cookie = %" PRIu64 ", length = %" PRIu32 " }"
+nbd_receive_reply_chunk_header(uint16_t flags, uint16_t type, const char *name, uint64_t cookie, uint32_t length) "Got reply chunk header: { flags = 0x%" PRIx16 ", type = %d (%s), cookie = %" PRIu64 ", length = %" PRIu32 " }"
+nbd_receive_wrong_header(uint32_t magic, const char *mode) "Server sent unexpected magic 0x%" PRIx32 " for negotiated mode %s"

 # common.c
 nbd_unknown_error(int err) "Squashing unexpected error %d to EINVAL"
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 08/12] nbd/client: Accept 64-bit block status chunks
  2023-09-25 19:22 [PATCH v7 00/12] NBD 64-bit extensions for qemu Eric Blake
                   ` (6 preceding siblings ...)
  2023-09-25 19:22 ` [PATCH v7 07/12] nbd/client: Initial support for extended headers Eric Blake
@ 2023-09-25 19:22 ` Eric Blake
  2023-09-25 19:22 ` [PATCH v7 09/12] nbd/client: Request extended headers during negotiation Eric Blake
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Eric Blake @ 2023-09-25 19:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: libguestfs, qemu-block, vsementsov, Kevin Wolf, Hanna Reitz

Once extended mode is enabled, we need to accept 64-bit status replies
(even for replies that don't exceed a 32-bit length).  It is easier to
normalize narrow replies into wide format so that the rest of our code
only has to handle one width.  Although a server is non-compliant if
it sends a 64-bit reply in compact mode, or a 32-bit reply in extended
mode, it is still easy enough to tolerate these mismatches.

In normal execution, we are only requesting "base:allocation" which
never exceeds 32 bits for flag values. But during testing with
x-dirty-bitmap, we can force qemu to connect to some other context
that might have 64-bit status bit; however, we ignore those upper bits
(other than mapping qemu:allocation-depth into something that
'qemu-img map --output=json' can expose), and since that only affects
testing, we really don't bother with checking whether more than the
two least-significant bits are set.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---

v5: factor out duplicate length calculation [Vladimir], add R-b

v4: tweak comments and error message about count mismatch, fix setting
of wide in loop [Vladimir]
---
 block/nbd.c        | 49 ++++++++++++++++++++++++++++++++--------------
 block/trace-events |  1 +
 2 files changed, 35 insertions(+), 15 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 76461430411..52ebc8b2f53 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -615,13 +615,17 @@ static int nbd_parse_offset_hole_payload(BDRVNBDState *s,
  */
 static int nbd_parse_blockstatus_payload(BDRVNBDState *s,
                                          NBDStructuredReplyChunk *chunk,
-                                         uint8_t *payload, uint64_t orig_length,
-                                         NBDExtent32 *extent, Error **errp)
+                                         uint8_t *payload, bool wide,
+                                         uint64_t orig_length,
+                                         NBDExtent64 *extent, Error **errp)
 {
     uint32_t context_id;
+    uint32_t count;
+    size_t ext_len = wide ? sizeof(*extent) : sizeof(NBDExtent32);
+    size_t pay_len = sizeof(context_id) + wide * sizeof(count) + ext_len;

     /* The server succeeded, so it must have sent [at least] one extent */
-    if (chunk->length < sizeof(context_id) + sizeof(*extent)) {
+    if (chunk->length < pay_len) {
         error_setg(errp, "Protocol error: invalid payload for "
                          "NBD_REPLY_TYPE_BLOCK_STATUS");
         return -EINVAL;
@@ -636,8 +640,15 @@ static int nbd_parse_blockstatus_payload(BDRVNBDState *s,
         return -EINVAL;
     }

-    extent->length = payload_advance32(&payload);
-    extent->flags = payload_advance32(&payload);
+    if (wide) {
+        count = payload_advance32(&payload);
+        extent->length = payload_advance64(&payload);
+        extent->flags = payload_advance64(&payload);
+    } else {
+        count = 0;
+        extent->length = payload_advance32(&payload);
+        extent->flags = payload_advance32(&payload);
+    }

     if (extent->length == 0) {
         error_setg(errp, "Protocol error: server sent status chunk with "
@@ -658,7 +669,7 @@ static int nbd_parse_blockstatus_payload(BDRVNBDState *s,
      * (always a safe status, even if it loses information).
      */
     if (s->info.min_block && !QEMU_IS_ALIGNED(extent->length,
-                                                   s->info.min_block)) {
+                                              s->info.min_block)) {
         trace_nbd_parse_blockstatus_compliance("extent length is unaligned");
         if (extent->length > s->info.min_block) {
             extent->length = QEMU_ALIGN_DOWN(extent->length,
@@ -672,13 +683,15 @@ static int nbd_parse_blockstatus_payload(BDRVNBDState *s,
     /*
      * We used NBD_CMD_FLAG_REQ_ONE, so the server should not have
      * sent us any more than one extent, nor should it have included
-     * status beyond our request in that extent. However, it's easy
-     * enough to ignore the server's noncompliance without killing the
+     * status beyond our request in that extent. Furthermore, a wide
+     * server should have replied with an accurate count (we left
+     * count at 0 for a narrow server).  However, it's easy enough to
+     * ignore the server's noncompliance without killing the
      * connection; just ignore trailing extents, and clamp things to
      * the length of our request.
      */
-    if (chunk->length > sizeof(context_id) + sizeof(*extent)) {
-        trace_nbd_parse_blockstatus_compliance("more than one extent");
+    if (count != wide || chunk->length > pay_len) {
+        trace_nbd_parse_blockstatus_compliance("unexpected extent count");
     }
     if (extent->length > orig_length) {
         extent->length = orig_length;
@@ -1124,7 +1137,7 @@ nbd_co_receive_cmdread_reply(BDRVNBDState *s, uint64_t cookie,

 static int coroutine_fn
 nbd_co_receive_blockstatus_reply(BDRVNBDState *s, uint64_t cookie,
-                                 uint64_t length, NBDExtent32 *extent,
+                                 uint64_t length, NBDExtent64 *extent,
                                  int *request_ret, Error **errp)
 {
     NBDReplyChunkIter iter;
@@ -1137,11 +1150,17 @@ nbd_co_receive_blockstatus_reply(BDRVNBDState *s, uint64_t cookie,
     NBD_FOREACH_REPLY_CHUNK(s, iter, cookie, false, NULL, &reply, &payload) {
         int ret;
         NBDStructuredReplyChunk *chunk = &reply.structured;
+        bool wide;

         assert(nbd_reply_is_structured(&reply));

         switch (chunk->type) {
+        case NBD_REPLY_TYPE_BLOCK_STATUS_EXT:
         case NBD_REPLY_TYPE_BLOCK_STATUS:
+            wide = chunk->type == NBD_REPLY_TYPE_BLOCK_STATUS_EXT;
+            if ((s->info.mode >= NBD_MODE_EXTENDED) != wide) {
+                trace_nbd_extended_headers_compliance("block_status");
+            }
             if (received) {
                 nbd_channel_error(s, -EINVAL);
                 error_setg(&local_err, "Several BLOCK_STATUS chunks in reply");
@@ -1149,9 +1168,9 @@ nbd_co_receive_blockstatus_reply(BDRVNBDState *s, uint64_t cookie,
             }
             received = true;

-            ret = nbd_parse_blockstatus_payload(s, &reply.structured,
-                                                payload, length, extent,
-                                                &local_err);
+            ret = nbd_parse_blockstatus_payload(
+                s, &reply.structured, payload, wide,
+                length, extent, &local_err);
             if (ret < 0) {
                 nbd_channel_error(s, ret);
                 nbd_iter_channel_error(&iter, ret, &local_err);
@@ -1381,7 +1400,7 @@ static int coroutine_fn GRAPH_RDLOCK nbd_client_co_block_status(
         int64_t *pnum, int64_t *map, BlockDriverState **file)
 {
     int ret, request_ret;
-    NBDExtent32 extent = { 0 };
+    NBDExtent64 extent = { 0 };
     BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
     Error *local_err = NULL;

diff --git a/block/trace-events b/block/trace-events
index 925aa554bbf..8e789e1f12f 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -166,6 +166,7 @@ iscsi_xcopy(void *src_lun, uint64_t src_off, void *dst_lun, uint64_t dst_off, ui
 # nbd.c
 nbd_parse_blockstatus_compliance(const char *err) "ignoring extra data from non-compliant server: %s"
 nbd_structured_read_compliance(const char *type) "server sent non-compliant unaligned read %s chunk"
+nbd_extended_headers_compliance(const char *type) "server sent non-compliant %s chunk not matching choice of extended headers"
 nbd_read_reply_entry_fail(int ret, const char *err) "ret = %d, err: %s"
 nbd_co_request_fail(uint64_t from, uint64_t len, uint64_t handle, uint16_t flags, uint16_t type, const char *name, int ret, const char *err) "Request failed { .from = %" PRIu64", .len = %" PRIu64 ", .handle = %" PRIu64 ", .flags = 0x%" PRIx16 ", .type = %" PRIu16 " (%s) } ret = %d, err: %s"
 nbd_client_handshake(const char *export_name) "export '%s'"
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 09/12] nbd/client: Request extended headers during negotiation
  2023-09-25 19:22 [PATCH v7 00/12] NBD 64-bit extensions for qemu Eric Blake
                   ` (7 preceding siblings ...)
  2023-09-25 19:22 ` [PATCH v7 08/12] nbd/client: Accept 64-bit block status chunks Eric Blake
@ 2023-09-25 19:22 ` Eric Blake
  2023-09-25 19:22 ` [PATCH v7 10/12] nbd/server: Refactor list of negotiated meta contexts Eric Blake
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Eric Blake @ 2023-09-25 19:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: libguestfs, qemu-block, vsementsov, Kevin Wolf, Hanna Reitz

All the pieces are in place for a client to finally request extended
headers.  Note that we must not request extended headers when qemu-nbd
is used to connect to the kernel module (as nbd.ko does not expect
them, but expects us to do the negotiation in userspace before handing
the socket over to the kernel), but there is no harm in all other
clients requesting them.

Extended headers are not essential to the information collected during
'qemu-nbd --list', but probing for it gives us one more piece of
information in that output.  Update the iotests affected by the new
line of output.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---

v5: add R-b

v4: rebase to earlier changes, tweak commit message for why qemu-nbd
connection to /dev/nbd cannot use extended mode [Vladimir]
---
 nbd/client-connection.c                       |  2 +-
 nbd/client.c                                  | 20 ++++++++++++++-----
 qemu-nbd.c                                    |  3 +++
 tests/qemu-iotests/223.out                    |  6 ++++++
 tests/qemu-iotests/233.out                    |  4 ++++
 tests/qemu-iotests/241.out                    |  3 +++
 tests/qemu-iotests/307.out                    |  5 +++++
 .../tests/nbd-qemu-allocation.out             |  1 +
 8 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index aa0201b7107..f9da67c87e3 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -93,7 +93,7 @@ NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr,
         .do_negotiation = do_negotiation,

         .initial_info.request_sizes = true,
-        .initial_info.mode = NBD_MODE_STRUCTURED,
+        .initial_info.mode = NBD_MODE_EXTENDED,
         .initial_info.base_allocation = true,
         .initial_info.x_dirty_bitmap = g_strdup(x_dirty_bitmap),
         .initial_info.name = g_strdup(export_name ?: "")
diff --git a/nbd/client.c b/nbd/client.c
index a2f253062aa..29ffc609a4b 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -953,15 +953,23 @@ static int nbd_start_negotiate(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
         if (fixedNewStyle) {
             int result = 0;

+            if (max_mode >= NBD_MODE_EXTENDED) {
+                result = nbd_request_simple_option(ioc,
+                                                   NBD_OPT_EXTENDED_HEADERS,
+                                                   false, errp);
+                if (result) {
+                    return result < 0 ? -EINVAL : NBD_MODE_EXTENDED;
+                }
+            }
             if (max_mode >= NBD_MODE_STRUCTURED) {
                 result = nbd_request_simple_option(ioc,
                                                    NBD_OPT_STRUCTURED_REPLY,
                                                    false, errp);
-                if (result < 0) {
-                    return -EINVAL;
+                if (result) {
+                    return result < 0 ? -EINVAL : NBD_MODE_STRUCTURED;
                 }
             }
-            return result ? NBD_MODE_STRUCTURED : NBD_MODE_SIMPLE;
+            return NBD_MODE_SIMPLE;
         } else {
             return NBD_MODE_EXPORT_NAME;
         }
@@ -1034,6 +1042,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
     }

     switch (info->mode) {
+    case NBD_MODE_EXTENDED:
     case NBD_MODE_STRUCTURED:
         if (base_allocation) {
             result = nbd_negotiate_simple_meta_context(ioc, info, errp);
@@ -1144,7 +1153,7 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,

     *info = NULL;
     result = nbd_start_negotiate(ioc, tlscreds, hostname, &sioc,
-                                 NBD_MODE_STRUCTURED, NULL, errp);
+                                 NBD_MODE_EXTENDED, NULL, errp);
     if (tlscreds && sioc) {
         ioc = sioc;
     }
@@ -1155,6 +1164,7 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
     switch ((NBDMode)result) {
     case NBD_MODE_SIMPLE:
     case NBD_MODE_STRUCTURED:
+    case NBD_MODE_EXTENDED:
         /* newstyle - use NBD_OPT_LIST to populate array, then try
          * NBD_OPT_INFO on each array member. If structured replies
          * are enabled, also try NBD_OPT_LIST_META_CONTEXT. */
@@ -1191,7 +1201,7 @@ int nbd_receive_export_list(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
                 break;
             }

-            if (result == NBD_MODE_STRUCTURED &&
+            if (result >= NBD_MODE_STRUCTURED &&
                 nbd_list_meta_contexts(ioc, &array[i], errp) < 0) {
                 goto out;
             }
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 70aa3c487a0..e6681450cfe 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -235,6 +235,9 @@ static int qemu_nbd_client_list(SocketAddress *saddr, QCryptoTLSCreds *tls,
             printf("  opt block: %u\n", list[i].opt_block);
             printf("  max block: %u\n", list[i].max_block);
         }
+        printf("  transaction size: %s\n",
+               list[i].mode >= NBD_MODE_EXTENDED ?
+               "64-bit" : "32-bit");
         if (list[i].n_contexts) {
             printf("  available meta contexts: %d\n", list[i].n_contexts);
             for (j = 0; j < list[i].n_contexts; j++) {
diff --git a/tests/qemu-iotests/223.out b/tests/qemu-iotests/223.out
index 86a37014d0f..32f05f1c9af 100644
--- a/tests/qemu-iotests/223.out
+++ b/tests/qemu-iotests/223.out
@@ -87,6 +87,7 @@ exports available: 3
   min block: 1
   opt block: 4096
   max block: 33554432
+  transaction size: 64-bit
   available meta contexts: 2
    base:allocation
    qemu:dirty-bitmap:b
@@ -97,6 +98,7 @@ exports available: 3
   min block: 1
   opt block: 4096
   max block: 33554432
+  transaction size: 64-bit
   available meta contexts: 2
    base:allocation
    qemu:dirty-bitmap:b2
@@ -106,6 +108,7 @@ exports available: 3
   min block: 1
   opt block: 4096
   max block: 33554432
+  transaction size: 64-bit
   available meta contexts: 2
    base:allocation
    qemu:dirty-bitmap:b3
@@ -206,6 +209,7 @@ exports available: 3
   min block: 1
   opt block: 4096
   max block: 33554432
+  transaction size: 64-bit
   available meta contexts: 2
    base:allocation
    qemu:dirty-bitmap:b
@@ -216,6 +220,7 @@ exports available: 3
   min block: 1
   opt block: 4096
   max block: 33554432
+  transaction size: 64-bit
   available meta contexts: 2
    base:allocation
    qemu:dirty-bitmap:b2
@@ -225,6 +230,7 @@ exports available: 3
   min block: 1
   opt block: 4096
   max block: 33554432
+  transaction size: 64-bit
   available meta contexts: 2
    base:allocation
    qemu:dirty-bitmap:b3
diff --git a/tests/qemu-iotests/233.out b/tests/qemu-iotests/233.out
index 237c82767ea..1910f7df20f 100644
--- a/tests/qemu-iotests/233.out
+++ b/tests/qemu-iotests/233.out
@@ -39,6 +39,7 @@ exports available: 1
  export: ''
   size:  67108864
   min block: 1
+  transaction size: 64-bit

 == check TLS fail over TCP with mismatched hostname ==
 qemu-img: Could not open 'driver=nbd,host=localhost,port=PORT,tls-creds=tls0': Certificate does not match the hostname localhost
@@ -53,6 +54,7 @@ exports available: 1
  export: ''
   size:  67108864
   min block: 1
+  transaction size: 64-bit

 == check TLS with different CA fails ==
 qemu-img: Could not open 'driver=nbd,host=127.0.0.1,port=PORT,tls-creds=tls0': The certificate hasn't got a known issuer
@@ -83,6 +85,7 @@ exports available: 1
  export: ''
   size:  67108864
   min block: 1
+  transaction size: 64-bit

 == check TLS works over UNIX with PSK ==
 image: nbd+unix://?socket=SOCK_DIR/qemu-nbd.sock
@@ -93,6 +96,7 @@ exports available: 1
  export: ''
   size:  67108864
   min block: 1
+  transaction size: 64-bit

 == check TLS fails over UNIX with mismatch PSK ==
 qemu-img: Could not open 'driver=nbd,path=SOCK_DIR/qemu-nbd.sock,tls-creds=tls0': TLS handshake failed: The TLS connection was non-properly terminated.
diff --git a/tests/qemu-iotests/241.out b/tests/qemu-iotests/241.out
index 7946c286d51..7267cd2997e 100644
--- a/tests/qemu-iotests/241.out
+++ b/tests/qemu-iotests/241.out
@@ -6,6 +6,7 @@ exports available: 1
  export: ''
   size:  1024
   min block: 1
+  transaction size: 64-bit
 [{ "start": 0, "length": 1000, "depth": 0, "present": true, "zero": false, "data": true, "compressed": false, "offset": OFFSET},
 { "start": 1000, "length": 24, "depth": 0, "present": true, "zero": true, "data": false, "compressed": false, "offset": OFFSET}]
 1 KiB (0x400) bytes     allocated at offset 0 bytes (0x0)
@@ -16,6 +17,7 @@ exports available: 1
  export: ''
   size:  1024
   min block: 512
+  transaction size: 64-bit
 [{ "start": 0, "length": 1024, "depth": 0, "present": true, "zero": false, "data": true, "compressed": false, "offset": OFFSET}]
 1 KiB (0x400) bytes     allocated at offset 0 bytes (0x0)
 WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing guessed raw.
@@ -28,6 +30,7 @@ exports available: 1
  export: ''
   size:  1024
   min block: 1
+  transaction size: 64-bit
 [{ "start": 0, "length": 1000, "depth": 0, "present": true, "zero": false, "data": true, "compressed": false, "offset": OFFSET},
 { "start": 1000, "length": 24, "depth": 0, "present": true, "zero": true, "data": false, "compressed": false, "offset": OFFSET}]
 1 KiB (0x400) bytes     allocated at offset 0 bytes (0x0)
diff --git a/tests/qemu-iotests/307.out b/tests/qemu-iotests/307.out
index 390f05d1b78..2b9a6a67a1a 100644
--- a/tests/qemu-iotests/307.out
+++ b/tests/qemu-iotests/307.out
@@ -19,6 +19,7 @@ exports available: 1
   min block: XXX
   opt block: XXX
   max block: XXX
+  transaction size: 64-bit
   available meta contexts: 1
    base:allocation

@@ -47,6 +48,7 @@ exports available: 1
   min block: XXX
   opt block: XXX
   max block: XXX
+  transaction size: 64-bit
   available meta contexts: 1
    base:allocation

@@ -78,6 +80,7 @@ exports available: 2
   min block: XXX
   opt block: XXX
   max block: XXX
+  transaction size: 64-bit
   available meta contexts: 1
    base:allocation
  export: 'export1'
@@ -87,6 +90,7 @@ exports available: 2
   min block: XXX
   opt block: XXX
   max block: XXX
+  transaction size: 64-bit
   available meta contexts: 1
    base:allocation

@@ -113,6 +117,7 @@ exports available: 1
   min block: XXX
   opt block: XXX
   max block: XXX
+  transaction size: 64-bit
   available meta contexts: 1
    base:allocation

diff --git a/tests/qemu-iotests/tests/nbd-qemu-allocation.out b/tests/qemu-iotests/tests/nbd-qemu-allocation.out
index 138eb09c6d0..1db16926ed4 100644
--- a/tests/qemu-iotests/tests/nbd-qemu-allocation.out
+++ b/tests/qemu-iotests/tests/nbd-qemu-allocation.out
@@ -21,6 +21,7 @@ exports available: 1
   min block: 1
   opt block: 4096
   max block: 33554432
+  transaction size: 64-bit
   available meta contexts: 2
    base:allocation
    qemu:allocation-depth
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 10/12] nbd/server: Refactor list of negotiated meta contexts
  2023-09-25 19:22 [PATCH v7 00/12] NBD 64-bit extensions for qemu Eric Blake
                   ` (8 preceding siblings ...)
  2023-09-25 19:22 ` [PATCH v7 09/12] nbd/client: Request extended headers during negotiation Eric Blake
@ 2023-09-25 19:22 ` Eric Blake
  2023-09-25 19:22 ` [PATCH v7 11/12] nbd/server: Prepare for per-request filtering of BLOCK_STATUS Eric Blake
  2023-09-25 19:22 ` [PATCH v7 12/12] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS Eric Blake
  11 siblings, 0 replies; 30+ messages in thread
From: Eric Blake @ 2023-09-25 19:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: libguestfs, qemu-block, vsementsov, Kevin Wolf, Hanna Reitz

Peform several minor refactorings of how the list of negotiated meta
contexts is managed, to make upcoming patches easier: Promote the
internal type NBDExportMetaContexts to the public opaque type
NBDMetaContexts, and mark exp const.  Use a shorter member name in
NBDClient.  Hoist calls to nbd_check_meta_context() earlier in their
callers, as the number of negotiated contexts may impact the flags
exposed in regards to an export, which in turn requires a new
parameter.  Drop a redundant parameter to nbd_negotiate_meta_queries.
No semantic change intended on the success path; on the failure path,
dropping context in nbd_check_meta_export even when reporting an error
is safer.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---

v5: rebase to master, tweak commit message [Vladimir], R-b added

v4: new patch split out from v3 13/14, with smaller impact (quit
trying to separate exp outside of NBDMeataContexts)
---
 include/block/nbd.h |  1 +
 nbd/server.c        | 55 ++++++++++++++++++++++++---------------------
 2 files changed, 31 insertions(+), 25 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index ba8724f5336..2006497f987 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -29,6 +29,7 @@
 typedef struct NBDExport NBDExport;
 typedef struct NBDClient NBDClient;
 typedef struct NBDClientConnection NBDClientConnection;
+typedef struct NBDMetaContexts NBDMetaContexts;

 extern const BlockExportDriver blk_exp_nbd;

diff --git a/nbd/server.c b/nbd/server.c
index b09ee44c159..44ebbd139b2 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -105,11 +105,13 @@ struct NBDExport {

 static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);

-/* NBDExportMetaContexts represents a list of contexts to be exported,
+/*
+ * NBDMetaContexts represents a list of meta contexts in use,
  * as selected by NBD_OPT_SET_META_CONTEXT. Also used for
- * NBD_OPT_LIST_META_CONTEXT. */
-typedef struct NBDExportMetaContexts {
-    NBDExport *exp;
+ * NBD_OPT_LIST_META_CONTEXT.
+ */
+struct NBDMetaContexts {
+    const NBDExport *exp; /* associated export */
     size_t count; /* number of negotiated contexts */
     bool base_allocation; /* export base:allocation context (block status) */
     bool allocation_depth; /* export qemu:allocation-depth */
@@ -117,7 +119,7 @@ typedef struct NBDExportMetaContexts {
                     * export qemu:dirty-bitmap:<export bitmap name>,
                     * sized by exp->nr_export_bitmaps
                     */
-} NBDExportMetaContexts;
+};

 struct NBDClient {
     int refcount;
@@ -144,7 +146,7 @@ struct NBDClient {
     uint32_t check_align; /* If non-zero, check for aligned client requests */

     NBDMode mode;
-    NBDExportMetaContexts export_meta;
+    NBDMetaContexts contexts; /* Negotiated meta contexts */

     uint32_t opt; /* Current option being negotiated */
     uint32_t optlen; /* remaining length of data in ioc for the option being
@@ -455,10 +457,10 @@ static int nbd_negotiate_handle_list(NBDClient *client, Error **errp)
     return nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
 }

-static void nbd_check_meta_export(NBDClient *client)
+static void nbd_check_meta_export(NBDClient *client, NBDExport *exp)
 {
-    if (client->exp != client->export_meta.exp) {
-        client->export_meta.count = 0;
+    if (exp != client->contexts.exp) {
+        client->contexts.count = 0;
     }
 }

@@ -504,6 +506,7 @@ static int nbd_negotiate_handle_export_name(NBDClient *client, bool no_zeroes,
         error_setg(errp, "export not found");
         return -EINVAL;
     }
+    nbd_check_meta_export(client, client->exp);

     myflags = client->exp->nbdflags;
     if (client->mode >= NBD_MODE_STRUCTURED) {
@@ -521,7 +524,6 @@ static int nbd_negotiate_handle_export_name(NBDClient *client, bool no_zeroes,

     QTAILQ_INSERT_TAIL(&client->exp->clients, client, next);
     blk_exp_ref(&client->exp->common);
-    nbd_check_meta_export(client);

     return 0;
 }
@@ -641,6 +643,9 @@ static int nbd_negotiate_handle_info(NBDClient *client, Error **errp)
                                           errp, "export '%s' not present",
                                           sane_name);
     }
+    if (client->opt == NBD_OPT_GO) {
+        nbd_check_meta_export(client, exp);
+    }

     /* Don't bother sending NBD_INFO_NAME unless client requested it */
     if (sendname) {
@@ -729,7 +734,6 @@ static int nbd_negotiate_handle_info(NBDClient *client, Error **errp)
         client->check_align = check_align;
         QTAILQ_INSERT_TAIL(&client->exp->clients, client, next);
         blk_exp_ref(&client->exp->common);
-        nbd_check_meta_export(client);
         rc = 1;
     }
     return rc;
@@ -852,7 +856,7 @@ static bool nbd_strshift(const char **str, const char *prefix)
  * Handle queries to 'base' namespace. For now, only the base:allocation
  * context is available.  Return true if @query has been handled.
  */
-static bool nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta,
+static bool nbd_meta_base_query(NBDClient *client, NBDMetaContexts *meta,
                                 const char *query)
 {
     if (!nbd_strshift(&query, "base:")) {
@@ -872,7 +876,7 @@ static bool nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta,
  * and qemu:allocation-depth contexts are available.  Return true if @query
  * has been handled.
  */
-static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,
+static bool nbd_meta_qemu_query(NBDClient *client, NBDMetaContexts *meta,
                                 const char *query)
 {
     size_t i;
@@ -938,7 +942,7 @@ static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,
  * Return -errno on I/O error, 0 if option was completely handled by
  * sending a reply about inconsistent lengths, or 1 on success. */
 static int nbd_negotiate_meta_query(NBDClient *client,
-                                    NBDExportMetaContexts *meta, Error **errp)
+                                    NBDMetaContexts *meta, Error **errp)
 {
     int ret;
     g_autofree char *query = NULL;
@@ -977,14 +981,14 @@ static int nbd_negotiate_meta_query(NBDClient *client,
  * Handle NBD_OPT_LIST_META_CONTEXT and NBD_OPT_SET_META_CONTEXT
  *
  * Return -errno on I/O error, or 0 if option was completely handled. */
-static int nbd_negotiate_meta_queries(NBDClient *client,
-                                      NBDExportMetaContexts *meta, Error **errp)
+static int nbd_negotiate_meta_queries(NBDClient *client, Error **errp)
 {
     int ret;
     g_autofree char *export_name = NULL;
     /* Mark unused to work around https://bugs.llvm.org/show_bug.cgi?id=3888 */
     g_autofree G_GNUC_UNUSED bool *bitmaps = NULL;
-    NBDExportMetaContexts local_meta = {0};
+    NBDMetaContexts local_meta = {0};
+    NBDMetaContexts *meta;
     uint32_t nb_queries;
     size_t i;
     size_t count = 0;
@@ -1000,6 +1004,8 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
     if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
         /* Only change the caller's meta on SET. */
         meta = &local_meta;
+    } else {
+        meta = &client->contexts;
     }

     g_free(meta->bitmaps);
@@ -1284,8 +1290,7 @@ static int nbd_negotiate_options(NBDClient *client, Error **errp)

             case NBD_OPT_LIST_META_CONTEXT:
             case NBD_OPT_SET_META_CONTEXT:
-                ret = nbd_negotiate_meta_queries(client, &client->export_meta,
-                                                 errp);
+                ret = nbd_negotiate_meta_queries(client, errp);
                 break;

             case NBD_OPT_EXTENDED_HEADERS:
@@ -1512,7 +1517,7 @@ void nbd_client_put(NBDClient *client)
             QTAILQ_REMOVE(&client->exp->clients, client, next);
             blk_exp_unref(&client->exp->common);
         }
-        g_free(client->export_meta.bitmaps);
+        g_free(client->contexts.bitmaps);
         g_free(client);
     }
 }
@@ -2746,11 +2751,11 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
         }
         assert(client->mode >= NBD_MODE_EXTENDED ||
                request->len <= UINT32_MAX);
-        if (client->export_meta.count) {
+        if (client->contexts.count) {
             bool dont_fragment = request->flags & NBD_CMD_FLAG_REQ_ONE;
-            int contexts_remaining = client->export_meta.count;
+            int contexts_remaining = client->contexts.count;

-            if (client->export_meta.base_allocation) {
+            if (client->contexts.base_allocation) {
                 ret = nbd_co_send_block_status(client, request,
                                                exp->common.blk,
                                                request->from,
@@ -2763,7 +2768,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
                 }
             }

-            if (client->export_meta.allocation_depth) {
+            if (client->contexts.allocation_depth) {
                 ret = nbd_co_send_block_status(client, request,
                                                exp->common.blk,
                                                request->from, request->len,
@@ -2777,7 +2782,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
             }

             for (i = 0; i < client->exp->nr_export_bitmaps; i++) {
-                if (!client->export_meta.bitmaps[i]) {
+                if (!client->contexts.bitmaps[i]) {
                     continue;
                 }
                 ret = nbd_co_send_bitmap(client, request,
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 11/12] nbd/server: Prepare for per-request filtering of BLOCK_STATUS
  2023-09-25 19:22 [PATCH v7 00/12] NBD 64-bit extensions for qemu Eric Blake
                   ` (9 preceding siblings ...)
  2023-09-25 19:22 ` [PATCH v7 10/12] nbd/server: Refactor list of negotiated meta contexts Eric Blake
@ 2023-09-25 19:22 ` Eric Blake
  2023-09-29 11:54   ` Vladimir Sementsov-Ogievskiy
  2023-09-25 19:22 ` [PATCH v7 12/12] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS Eric Blake
  11 siblings, 1 reply; 30+ messages in thread
From: Eric Blake @ 2023-09-25 19:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: libguestfs, qemu-block, vsementsov, Kevin Wolf, Hanna Reitz

The next commit will add support for the optional extension
NBD_CMD_FLAG_PAYLOAD during NBD_CMD_BLOCK_STATUS, where the client can
request that the server only return a subset of negotiated contexts,
rather than all contexts.  To make that task easier, this patch
populates the list of contexts to return on a per-command basis (for
now, identical to the full set of negotiated contexts).

Signed-off-by: Eric Blake <eblake@redhat.com>
---

v5: fix null dereference on early error [Vladimir], hoist in assertion
from v4 24/24

v4: split out NBDMetaContexts refactoring to its own patch, track
NBDRequests.contexts as a pointer rather than inline
---
 include/block/nbd.h |  1 +
 nbd/server.c        | 22 +++++++++++++++++-----
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 2006497f987..4e7bd6342f9 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -77,6 +77,7 @@ typedef struct NBDRequest {
     uint16_t flags; /* NBD_CMD_FLAG_* */
     uint16_t type;  /* NBD_CMD_* */
     NBDMode mode;   /* Determines which network representation to use */
+    NBDMetaContexts *contexts; /* Used by NBD_CMD_BLOCK_STATUS */
 } NBDRequest;

 typedef struct NBDSimpleReply {
diff --git a/nbd/server.c b/nbd/server.c
index 44ebbd139b2..2d4cec75a49 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2505,6 +2505,7 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
         break;

     case NBD_CMD_BLOCK_STATUS:
+        request->contexts = &client->contexts;
         valid_flags |= NBD_CMD_FLAG_REQ_ONE;
         break;

@@ -2745,17 +2746,18 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
                                       "discard failed", errp);

     case NBD_CMD_BLOCK_STATUS:
+        assert(request->contexts);
         if (!request->len) {
             return nbd_send_generic_reply(client, request, -EINVAL,
                                           "need non-zero length", errp);
         }
         assert(client->mode >= NBD_MODE_EXTENDED ||
                request->len <= UINT32_MAX);
-        if (client->contexts.count) {
+        if (request->contexts->count) {
             bool dont_fragment = request->flags & NBD_CMD_FLAG_REQ_ONE;
-            int contexts_remaining = client->contexts.count;
+            int contexts_remaining = request->contexts->count;

-            if (client->contexts.base_allocation) {
+            if (request->contexts->base_allocation) {
                 ret = nbd_co_send_block_status(client, request,
                                                exp->common.blk,
                                                request->from,
@@ -2768,7 +2770,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
                 }
             }

-            if (client->contexts.allocation_depth) {
+            if (request->contexts->allocation_depth) {
                 ret = nbd_co_send_block_status(client, request,
                                                exp->common.blk,
                                                request->from, request->len,
@@ -2781,8 +2783,9 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
                 }
             }

+            assert(request->contexts->exp == client->exp);
             for (i = 0; i < client->exp->nr_export_bitmaps; i++) {
-                if (!client->contexts.bitmaps[i]) {
+                if (!request->contexts->bitmaps[i]) {
                     continue;
                 }
                 ret = nbd_co_send_bitmap(client, request,
@@ -2798,6 +2801,10 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
             assert(!contexts_remaining);

             return 0;
+        } else if (client->contexts.count) {
+            return nbd_send_generic_reply(client, request, -EINVAL,
+                                          "CMD_BLOCK_STATUS payload not valid",
+                                          errp);
         } else {
             return nbd_send_generic_reply(client, request, -EINVAL,
                                           "CMD_BLOCK_STATUS not negotiated",
@@ -2876,6 +2883,11 @@ static coroutine_fn void nbd_trip(void *opaque)
     } else {
         ret = nbd_handle_request(client, &request, req->data, &local_err);
     }
+    if (request.contexts && request.contexts != &client->contexts) {
+        assert(request.type == NBD_CMD_BLOCK_STATUS);
+        g_free(request.contexts->bitmaps);
+        g_free(request.contexts);
+    }
     if (ret < 0) {
         error_prepend(&local_err, "Failed to send reply: ");
         goto disconnect;
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 12/12] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS
  2023-09-25 19:22 [PATCH v7 00/12] NBD 64-bit extensions for qemu Eric Blake
                   ` (10 preceding siblings ...)
  2023-09-25 19:22 ` [PATCH v7 11/12] nbd/server: Prepare for per-request filtering of BLOCK_STATUS Eric Blake
@ 2023-09-25 19:22 ` Eric Blake
  2023-09-30 13:24   ` Vladimir Sementsov-Ogievskiy
  2023-10-05 14:33   ` Vladimir Sementsov-Ogievskiy
  11 siblings, 2 replies; 30+ messages in thread
From: Eric Blake @ 2023-09-25 19:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: libguestfs, qemu-block, vsementsov, Kevin Wolf, Hanna Reitz

Allow a client to request a subset of negotiated meta contexts.  For
example, a client may ask to use a single connection to learn about
both block status and dirty bitmaps, but where the dirty bitmap
queries only need to be performed on a subset of the disk; forcing the
server to compute that information on block status queries in the rest
of the disk is wasted effort (both at the server, and on the amount of
traffic sent over the wire to be parsed and ignored by the client).

Qemu as an NBD client never requests to use more than one meta
context, so it has no need to use block status payloads.  Testing this
instead requires support from libnbd, which CAN access multiple meta
contexts in parallel from a single NBD connection; an interop test
submitted to the libnbd project at the same time as this patch
demonstrates the feature working, as well as testing some corner cases
(for example, when the payload length is longer than the export
length), although other corner cases (like passing the same id
duplicated) requires a protocol fuzzer because libnbd is not wired up
to break the protocol that badly.

This also includes tweaks to 'qemu-nbd --list' to show when a server
is advertising the capability, and to the testsuite to reflect the
addition to that output.

Of note: qemu will always advertise the new feature bit during
NBD_OPT_INFO if extended headers have alreay been negotiated
(regardless of whether any NBD_OPT_SET_META_CONTEXT negotiation has
occurred); but for NBD_OPT_GO, qemu only advertises the feature if
block status is also enabled (that is, if the client does not
negotiate any contexts, then NBD_CMD_BLOCK_STATUS cannot be used, so
the feature is not advertised).

Signed-off-by: Eric Blake <eblake@redhat.com>
---

v5: factor out 'id - NBD_MTA_ID_DIRTY_BITMAP' [Vladimir], rework logic
on zero-length requests to be clearer [Vladimir], rebase to earlier
changes
---
 docs/interop/nbd.txt                          |   2 +-
 nbd/server.c                                  | 114 ++++++++++++++++--
 qemu-nbd.c                                    |   1 +
 nbd/trace-events                              |   1 +
 tests/qemu-iotests/223.out                    |  12 +-
 tests/qemu-iotests/307.out                    |  10 +-
 .../tests/nbd-qemu-allocation.out             |   2 +-
 7 files changed, 122 insertions(+), 20 deletions(-)

diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index 9aae5e1f294..18efb251de9 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -69,4 +69,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
 NBD_CMD_FLAG_FAST_ZERO
 * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
 * 7.1: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports
-* 8.2: NBD_OPT_EXTENDED_HEADERS
+* 8.2: NBD_OPT_EXTENDED_HEADERS, NBD_FLAG_BLOCK_STATUS_PAYLOAD
diff --git a/nbd/server.c b/nbd/server.c
index 2d4cec75a49..898580a9b0b 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -512,6 +512,9 @@ static int nbd_negotiate_handle_export_name(NBDClient *client, bool no_zeroes,
     if (client->mode >= NBD_MODE_STRUCTURED) {
         myflags |= NBD_FLAG_SEND_DF;
     }
+    if (client->mode >= NBD_MODE_EXTENDED && client->contexts.count) {
+        myflags |= NBD_FLAG_BLOCK_STAT_PAYLOAD;
+    }
     trace_nbd_negotiate_new_style_size_flags(client->exp->size, myflags);
     stq_be_p(buf, client->exp->size);
     stw_be_p(buf + 8, myflags);
@@ -699,6 +702,10 @@ static int nbd_negotiate_handle_info(NBDClient *client, Error **errp)
     if (client->mode >= NBD_MODE_STRUCTURED) {
         myflags |= NBD_FLAG_SEND_DF;
     }
+    if (client->mode >= NBD_MODE_EXTENDED &&
+        (client->contexts.count || client->opt == NBD_OPT_INFO)) {
+        myflags |= NBD_FLAG_BLOCK_STAT_PAYLOAD;
+    }
     trace_nbd_negotiate_new_style_size_flags(exp->size, myflags);
     stq_be_p(buf, exp->size);
     stw_be_p(buf + 8, myflags);
@@ -2420,6 +2427,87 @@ static int coroutine_fn nbd_co_send_bitmap(NBDClient *client,
     return nbd_co_send_extents(client, request, ea, last, context_id, errp);
 }

+/*
+ * nbd_co_block_status_payload_read
+ * Called when a client wants a subset of negotiated contexts via a
+ * BLOCK_STATUS payload.  Check the payload for valid length and
+ * contents.  On success, return 0 with request updated to effective
+ * length.  If request was invalid but all payload consumed, return 0
+ * with request->len and request->contexts->count set to 0 (which will
+ * trigger an appropriate NBD_EINVAL response later on).  Return
+ * negative errno if the payload was not fully consumed.
+ */
+static int
+nbd_co_block_status_payload_read(NBDClient *client, NBDRequest *request,
+                                 Error **errp)
+{
+    int payload_len = request->len;
+    g_autofree char *buf = NULL;
+    size_t count, i, nr_bitmaps;
+    uint32_t id;
+
+    if (payload_len > NBD_MAX_BUFFER_SIZE) {
+        error_setg(errp, "len (%" PRIu64 ") is larger than max len (%u)",
+                   request->len, NBD_MAX_BUFFER_SIZE);
+        return -EINVAL;
+    }
+
+    assert(client->contexts.exp == client->exp);
+    nr_bitmaps = client->exp->nr_export_bitmaps;
+    request->contexts = g_new0(NBDMetaContexts, 1);
+    request->contexts->exp = client->exp;
+
+    if (payload_len % sizeof(uint32_t) ||
+        payload_len < sizeof(NBDBlockStatusPayload) ||
+        payload_len > (sizeof(NBDBlockStatusPayload) +
+                       sizeof(id) * client->contexts.count)) {
+        goto skip;
+    }
+
+    buf = g_malloc(payload_len);
+    if (nbd_read(client->ioc, buf, payload_len,
+                 "CMD_BLOCK_STATUS data", errp) < 0) {
+        return -EIO;
+    }
+    trace_nbd_co_receive_request_payload_received(request->cookie,
+                                                  payload_len);
+    request->contexts->bitmaps = g_new0(bool, nr_bitmaps);
+    count = (payload_len - sizeof(NBDBlockStatusPayload)) / sizeof(id);
+    payload_len = 0;
+
+    for (i = 0; i < count; i++) {
+        id = ldl_be_p(buf + sizeof(NBDBlockStatusPayload) + sizeof(id) * i);
+        if (id == NBD_META_ID_BASE_ALLOCATION) {
+            if (request->contexts->base_allocation) {
+                goto skip;
+            }
+            request->contexts->base_allocation = true;
+        } else if (id == NBD_META_ID_ALLOCATION_DEPTH) {
+            if (request->contexts->allocation_depth) {
+                goto skip;
+            }
+            request->contexts->allocation_depth = true;
+        } else {
+            int idx = id - NBD_META_ID_DIRTY_BITMAP;
+
+            if (idx > nr_bitmaps || request->contexts->bitmaps[idx]) {
+                goto skip;
+            }
+            request->contexts->bitmaps[idx] = true;
+        }
+    }
+
+    request->len = ldq_be_p(buf);
+    request->contexts->count = count;
+    return 0;
+
+ skip:
+    trace_nbd_co_receive_block_status_payload_compliance(request->from,
+                                                         request->len);
+    request->len = request->contexts->count = 0;
+    return nbd_drop(client->ioc, payload_len, errp);
+}
+
 /* nbd_co_receive_request
  * Collect a client request. Return 0 if request looks valid, -EIO to drop
  * connection right away, -EAGAIN to indicate we were interrupted and the
@@ -2505,7 +2593,18 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
         break;

     case NBD_CMD_BLOCK_STATUS:
-        request->contexts = &client->contexts;
+        if (extended_with_payload) {
+            ret = nbd_co_block_status_payload_read(client, request, errp);
+            if (ret < 0) {
+                return ret;
+            }
+            /* payload now consumed */
+            check_length = extended_with_payload = false;
+            payload_len = 0;
+            valid_flags |= NBD_CMD_FLAG_PAYLOAD_LEN;
+        } else {
+            request->contexts = &client->contexts;
+        }
         valid_flags |= NBD_CMD_FLAG_REQ_ONE;
         break;

@@ -2747,16 +2846,16 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,

     case NBD_CMD_BLOCK_STATUS:
         assert(request->contexts);
-        if (!request->len) {
-            return nbd_send_generic_reply(client, request, -EINVAL,
-                                          "need non-zero length", errp);
-        }
         assert(client->mode >= NBD_MODE_EXTENDED ||
                request->len <= UINT32_MAX);
         if (request->contexts->count) {
             bool dont_fragment = request->flags & NBD_CMD_FLAG_REQ_ONE;
             int contexts_remaining = request->contexts->count;

+            if (!request->len) {
+                return nbd_send_generic_reply(client, request, -EINVAL,
+                                              "need non-zero length", errp);
+            }
             if (request->contexts->base_allocation) {
                 ret = nbd_co_send_block_status(client, request,
                                                exp->common.blk,
@@ -2893,8 +2992,9 @@ static coroutine_fn void nbd_trip(void *opaque)
         goto disconnect;
     }

-    /* We must disconnect after NBD_CMD_WRITE if we did not
-     * read the payload.
+    /*
+     * We must disconnect after NBD_CMD_WRITE or BLOCK_STATUS with
+     * payload if we did not read the payload.
      */
     if (!req->complete) {
         error_setg(&local_err, "Request handling failed in intermediate state");
diff --git a/qemu-nbd.c b/qemu-nbd.c
index e6681450cfe..7f1734cfccc 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -219,6 +219,7 @@ static int qemu_nbd_client_list(SocketAddress *saddr, QCryptoTLSCreds *tls,
                 [NBD_FLAG_SEND_RESIZE_BIT]          = "resize",
                 [NBD_FLAG_SEND_CACHE_BIT]           = "cache",
                 [NBD_FLAG_SEND_FAST_ZERO_BIT]       = "fast-zero",
+                [NBD_FLAG_BLOCK_STAT_PAYLOAD_BIT]   = "block-status-payload",
             };

             printf("  size:  %" PRIu64 "\n", list[i].size);
diff --git a/nbd/trace-events b/nbd/trace-events
index 8f4e20ee9f2..ac186c19ec0 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -70,6 +70,7 @@ nbd_co_send_chunk_read(uint64_t cookie, uint64_t offset, void *data, uint64_t si
 nbd_co_send_chunk_read_hole(uint64_t cookie, uint64_t offset, uint64_t size) "Send structured read hole reply: cookie = %" PRIu64 ", offset = %" PRIu64 ", len = %" PRIu64
 nbd_co_send_extents(uint64_t cookie, unsigned int extents, uint32_t id, uint64_t length, int last) "Send block status reply: cookie = %" PRIu64 ", extents = %u, context = %d (extents cover %" PRIu64 " bytes, last chunk = %d)"
 nbd_co_send_chunk_error(uint64_t cookie, int err, const char *errname, const char *msg) "Send structured error reply: cookie = %" PRIu64 ", error = %d (%s), msg = '%s'"
+nbd_co_receive_block_status_payload_compliance(uint64_t from, int len) "client sent unusable block status payload: from=0x%" PRIx64 ", len=0x%x"
 nbd_co_receive_request_decode_type(uint64_t cookie, uint16_t type, const char *name) "Decoding type: cookie = %" PRIu64 ", type = %" PRIu16 " (%s)"
 nbd_co_receive_request_payload_received(uint64_t cookie, uint64_t len) "Payload received: cookie = %" PRIu64 ", len = %" PRIu64
 nbd_co_receive_ext_payload_compliance(uint64_t from, uint64_t len) "client sent non-compliant write without payload flag: from=0x%" PRIx64 ", len=0x%" PRIx64
diff --git a/tests/qemu-iotests/223.out b/tests/qemu-iotests/223.out
index 32f05f1c9af..e5e7f42caac 100644
--- a/tests/qemu-iotests/223.out
+++ b/tests/qemu-iotests/223.out
@@ -83,7 +83,7 @@ exports available: 0
 exports available: 3
  export: 'n'
   size:  4194304
-  flags: 0x58f ( readonly flush fua df multi cache )
+  flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
   min block: 1
   opt block: 4096
   max block: 33554432
@@ -94,7 +94,7 @@ exports available: 3
  export: 'n2'
   description: some text
   size:  4194304
-  flags: 0xded ( flush fua trim zeroes df multi cache fast-zero )
+  flags: 0x1ded ( flush fua trim zeroes df multi cache fast-zero block-status-payload )
   min block: 1
   opt block: 4096
   max block: 33554432
@@ -104,7 +104,7 @@ exports available: 3
    qemu:dirty-bitmap:b2
  export: 'n3'
   size:  4194304
-  flags: 0x58f ( readonly flush fua df multi cache )
+  flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
   min block: 1
   opt block: 4096
   max block: 33554432
@@ -205,7 +205,7 @@ exports available: 0
 exports available: 3
  export: 'n'
   size:  4194304
-  flags: 0x58f ( readonly flush fua df multi cache )
+  flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
   min block: 1
   opt block: 4096
   max block: 33554432
@@ -216,7 +216,7 @@ exports available: 3
  export: 'n2'
   description: some text
   size:  4194304
-  flags: 0xded ( flush fua trim zeroes df multi cache fast-zero )
+  flags: 0x1ded ( flush fua trim zeroes df multi cache fast-zero block-status-payload )
   min block: 1
   opt block: 4096
   max block: 33554432
@@ -226,7 +226,7 @@ exports available: 3
    qemu:dirty-bitmap:b2
  export: 'n3'
   size:  4194304
-  flags: 0x58f ( readonly flush fua df multi cache )
+  flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
   min block: 1
   opt block: 4096
   max block: 33554432
diff --git a/tests/qemu-iotests/307.out b/tests/qemu-iotests/307.out
index 2b9a6a67a1a..f645f3315f8 100644
--- a/tests/qemu-iotests/307.out
+++ b/tests/qemu-iotests/307.out
@@ -15,7 +15,7 @@ wrote 4096/4096 bytes at offset 0
 exports available: 1
  export: 'fmt'
   size:  67108864
-  flags: 0x58f ( readonly flush fua df multi cache )
+  flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
   min block: XXX
   opt block: XXX
   max block: XXX
@@ -44,7 +44,7 @@ exports available: 1
 exports available: 1
  export: 'fmt'
   size:  67108864
-  flags: 0x58f ( readonly flush fua df multi cache )
+  flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
   min block: XXX
   opt block: XXX
   max block: XXX
@@ -76,7 +76,7 @@ exports available: 1
 exports available: 2
  export: 'fmt'
   size:  67108864
-  flags: 0x58f ( readonly flush fua df multi cache )
+  flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
   min block: XXX
   opt block: XXX
   max block: XXX
@@ -86,7 +86,7 @@ exports available: 2
  export: 'export1'
   description: This is the writable second export
   size:  67108864
-  flags: 0xded ( flush fua trim zeroes df multi cache fast-zero )
+  flags: 0x1ded ( flush fua trim zeroes df multi cache fast-zero block-status-payload )
   min block: XXX
   opt block: XXX
   max block: XXX
@@ -113,7 +113,7 @@ exports available: 1
  export: 'export1'
   description: This is the writable second export
   size:  67108864
-  flags: 0xded ( flush fua trim zeroes df multi cache fast-zero )
+  flags: 0x1ded ( flush fua trim zeroes df multi cache fast-zero block-status-payload )
   min block: XXX
   opt block: XXX
   max block: XXX
diff --git a/tests/qemu-iotests/tests/nbd-qemu-allocation.out b/tests/qemu-iotests/tests/nbd-qemu-allocation.out
index 1db16926ed4..56b57c69ed8 100644
--- a/tests/qemu-iotests/tests/nbd-qemu-allocation.out
+++ b/tests/qemu-iotests/tests/nbd-qemu-allocation.out
@@ -17,7 +17,7 @@ wrote 2097152/2097152 bytes at offset 1048576
 exports available: 1
  export: ''
   size:  4194304
-  flags: 0x48f ( readonly flush fua df cache )
+  flags: 0x148f ( readonly flush fua df cache block-status-payload )
   min block: 1
   opt block: 4096
   max block: 33554432
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 01/12] nbd/server: Support a request payload
  2023-09-25 19:22 ` [PATCH v7 01/12] nbd/server: Support a request payload Eric Blake
@ 2023-09-27  8:55   ` Vladimir Sementsov-Ogievskiy
  2023-09-27 15:59     ` Eric Blake
  2023-09-30 13:34   ` Vladimir Sementsov-Ogievskiy
  2023-10-05 15:38   ` [Libguestfs] " Eric Blake
  2 siblings, 1 reply; 30+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-09-27  8:55 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: libguestfs, qemu-block

On 25.09.23 22:22, Eric Blake wrote:
> Upcoming additions to support NBD 64-bit effect lengths allow for the
> possibility to distinguish between payload length (capped at 32M) and
> effect length (64 bits, although we generally assume 63 bits because
> of off_t limitations).  Without that extension, only the NBD_CMD_WRITE
> request has a payload; but with the extension, it makes sense to allow
> at least NBD_CMD_BLOCK_STATUS to have both a payload and effect length
> in a future patch (where the payload is a limited-size struct that in
> turn gives the real effect length as well as a subset of known ids for
> which status is requested).  Other future NBD commands may also have a
> request payload, so the 64-bit extension introduces a new
> NBD_CMD_FLAG_PAYLOAD_LEN that distinguishes between whether the header
> length is a payload length or an effect length, rather than
> hard-coding the decision based on the command.
> 
> According to the spec, a client should never send a command with a
> payload without the negotiation phase proving such extension is
> available.  So in the unlikely event the bit is set or cleared
> incorrectly, the client is already at fault; if the client then
> provides the payload, we can gracefully consume it off the wire and
> fail the command with NBD_EINVAL (subsequent checks for magic numbers
> ensure we are still in sync), while if the client fails to send
> payload we block waiting for it (basically deadlocking our connection
> to the bad client, but not negatively impacting our ability to service
> other clients, so not a security risk).  Note that we do not support
> the payload version of BLOCK_STATUS yet.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> 
> v7: another try at better logic [Vladimir]
> 
> v5: retitled from v4 13/24, rewrite on top of previous patch's switch
> statement [Vladimir]
> 
> v4: less indentation on several 'if's [Vladimir]
> ---
>   nbd/server.c     | 37 +++++++++++++++++++++++++++++++++----
>   nbd/trace-events |  1 +
>   2 files changed, 34 insertions(+), 4 deletions(-)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index 7a6f95071f8..1eabcfc908d 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -2322,9 +2322,11 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
>                                                  Error **errp)
>   {
>       NBDClient *client = req->client;
> +    bool extended_with_payload;
>       bool check_length = false;
>       bool check_rofs = false;
>       bool allocate_buffer = false;
> +    bool payload_okay = false;
>       unsigned payload_len = 0;
>       int valid_flags = NBD_CMD_FLAG_FUA;
>       int ret;
> @@ -2338,6 +2340,13 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
> 
>       trace_nbd_co_receive_request_decode_type(request->cookie, request->type,
>                                                nbd_cmd_lookup(request->type));
> +    extended_with_payload = client->mode >= NBD_MODE_EXTENDED &&
> +        request->flags & NBD_CMD_FLAG_PAYLOAD_LEN;
> +    if (extended_with_payload) {
> +        payload_len = request->len;
> +        check_length = true;
> +    }
> +
>       switch (request->type) {
>       case NBD_CMD_DISC:
>           /* Special case: we're going to disconnect without a reply,
> @@ -2354,6 +2363,15 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
>           break;
> 
>       case NBD_CMD_WRITE:
> +        if (client->mode >= NBD_MODE_EXTENDED) {
> +            if (!extended_with_payload) {
> +                /* The client is noncompliant. Trace it, but proceed. */
> +                trace_nbd_co_receive_ext_payload_compliance(request->from,
> +                                                            request->len);
> +            }
> +            valid_flags |= NBD_CMD_FLAG_PAYLOAD_LEN;
> +        }
> +        payload_okay = true;
>           payload_len = request->len;
>           check_length = true;
>           allocate_buffer = true;
> @@ -2395,6 +2413,14 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
>                      request->len, NBD_MAX_BUFFER_SIZE);
>           return -EINVAL;
>       }
> +    if (payload_len && !payload_okay) {
> +        /*
> +         * For now, we don't support payloads on other commands; but
> +         * we can keep the connection alive by ignoring the payload.
> +         */
> +        assert(request->type != NBD_CMD_WRITE);
> +        request->len = 0;

So, actually we handle a syntactic request with len=0 and return success... I'm afraid, that in the most scenarios that would not be what client want, but client will be confused by success return.

So, for example, if client pass READ with positive length and accidentlly set NBD_CMD_FLAG_PAYLOAD_LEN bit, it will get successful result with wrong length=0.
Or, for WRITE_ZEROES (with accidental NBD_CMD_FLAG_PAYLOAD_LEN) it will just get successful result, when actually nothing is done.

> +    }
>       if (allocate_buffer) {
>           /* READ, WRITE */
>           req->data = blk_try_blockalign(client->exp->common.blk,
> @@ -2405,10 +2431,13 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
>           }
>       }
>       if (payload_len) {
> -        /* WRITE */
> -        assert(req->data);
> -        ret = nbd_read(client->ioc, req->data, payload_len,
> -                       "CMD_WRITE data", errp);
> +        if (payload_okay) {
> +            assert(req->data);
> +            ret = nbd_read(client->ioc, req->data, payload_len,
> +                           "CMD_WRITE data", errp);
> +        } else {
> +            ret = nbd_drop(client->ioc, payload_len, errp);
> +        }
>           if (ret < 0) {
>               return -EIO;
>           }
> diff --git a/nbd/trace-events b/nbd/trace-events
> index f9dccfcfb44..c1a3227613f 100644
> --- a/nbd/trace-events
> +++ b/nbd/trace-events
> @@ -71,6 +71,7 @@ nbd_co_send_extents(uint64_t cookie, unsigned int extents, uint32_t id, uint64_t
>   nbd_co_send_chunk_error(uint64_t cookie, int err, const char *errname, const char *msg) "Send structured error reply: cookie = %" PRIu64 ", error = %d (%s), msg = '%s'"
>   nbd_co_receive_request_decode_type(uint64_t cookie, uint16_t type, const char *name) "Decoding type: cookie = %" PRIu64 ", type = %" PRIu16 " (%s)"
>   nbd_co_receive_request_payload_received(uint64_t cookie, uint64_t len) "Payload received: cookie = %" PRIu64 ", len = %" PRIu64
> +nbd_co_receive_ext_payload_compliance(uint64_t from, uint64_t len) "client sent non-compliant write without payload flag: from=0x%" PRIx64 ", len=0x%" PRIx64
>   nbd_co_receive_align_compliance(const char *op, uint64_t from, uint64_t len, uint32_t align) "client sent non-compliant unaligned %s request: from=0x%" PRIx64 ", len=0x%" PRIx64 ", align=0x%" PRIx32
>   nbd_trip(void) "Reading request"
> 

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 06/12] nbd/client: Plumb errp through nbd_receive_replies
  2023-09-25 19:22 ` [PATCH v7 06/12] nbd/client: Plumb errp through nbd_receive_replies Eric Blake
@ 2023-09-27 11:29   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 30+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-09-27 11:29 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: libguestfs, qemu-block, Kevin Wolf, Hanna Reitz

On 25.09.23 22:22, Eric Blake wrote:
> Instead of ignoring the low-level error just to refabricate our own
> message to pass to the caller, we can just plumb the caller's errp
> down to the low level.
> 
> Signed-off-by: Eric Blake<eblake@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 01/12] nbd/server: Support a request payload
  2023-09-27  8:55   ` Vladimir Sementsov-Ogievskiy
@ 2023-09-27 15:59     ` Eric Blake
  2023-09-28  9:09       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Blake @ 2023-09-27 15:59 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: qemu-devel, libguestfs, qemu-block

On Wed, Sep 27, 2023 at 11:55:41AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 25.09.23 22:22, Eric Blake wrote:
> > Upcoming additions to support NBD 64-bit effect lengths allow for the
> > possibility to distinguish between payload length (capped at 32M) and
> > effect length (64 bits, although we generally assume 63 bits because
> > of off_t limitations).  Without that extension, only the NBD_CMD_WRITE
> > request has a payload; but with the extension, it makes sense to allow
> > at least NBD_CMD_BLOCK_STATUS to have both a payload and effect length
> > in a future patch (where the payload is a limited-size struct that in
> > turn gives the real effect length as well as a subset of known ids for
> > which status is requested).  Other future NBD commands may also have a
> > request payload, so the 64-bit extension introduces a new
> > NBD_CMD_FLAG_PAYLOAD_LEN that distinguishes between whether the header
> > length is a payload length or an effect length, rather than
> > hard-coding the decision based on the command.
> > 
> > According to the spec, a client should never send a command with a
> > payload without the negotiation phase proving such extension is
> > available.  So in the unlikely event the bit is set or cleared
> > incorrectly, the client is already at fault; if the client then
> > provides the payload, we can gracefully consume it off the wire and
> > fail the command with NBD_EINVAL (subsequent checks for magic numbers
> > ensure we are still in sync), while if the client fails to send
> > payload we block waiting for it (basically deadlocking our connection
> > to the bad client, but not negatively impacting our ability to service
> > other clients, so not a security risk).  Note that we do not support
> > the payload version of BLOCK_STATUS yet.
> > 
> > Signed-off-by: Eric Blake <eblake@redhat.com>
> > ---
> > 
> > v7: another try at better logic [Vladimir]
> > 
> > v5: retitled from v4 13/24, rewrite on top of previous patch's switch
> > statement [Vladimir]
> > 
> > v4: less indentation on several 'if's [Vladimir]
> > ---
> >   nbd/server.c     | 37 +++++++++++++++++++++++++++++++++----
> >   nbd/trace-events |  1 +
> >   2 files changed, 34 insertions(+), 4 deletions(-)
> > 
> > diff --git a/nbd/server.c b/nbd/server.c
> > index 7a6f95071f8..1eabcfc908d 100644
> > --- a/nbd/server.c
> > +++ b/nbd/server.c
> > @@ -2322,9 +2322,11 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
> >                                                  Error **errp)
> >   {
> >       NBDClient *client = req->client;
> > +    bool extended_with_payload;
> >       bool check_length = false;
> >       bool check_rofs = false;
> >       bool allocate_buffer = false;
> > +    bool payload_okay = false;
> >       unsigned payload_len = 0;
> >       int valid_flags = NBD_CMD_FLAG_FUA;
> >       int ret;
> > @@ -2338,6 +2340,13 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
> > 
> >       trace_nbd_co_receive_request_decode_type(request->cookie, request->type,
> >                                                nbd_cmd_lookup(request->type));
> > +    extended_with_payload = client->mode >= NBD_MODE_EXTENDED &&
> > +        request->flags & NBD_CMD_FLAG_PAYLOAD_LEN;
> > +    if (extended_with_payload) {
> > +        payload_len = request->len;
> > +        check_length = true;
> > +    }
> > +
> >       switch (request->type) {
> >       case NBD_CMD_DISC:
> >           /* Special case: we're going to disconnect without a reply,
> > @@ -2354,6 +2363,15 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
> >           break;
> > 
> >       case NBD_CMD_WRITE:
> > +        if (client->mode >= NBD_MODE_EXTENDED) {
> > +            if (!extended_with_payload) {
> > +                /* The client is noncompliant. Trace it, but proceed. */
> > +                trace_nbd_co_receive_ext_payload_compliance(request->from,
> > +                                                            request->len);
> > +            }
> > +            valid_flags |= NBD_CMD_FLAG_PAYLOAD_LEN;
> > +        }
> > +        payload_okay = true;
> >           payload_len = request->len;
> >           check_length = true;
> >           allocate_buffer = true;
> > @@ -2395,6 +2413,14 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
> >                      request->len, NBD_MAX_BUFFER_SIZE);
> >           return -EINVAL;
> >       }
> > +    if (payload_len && !payload_okay) {
> > +        /*
> > +         * For now, we don't support payloads on other commands; but
> > +         * we can keep the connection alive by ignoring the payload.
> > +         */
> > +        assert(request->type != NBD_CMD_WRITE);
> > +        request->len = 0;
> 
> So, actually we handle a syntactic request with len=0 and return success... I'm afraid, that in the most scenarios that would not be what client want, but client will be confused by success return.

If request->len == 0, then payload_len is also 0 and we do not enter
this block, even when !payload_okay.  A 0-byte NBD_CMD_READ is already
unspecified by the NBD spec (although I think we do permit it as a
no-op where we return success).  But that is not the whole picture -
even if the client passes a payload length 0f 0, it is important to
remember that NBD_CMD_READ did not add NBD_CMD_FLAG_PAYLOAD_LEN to
valid_flags, so later in this function, whether or not we have payload
bytes to slurp off, we will still fail the command with EINVAL for
passing an unexpected flag when doing the comparison against
valid_flags.

But I agree that adding a comment to that effect would help.

> 
> So, for example, if client pass READ with positive length and accidentlly set NBD_CMD_FLAG_PAYLOAD_LEN bit, it will get successful result with wrong length=0.

In that particular case, you argue the client did NOT pass in payload
bytes; so our attempt to read payload bytes will block until the
client issues enough subsequent commands to actually satisfy the
payload_len (or we see early EOF when the client hangs up after
noticing we have not responded to commands in a while).  We are
effectively ignoring all such subsequent commands (having treated them
as NBD_CMD_READ payload instead of independent commands), and if we
finally do get enough bytes to satisfy the full payload, it is highly
likely that we will not land on a clean command boundary and
disconnect when we see a magic number mismatch.  Less likely is the
payload_len ending exactly on a command boundary; but hopefully our
return of NBD_EINVAL will explain to the client why we have ignored
its subsequent commands.  At any rate, the bug is in the client for
sending an unsupported flag; anything we do beyond that point is best
effort, and the most important part is that if we attempted to keep
the connection alive, we are at least not letting the client cause us
to do unintended actions.

> Or, for WRITE_ZEROES (with accidental NBD_CMD_FLAG_PAYLOAD_LEN) it will just get successful result, when actually nothing is done.

Again, once we consume payload_len bytes (if we did not outright close
the connection because the count was larger than 32M and we were
unwilling to wade through that many payload bytes, thanks to
check_length), we are most likely to hit a magic number mismatch after
having ignored all intermediate commands that came in; but the
NBD_COMMAND_WRITE_ZEROES will receive an NBD_EINVAL failure, not
success, for not satisfying valid_flags.

Which do we think is more likely: a client accidentally setting the
flag but not passing payload, or a client purposefully setting the
flag and including payload?  If the former, then it may be nicer to
just disconnect any time we see the flag set unexpectedly; trying to
keep the connection alive will ignore subsequent client commands, and
likely (but not always) eventually result in a magic number mismatch
once payload_len bytes are ignored.  If the latter, then skipping
payload bytes keeps us in sync, so that we can read the next command.

> 
> > +    }
> >       if (allocate_buffer) {
> >           /* READ, WRITE */
> >           req->data = blk_try_blockalign(client->exp->common.blk,
> > @@ -2405,10 +2431,13 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
> >           }
> >       }
> >       if (payload_len) {
> > -        /* WRITE */
> > -        assert(req->data);
> > -        ret = nbd_read(client->ioc, req->data, payload_len,
> > -                       "CMD_WRITE data", errp);
> > +        if (payload_okay) {
> > +            assert(req->data);
> > +            ret = nbd_read(client->ioc, req->data, payload_len,
> > +                           "CMD_WRITE data", errp);
> > +        } else {
> > +            ret = nbd_drop(client->ioc, payload_len, errp);

We could also try to be a bit more complicated by peeking at the next
few bytes: if they look like a magic number of the next request,
assume the client set the bit accidentally but didn't send a payload
after all; for anything else, assume the client did pass a payload.
But adding in machinery to peek at a prefix is more complex than
either assuming a payload is always present (as done in this patch) or
assuming the bit was in error (and dropping the connection
unconditionally).  Preferences?

At any rate, deciding what to do when dealing with a non-compliant
client should NOT hold up reviews of the rest of the series, which
only worry about compliant clients that never trigger this issue.
Worded differently, the important aspect of the review of this patch
is whether we see any scenario where a non-compliant client could
cause the server to misbehave, at which point declaring that we would
prefer to disconnect immediately rather than attempt to resync may be
safer, even though it is more abrupt.  But if we are satisfied that
the ill-behaved client cannot crash the server, trying to keep the
connection alive (at least, until we later hit a magic number
mismatch) is probably nicer.

> > +        }
> >           if (ret < 0) {
> >               return -EIO;
> >           }

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 01/12] nbd/server: Support a request payload
  2023-09-27 15:59     ` Eric Blake
@ 2023-09-28  9:09       ` Vladimir Sementsov-Ogievskiy
  2023-09-28 14:33         ` Eric Blake
  0 siblings, 1 reply; 30+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-09-28  9:09 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, libguestfs, qemu-block

On 27.09.23 18:59, Eric Blake wrote:
> We could also try to be a bit more complicated by peeking at the next
> few bytes: if they look like a magic number of the next request,
> assume the client set the bit accidentally but didn't send a payload
> after all; for anything else, assume the client did pass a payload.
> But adding in machinery to peek at a prefix is more complex than
> either assuming a payload is always present (as done in this patch) or
> assuming the bit was in error (and dropping the connection
> unconditionally).  Preferences?

Ohh, you are right, thanks for comprehensive explanation. I really missed some things you are saying about. Yes, now I agree that "payload always exist when flag is set" is the best effort. Finally, that was our aim of the protocol design: make it more context independent. Probably, we may fix that in specification as preferable or at least possible server behavior about non-compliant client.

r-b coming soon, I just need to take another look with corrected picture in mind.

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 01/12] nbd/server: Support a request payload
  2023-09-28  9:09       ` Vladimir Sementsov-Ogievskiy
@ 2023-09-28 14:33         ` Eric Blake
  2023-09-28 20:52           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Blake @ 2023-09-28 14:33 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: qemu-devel, libguestfs, qemu-block

On Thu, Sep 28, 2023 at 12:09:51PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 27.09.23 18:59, Eric Blake wrote:
> > We could also try to be a bit more complicated by peeking at the next
> > few bytes: if they look like a magic number of the next request,
> > assume the client set the bit accidentally but didn't send a payload
> > after all; for anything else, assume the client did pass a payload.
> > But adding in machinery to peek at a prefix is more complex than
> > either assuming a payload is always present (as done in this patch) or
> > assuming the bit was in error (and dropping the connection
> > unconditionally).  Preferences?
> 
> 
> Ohh, you are right, thanks for comprehensive explanation. I really missed some things you are saying about. Yes, now I agree that "payload always exist when flag is set" is the best effort. Finally, that was our aim of the protocol design: make it more context independent. Probably, we may fix that in specification as preferable or at least possible server behavior about non-compliant client.

One other possibility I just thought of: have a heuristic where the
flag set with h->request_length less than 512 bytes is likely to
indicate an intentional payload (even if for a command where we
weren't expecting payload, so still a client error); while the flag
set wtih h->request_length >= 512 bytes is likely to be a mistaken
setting of the flag (but also still a client error).  NBD_CMD_WRITE is
probably the only command that will ever need to send a payload larger
than one sector, but that command already has handling to accept
payloads of all sizes because we know what to do with them and where
the client is not in error.

> 
> r-b coming soon, I just need to take another look with corrected picture in mind.
> 
> -- 
> Best regards,
> Vladimir
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 01/12] nbd/server: Support a request payload
  2023-09-28 14:33         ` Eric Blake
@ 2023-09-28 20:52           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 30+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-09-28 20:52 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, libguestfs, qemu-block

On 28.09.23 17:33, Eric Blake wrote:
> On Thu, Sep 28, 2023 at 12:09:51PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> On 27.09.23 18:59, Eric Blake wrote:
>>> We could also try to be a bit more complicated by peeking at the next
>>> few bytes: if they look like a magic number of the next request,
>>> assume the client set the bit accidentally but didn't send a payload
>>> after all; for anything else, assume the client did pass a payload.
>>> But adding in machinery to peek at a prefix is more complex than
>>> either assuming a payload is always present (as done in this patch) or
>>> assuming the bit was in error (and dropping the connection
>>> unconditionally).  Preferences?
>>
>>
>> Ohh, you are right, thanks for comprehensive explanation. I really missed some things you are saying about. Yes, now I agree that "payload always exist when flag is set" is the best effort. Finally, that was our aim of the protocol design: make it more context independent. Probably, we may fix that in specification as preferable or at least possible server behavior about non-compliant client.
> 
> One other possibility I just thought of: have a heuristic where the
> flag set with h->request_length less than 512 bytes is likely to
> indicate an intentional payload (even if for a command where we
> weren't expecting payload, so still a client error); while the flag
> set wtih h->request_length >= 512 bytes is likely to be a mistaken
> setting of the flag (but also still a client error).  NBD_CMD_WRITE is
> probably the only command that will ever need to send a payload larger
> than one sector, but that command already has handling to accept
> payloads of all sizes because we know what to do with them and where
> the client is not in error.
> 

I'd prefer to avoid extra logic for optimizing handling of bad client, better keep code simpler.


-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 07/12] nbd/client: Initial support for extended headers
  2023-09-25 19:22 ` [PATCH v7 07/12] nbd/client: Initial support for extended headers Eric Blake
@ 2023-09-29  9:24   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 30+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-09-29  9:24 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: libguestfs, qemu-block, Kevin Wolf, Hanna Reitz

On 25.09.23 22:22, Eric Blake wrote:
>   nbd_receive_simple_reply(int32_t error, const char *errname, uint64_t cookie) "Got simple reply: { .error = %" PRId32 " (%s), cookie = %" PRIu64" }"
> -nbd_receive_structured_reply_chunk(uint16_t flags, uint16_t type, const char *name, uint64_t cookie, uint32_t length) "Got structured reply chunk: { flags = 0x%" PRIx16 ", type = %d (%s), cookie = %" PRIu64 ", length = %" PRIu32 " }"
> +nbd_receive_reply_chunk_header(uint16_t flags, uint16_t type, const char *name, uint64_t cookie, uint32_t length) "Got reply chunk header: { flags = 0x%" PRIx16 ", type = %d (%s), cookie = %" PRIu64 ", length = %" PRIu32 " }"

preexisting, but better would be s/%d/%" PRIu16 "/

> +nbd_receive_wrong_header(uint32_t magic, const char *mode) "Server sent unexpected magic 0x%" PRIx32 " for negotiated mode %s"


Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 11/12] nbd/server: Prepare for per-request filtering of BLOCK_STATUS
  2023-09-25 19:22 ` [PATCH v7 11/12] nbd/server: Prepare for per-request filtering of BLOCK_STATUS Eric Blake
@ 2023-09-29 11:54   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 30+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-09-29 11:54 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: libguestfs, qemu-block, Kevin Wolf, Hanna Reitz

On 25.09.23 22:22, Eric Blake wrote:
> The next commit will add support for the optional extension
> NBD_CMD_FLAG_PAYLOAD during NBD_CMD_BLOCK_STATUS, where the client can
> request that the server only return a subset of negotiated contexts,
> rather than all contexts.  To make that task easier, this patch
> populates the list of contexts to return on a per-command basis (for
> now, identical to the full set of negotiated contexts).
> 
> Signed-off-by: Eric Blake<eblake@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 12/12] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS
  2023-09-25 19:22 ` [PATCH v7 12/12] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS Eric Blake
@ 2023-09-30 13:24   ` Vladimir Sementsov-Ogievskiy
  2023-10-04 21:55     ` Eric Blake
  2023-10-05 14:33   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 30+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-09-30 13:24 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: libguestfs, qemu-block, Kevin Wolf, Hanna Reitz

On 25.09.23 22:22, Eric Blake wrote:
> Allow a client to request a subset of negotiated meta contexts.  For
> example, a client may ask to use a single connection to learn about
> both block status and dirty bitmaps, but where the dirty bitmap
> queries only need to be performed on a subset of the disk; forcing the
> server to compute that information on block status queries in the rest
> of the disk is wasted effort (both at the server, and on the amount of
> traffic sent over the wire to be parsed and ignored by the client).
> 
> Qemu as an NBD client never requests to use more than one meta
> context, so it has no need to use block status payloads.  Testing this
> instead requires support from libnbd, which CAN access multiple meta
> contexts in parallel from a single NBD connection; an interop test
> submitted to the libnbd project at the same time as this patch
> demonstrates the feature working, as well as testing some corner cases
> (for example, when the payload length is longer than the export
> length), although other corner cases (like passing the same id
> duplicated) requires a protocol fuzzer because libnbd is not wired up
> to break the protocol that badly.
> 
> This also includes tweaks to 'qemu-nbd --list' to show when a server
> is advertising the capability, and to the testsuite to reflect the
> addition to that output.
> 
> Of note: qemu will always advertise the new feature bit during
> NBD_OPT_INFO if extended headers have alreay been negotiated
> (regardless of whether any NBD_OPT_SET_META_CONTEXT negotiation has
> occurred); but for NBD_OPT_GO, qemu only advertises the feature if
> block status is also enabled (that is, if the client does not
> negotiate any contexts, then NBD_CMD_BLOCK_STATUS cannot be used, so
> the feature is not advertised).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> 

[..]

> 
> +/*
> + * nbd_co_block_status_payload_read
> + * Called when a client wants a subset of negotiated contexts via a
> + * BLOCK_STATUS payload.  Check the payload for valid length and
> + * contents.  On success, return 0 with request updated to effective
> + * length.  If request was invalid but all payload consumed, return 0
> + * with request->len and request->contexts->count set to 0 (which will
> + * trigger an appropriate NBD_EINVAL response later on).  Return
> + * negative errno if the payload was not fully consumed.
> + */
> +static int
> +nbd_co_block_status_payload_read(NBDClient *client, NBDRequest *request,
> +                                 Error **errp)

[..]

> +        payload_len > (sizeof(NBDBlockStatusPayload) +
> +                       sizeof(id) * client->contexts.count)) {
> +        goto skip;
> +    }
> +
> +    buf = g_malloc(payload_len);
> +    if (nbd_read(client->ioc, buf, payload_len,
> +                 "CMD_BLOCK_STATUS data", errp) < 0) {
> +        return -EIO;
> +    }
> +    trace_nbd_co_receive_request_payload_received(request->cookie,
> +                                                  payload_len);
> +    request->contexts->bitmaps = g_new0(bool, nr_bitmaps);
> +    count = (payload_len - sizeof(NBDBlockStatusPayload)) / sizeof(id);
> +    payload_len = 0;
> +
> +    for (i = 0; i < count; i++) {
> +        id = ldl_be_p(buf + sizeof(NBDBlockStatusPayload) + sizeof(id) * i);
> +        if (id == NBD_META_ID_BASE_ALLOCATION) {
> +            if (request->contexts->base_allocation) {
> +                goto skip;
> +            }

should we also check that base_allocation is negotiated?

> +            request->contexts->base_allocation = true;
> +        } else if (id == NBD_META_ID_ALLOCATION_DEPTH) {
> +            if (request->contexts->allocation_depth) {
> +                goto skip;
> +            }

same here

> +            request->contexts->allocation_depth = true;
> +        } else {
> +            int idx = id - NBD_META_ID_DIRTY_BITMAP;
> +

I think, we also should check that idx >=0 after this operation.

> +            if (idx > nr_bitmaps || request->contexts->bitmaps[idx]) {
> +                goto skip;
> +            }
> +            request->contexts->bitmaps[idx] = true;
> +        }
> +    }
> +
> +    request->len = ldq_be_p(buf);
> +    request->contexts->count = count;
> +    return 0;
> +
> + skip:
> +    trace_nbd_co_receive_block_status_payload_compliance(request->from,
> +                                                         request->len);
> +    request->len = request->contexts->count = 0;
> +    return nbd_drop(client->ioc, payload_len, errp);
> +}
> +

[..]

> diff --git a/nbd/trace-events b/nbd/trace-events
> index 8f4e20ee9f2..ac186c19ec0 100644
> --- a/nbd/trace-events
> +++ b/nbd/trace-events
> @@ -70,6 +70,7 @@ nbd_co_send_chunk_read(uint64_t cookie, uint64_t offset, void *data, uint64_t si
>   nbd_co_send_chunk_read_hole(uint64_t cookie, uint64_t offset, uint64_t size) "Send structured read hole reply: cookie = %" PRIu64 ", offset = %" PRIu64 ", len = %" PRIu64
>   nbd_co_send_extents(uint64_t cookie, unsigned int extents, uint32_t id, uint64_t length, int last) "Send block status reply: cookie = %" PRIu64 ", extents = %u, context = %d (extents cover %" PRIu64 " bytes, last chunk = %d)"
>   nbd_co_send_chunk_error(uint64_t cookie, int err, const char *errname, const char *msg) "Send structured error reply: cookie = %" PRIu64 ", error = %d (%s), msg = '%s'"
> +nbd_co_receive_block_status_payload_compliance(uint64_t from, int len) "client sent unusable block status payload: from=0x%" PRIx64 ", len=0x%x"

both passed parameters request->from and request->len are uint64_t actually

>   nbd_co_receive_request_decode_type(uint64_t cookie, uint16_t type, const char *name) "Decoding type: cookie = %" PRIu64 ", type = %" PRIu16 " (%s)"
>   nbd_co_receive_request_payload_received(uint64_t cookie, uint64_t len) "Payload received: cookie = %" PRIu64 ", len = %" PRIu64
>   nbd_co_receive_ext_payload_compliance(uint64_t from, uint64_t len) "client sent non-compliant write without payload flag: from=0x%" PRIx64 ", len=0x%" PRIx64

[..]

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 01/12] nbd/server: Support a request payload
  2023-09-25 19:22 ` [PATCH v7 01/12] nbd/server: Support a request payload Eric Blake
  2023-09-27  8:55   ` Vladimir Sementsov-Ogievskiy
@ 2023-09-30 13:34   ` Vladimir Sementsov-Ogievskiy
  2023-10-05 15:38   ` [Libguestfs] " Eric Blake
  2 siblings, 0 replies; 30+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-09-30 13:34 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: libguestfs, qemu-block

On 25.09.23 22:22, Eric Blake wrote:
> Upcoming additions to support NBD 64-bit effect lengths allow for the
> possibility to distinguish between payload length (capped at 32M) and
> effect length (64 bits, although we generally assume 63 bits because
> of off_t limitations).  Without that extension, only the NBD_CMD_WRITE
> request has a payload; but with the extension, it makes sense to allow
> at least NBD_CMD_BLOCK_STATUS to have both a payload and effect length
> in a future patch (where the payload is a limited-size struct that in
> turn gives the real effect length as well as a subset of known ids for
> which status is requested).  Other future NBD commands may also have a
> request payload, so the 64-bit extension introduces a new
> NBD_CMD_FLAG_PAYLOAD_LEN that distinguishes between whether the header
> length is a payload length or an effect length, rather than
> hard-coding the decision based on the command.
> 
> According to the spec, a client should never send a command with a
> payload without the negotiation phase proving such extension is
> available.  So in the unlikely event the bit is set or cleared
> incorrectly, the client is already at fault; if the client then
> provides the payload, we can gracefully consume it off the wire and
> fail the command with NBD_EINVAL (subsequent checks for magic numbers
> ensure we are still in sync), while if the client fails to send
> payload we block waiting for it (basically deadlocking our connection
> to the bad client, but not negatively impacting our ability to service
> other clients, so not a security risk).  Note that we do not support
> the payload version of BLOCK_STATUS yet.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> 
> v7: another try at better logic [Vladimir]
> 
> v5: retitled from v4 13/24, rewrite on top of previous patch's switch
> statement [Vladimir]
> 
> v4: less indentation on several 'if's [Vladimir]
> ---
>   nbd/server.c     | 37 +++++++++++++++++++++++++++++++++----
>   nbd/trace-events |  1 +
>   2 files changed, 34 insertions(+), 4 deletions(-)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index 7a6f95071f8..1eabcfc908d 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -2322,9 +2322,11 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
>                                                  Error **errp)
>   {
>       NBDClient *client = req->client;
> +    bool extended_with_payload;
>       bool check_length = false;
>       bool check_rofs = false;
>       bool allocate_buffer = false;
> +    bool payload_okay = false;
>       unsigned payload_len = 0;
>       int valid_flags = NBD_CMD_FLAG_FUA;
>       int ret;
> @@ -2338,6 +2340,13 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
> 
>       trace_nbd_co_receive_request_decode_type(request->cookie, request->type,
>                                                nbd_cmd_lookup(request->type));
> +    extended_with_payload = client->mode >= NBD_MODE_EXTENDED &&
> +        request->flags & NBD_CMD_FLAG_PAYLOAD_LEN;
> +    if (extended_with_payload) {
> +        payload_len = request->len;
> +        check_length = true;
> +    }
> +
>       switch (request->type) {
>       case NBD_CMD_DISC:
>           /* Special case: we're going to disconnect without a reply,
> @@ -2354,6 +2363,15 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
>           break;
> 
>       case NBD_CMD_WRITE:
> +        if (client->mode >= NBD_MODE_EXTENDED) {
> +            if (!extended_with_payload) {
> +                /* The client is noncompliant. Trace it, but proceed. */
> +                trace_nbd_co_receive_ext_payload_compliance(request->from,
> +                                                            request->len);
> +            }
> +            valid_flags |= NBD_CMD_FLAG_PAYLOAD_LEN;
> +        }
> +        payload_okay = true;
>           payload_len = request->len;
>           check_length = true;
>           allocate_buffer = true;
> @@ -2395,6 +2413,14 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
>                      request->len, NBD_MAX_BUFFER_SIZE);
>           return -EINVAL;
>       }
> +    if (payload_len && !payload_okay) {
> +        /*
> +         * For now, we don't support payloads on other commands; but
> +         * we can keep the connection alive by ignoring the payload.
> +         */
> +        assert(request->type != NBD_CMD_WRITE);
> +        request->len = 0;

So, for sure, after this we go to

if (requests->flags & ~valid_flags)... and return -EINVAL.

Why we need to set request->len to 0? Just to not return "operation past EOF" instead of "unsupported flags", if len is too big? Maybe, that worth a comment.

Anyway, I now see nothing wrong in it, so:
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

> +    }
>       if (allocate_buffer) {
>           /* READ, WRITE */
>           req->data = blk_try_blockalign(client->exp->common.blk,
> @@ -2405,10 +2431,13 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
>           }
>       }
>       if (payload_len) {
> -        /* WRITE */
> -        assert(req->data);
> -        ret = nbd_read(client->ioc, req->data, payload_len,
> -                       "CMD_WRITE data", errp);
> +        if (payload_okay) {
> +            assert(req->data);
> +            ret = nbd_read(client->ioc, req->data, payload_len,
> +                           "CMD_WRITE data", errp);
> +        } else {
> +            ret = nbd_drop(client->ioc, payload_len, errp);
> +        }
>           if (ret < 0) {
>               return -EIO;
>           }
> diff --git a/nbd/trace-events b/nbd/trace-events
> index f9dccfcfb44..c1a3227613f 100644
> --- a/nbd/trace-events
> +++ b/nbd/trace-events
> @@ -71,6 +71,7 @@ nbd_co_send_extents(uint64_t cookie, unsigned int extents, uint32_t id, uint64_t
>   nbd_co_send_chunk_error(uint64_t cookie, int err, const char *errname, const char *msg) "Send structured error reply: cookie = %" PRIu64 ", error = %d (%s), msg = '%s'"
>   nbd_co_receive_request_decode_type(uint64_t cookie, uint16_t type, const char *name) "Decoding type: cookie = %" PRIu64 ", type = %" PRIu16 " (%s)"
>   nbd_co_receive_request_payload_received(uint64_t cookie, uint64_t len) "Payload received: cookie = %" PRIu64 ", len = %" PRIu64
> +nbd_co_receive_ext_payload_compliance(uint64_t from, uint64_t len) "client sent non-compliant write without payload flag: from=0x%" PRIx64 ", len=0x%" PRIx64
>   nbd_co_receive_align_compliance(const char *op, uint64_t from, uint64_t len, uint32_t align) "client sent non-compliant unaligned %s request: from=0x%" PRIx64 ", len=0x%" PRIx64 ", align=0x%" PRIx32
>   nbd_trip(void) "Reading request"
> 

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 12/12] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS
  2023-09-30 13:24   ` Vladimir Sementsov-Ogievskiy
@ 2023-10-04 21:55     ` Eric Blake
  2023-10-05 13:49       ` [Libguestfs] " Eric Blake
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Blake @ 2023-10-04 21:55 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, libguestfs, qemu-block, Kevin Wolf, Hanna Reitz

On Sat, Sep 30, 2023 at 04:24:02PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 25.09.23 22:22, Eric Blake wrote:
> > Allow a client to request a subset of negotiated meta contexts.  For
> > example, a client may ask to use a single connection to learn about
> > both block status and dirty bitmaps, but where the dirty bitmap
> > queries only need to be performed on a subset of the disk; forcing the
> > server to compute that information on block status queries in the rest
> > of the disk is wasted effort (both at the server, and on the amount of
> > traffic sent over the wire to be parsed and ignored by the client).
> > 
> > Qemu as an NBD client never requests to use more than one meta
> > context, so it has no need to use block status payloads.  Testing this
> > instead requires support from libnbd, which CAN access multiple meta
> > contexts in parallel from a single NBD connection; an interop test
> > submitted to the libnbd project at the same time as this patch
> > demonstrates the feature working, as well as testing some corner cases
> > (for example, when the payload length is longer than the export
> > length), although other corner cases (like passing the same id
> > duplicated) requires a protocol fuzzer because libnbd is not wired up
> > to break the protocol that badly.
> > 
> > This also includes tweaks to 'qemu-nbd --list' to show when a server
> > is advertising the capability, and to the testsuite to reflect the
> > addition to that output.
> > 
> > Of note: qemu will always advertise the new feature bit during
> > NBD_OPT_INFO if extended headers have alreay been negotiated
> > (regardless of whether any NBD_OPT_SET_META_CONTEXT negotiation has
> > occurred); but for NBD_OPT_GO, qemu only advertises the feature if
> > block status is also enabled (that is, if the client does not
> > negotiate any contexts, then NBD_CMD_BLOCK_STATUS cannot be used, so
> > the feature is not advertised).
> > 
> > Signed-off-by: Eric Blake <eblake@redhat.com>
> > ---
> > 
> 
> [..]
> 
> > 
> > +/*
> > + * nbd_co_block_status_payload_read
> > + * Called when a client wants a subset of negotiated contexts via a
> > + * BLOCK_STATUS payload.  Check the payload for valid length and
> > + * contents.  On success, return 0 with request updated to effective
> > + * length.  If request was invalid but all payload consumed, return 0
> > + * with request->len and request->contexts->count set to 0 (which will
> > + * trigger an appropriate NBD_EINVAL response later on).  Return
> > + * negative errno if the payload was not fully consumed.
> > + */
> > +static int
> > +nbd_co_block_status_payload_read(NBDClient *client, NBDRequest *request,
> > +                                 Error **errp)
> 
> [..]
> 
> > +        payload_len > (sizeof(NBDBlockStatusPayload) +
> > +                       sizeof(id) * client->contexts.count)) {
> > +        goto skip;
> > +    }
> > +
> > +    buf = g_malloc(payload_len);
> > +    if (nbd_read(client->ioc, buf, payload_len,
> > +                 "CMD_BLOCK_STATUS data", errp) < 0) {
> > +        return -EIO;
> > +    }
> > +    trace_nbd_co_receive_request_payload_received(request->cookie,
> > +                                                  payload_len);
> > +    request->contexts->bitmaps = g_new0(bool, nr_bitmaps);
> > +    count = (payload_len - sizeof(NBDBlockStatusPayload)) / sizeof(id);
> > +    payload_len = 0;
> > +
> > +    for (i = 0; i < count; i++) {
> > +        id = ldl_be_p(buf + sizeof(NBDBlockStatusPayload) + sizeof(id) * i);
> > +        if (id == NBD_META_ID_BASE_ALLOCATION) {
> > +            if (request->contexts->base_allocation) {
> > +                goto skip;
> > +            }
> 
> should we also check that base_allocation is negotiated?

Oh, good point.  Without that check, the client can pass in random id
numbers that it never negotiated.  I've queued 1-11 and will probably
send a pull request for those this week, while respinning this patch
to fix the remaining issues you pointed out.

> 
> > +            request->contexts->base_allocation = true;
> > +        } else if (id == NBD_META_ID_ALLOCATION_DEPTH) {
> > +            if (request->contexts->allocation_depth) {
> > +                goto skip;
> > +            }
> 
> same here
> 
> > +            request->contexts->allocation_depth = true;
> > +        } else {
> > +            int idx = id - NBD_META_ID_DIRTY_BITMAP;
> > +
> 
> I think, we also should check that idx >=0 after this operation.
> 
> > +            if (idx > nr_bitmaps || request->contexts->bitmaps[idx]) {

Or else make idx an unsigned value, instead of signed.  Also a good catch.

> > +                goto skip;
> > +            }
> > +            request->contexts->bitmaps[idx] = true;
> > +        }
> > +    }
> > +
> > +    request->len = ldq_be_p(buf);
> > +    request->contexts->count = count;
> > +    return 0;
> > +
> > + skip:
> > +    trace_nbd_co_receive_block_status_payload_compliance(request->from,
> > +                                                         request->len);
> > +    request->len = request->contexts->count = 0;
> > +    return nbd_drop(client->ioc, payload_len, errp);
> > +}
> > +
> 
> [..]
> 
> > diff --git a/nbd/trace-events b/nbd/trace-events
> > index 8f4e20ee9f2..ac186c19ec0 100644
> > --- a/nbd/trace-events
> > +++ b/nbd/trace-events
> > @@ -70,6 +70,7 @@ nbd_co_send_chunk_read(uint64_t cookie, uint64_t offset, void *data, uint64_t si
> >   nbd_co_send_chunk_read_hole(uint64_t cookie, uint64_t offset, uint64_t size) "Send structured read hole reply: cookie = %" PRIu64 ", offset = %" PRIu64 ", len = %" PRIu64
> >   nbd_co_send_extents(uint64_t cookie, unsigned int extents, uint32_t id, uint64_t length, int last) "Send block status reply: cookie = %" PRIu64 ", extents = %u, context = %d (extents cover %" PRIu64 " bytes, last chunk = %d)"
> >   nbd_co_send_chunk_error(uint64_t cookie, int err, const char *errname, const char *msg) "Send structured error reply: cookie = %" PRIu64 ", error = %d (%s), msg = '%s'"
> > +nbd_co_receive_block_status_payload_compliance(uint64_t from, int len) "client sent unusable block status payload: from=0x%" PRIx64 ", len=0x%x"
> 
> both passed parameters request->from and request->len are uint64_t actually

Also a good catch.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Libguestfs] [PATCH v7 12/12] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS
  2023-10-04 21:55     ` Eric Blake
@ 2023-10-05 13:49       ` Eric Blake
  2023-10-05 14:26         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Blake @ 2023-10-05 13:49 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: Kevin Wolf, Hanna Reitz, qemu-block, qemu-devel, libguestfs

On Wed, Oct 04, 2023 at 04:55:02PM -0500, Eric Blake wrote:
> > > +static int
> > > +nbd_co_block_status_payload_read(NBDClient *client, NBDRequest *request,
> > > +                                 Error **errp)
> > 
> > [..]

> > > +    for (i = 0; i < count; i++) {
> > > +        id = ldl_be_p(buf + sizeof(NBDBlockStatusPayload) + sizeof(id) * i);
> > > +        if (id == NBD_META_ID_BASE_ALLOCATION) {
> > > +            if (request->contexts->base_allocation) {
> > > +                goto skip;
> > > +            }
> > 
> > should we also check that base_allocation is negotiated?
> 
> Oh, good point.  Without that check, the client can pass in random id
> numbers that it never negotiated.  I've queued 1-11 and will probably
> send a pull request for those this week, while respinning this patch
> to fix the remaining issues you pointed out.

I'm squashing in the following. If you can review it today, I'll
include it in my pull request this afternoon; if not, we still have
time before soft freeze to get it in the next batch.

diff --git i/nbd/server.c w/nbd/server.c
index 30816b42386..62654579cbc 100644
--- i/nbd/server.c
+++ w/nbd/server.c
@@ -2478,19 +2478,22 @@ nbd_co_block_status_payload_read(NBDClient *client, NBDRequest *request,
     for (i = 0; i < count; i++) {
         id = ldl_be_p(buf + sizeof(NBDBlockStatusPayload) + sizeof(id) * i);
         if (id == NBD_META_ID_BASE_ALLOCATION) {
-            if (request->contexts->base_allocation) {
+            if (!client->contexts.base_allocation ||
+                request->contexts->base_allocation) {
                 goto skip;
             }
             request->contexts->base_allocation = true;
         } else if (id == NBD_META_ID_ALLOCATION_DEPTH) {
-            if (request->contexts->allocation_depth) {
+            if (!client->contexts.allocation_depth ||
+                request->contexts->allocation_depth) {
                 goto skip;
             }
             request->contexts->allocation_depth = true;
         } else {
-            int idx = id - NBD_META_ID_DIRTY_BITMAP;
+            unsigned idx = id - NBD_META_ID_DIRTY_BITMAP;

-            if (idx > nr_bitmaps || request->contexts->bitmaps[idx]) {
+            if (idx > nr_bitmaps || !client->contexts.bitmaps[idx] ||
+                request->contexts->bitmaps[idx]) {
                 goto skip;
             }
             request->contexts->bitmaps[idx] = true;
diff --git i/nbd/trace-events w/nbd/trace-events
index 3cf2d00e458..00ae3216a11 100644
--- i/nbd/trace-events
+++ w/nbd/trace-events
@@ -70,7 +70,7 @@ nbd_co_send_chunk_read(uint64_t cookie, uint64_t offset, void *data, uint64_t si
 nbd_co_send_chunk_read_hole(uint64_t cookie, uint64_t offset, uint64_t size) "Send structured read hole reply: cookie = %" PRIu64 ", offset = %" PRIu64 ", len = %" PRIu64
 nbd_co_send_extents(uint64_t cookie, unsigned int extents, uint32_t id, uint64_t length, int last) "Send block status reply: cookie = %" PRIu64 ", extents = %u, context = %d (extents cover %" PRIu64 " bytes, last chunk = %d)"
 nbd_co_send_chunk_error(uint64_t cookie, int err, const char *errname, const char *msg) "Send structured error reply: cookie = %" PRIu64 ", error = %d (%s), msg = '%s'"
-nbd_co_receive_block_status_payload_compliance(uint64_t from, int len) "client sent unusable block status payload: from=0x%" PRIx64 ", len=0x%x"
+nbd_co_receive_block_status_payload_compliance(uint64_t from, uint64_t len) "client sent unusable block status payload: from=0x%" PRIx64 ", len=0x%" PRIx64
 nbd_co_receive_request_decode_type(uint64_t cookie, uint16_t type, const char *name) "Decoding type: cookie = %" PRIu64 ", type = %" PRIu16 " (%s)"
 nbd_co_receive_request_payload_received(uint64_t cookie, uint64_t len) "Payload received: cookie = %" PRIu64 ", len = %" PRIu64
 nbd_co_receive_ext_payload_compliance(uint64_t from, uint64_t len) "client sent non-compliant write without payload flag: from=0x%" PRIx64 ", len=0x%" PRIx64



-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [Libguestfs] [PATCH v7 12/12] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS
  2023-10-05 13:49       ` [Libguestfs] " Eric Blake
@ 2023-10-05 14:26         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 30+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-10-05 14:26 UTC (permalink / raw)
  To: Eric Blake; +Cc: Kevin Wolf, Hanna Reitz, qemu-block, qemu-devel, libguestfs

On 05.10.23 16:49, Eric Blake wrote:
> On Wed, Oct 04, 2023 at 04:55:02PM -0500, Eric Blake wrote:
>>>> +static int
>>>> +nbd_co_block_status_payload_read(NBDClient *client, NBDRequest *request,
>>>> +                                 Error **errp)
>>>
>>> [..]
> 
>>>> +    for (i = 0; i < count; i++) {
>>>> +        id = ldl_be_p(buf + sizeof(NBDBlockStatusPayload) + sizeof(id) * i);
>>>> +        if (id == NBD_META_ID_BASE_ALLOCATION) {
>>>> +            if (request->contexts->base_allocation) {
>>>> +                goto skip;
>>>> +            }
>>>
>>> should we also check that base_allocation is negotiated?
>>
>> Oh, good point.  Without that check, the client can pass in random id
>> numbers that it never negotiated.  I've queued 1-11 and will probably
>> send a pull request for those this week, while respinning this patch
>> to fix the remaining issues you pointed out.
> 
> I'm squashing in the following. If you can review it today, I'll
> include it in my pull request this afternoon; if not, we still have
> time before soft freeze to get it in the next batch.
> 
> diff --git i/nbd/server.c w/nbd/server.c
> index 30816b42386..62654579cbc 100644
> --- i/nbd/server.c
> +++ w/nbd/server.c
> @@ -2478,19 +2478,22 @@ nbd_co_block_status_payload_read(NBDClient *client, NBDRequest *request,
>       for (i = 0; i < count; i++) {
>           id = ldl_be_p(buf + sizeof(NBDBlockStatusPayload) + sizeof(id) * i);
>           if (id == NBD_META_ID_BASE_ALLOCATION) {
> -            if (request->contexts->base_allocation) {
> +            if (!client->contexts.base_allocation ||
> +                request->contexts->base_allocation) {
>                   goto skip;
>               }
>               request->contexts->base_allocation = true;
>           } else if (id == NBD_META_ID_ALLOCATION_DEPTH) {
> -            if (request->contexts->allocation_depth) {
> +            if (!client->contexts.allocation_depth ||
> +                request->contexts->allocation_depth) {
>                   goto skip;
>               }
>               request->contexts->allocation_depth = true;
>           } else {
> -            int idx = id - NBD_META_ID_DIRTY_BITMAP;
> +            unsigned idx = id - NBD_META_ID_DIRTY_BITMAP;
> 
> -            if (idx > nr_bitmaps || request->contexts->bitmaps[idx]) {
> +            if (idx > nr_bitmaps || !client->contexts.bitmaps[idx] ||

Oops, I didn't notice: s/>/>=/, as nr_bitmaps is length of array.

> +                request->contexts->bitmaps[idx]) {
>                   goto skip;
>               }
>               request->contexts->bitmaps[idx] = true;
> diff --git i/nbd/trace-events w/nbd/trace-events
> index 3cf2d00e458..00ae3216a11 100644
> --- i/nbd/trace-events
> +++ w/nbd/trace-events
> @@ -70,7 +70,7 @@ nbd_co_send_chunk_read(uint64_t cookie, uint64_t offset, void *data, uint64_t si
>   nbd_co_send_chunk_read_hole(uint64_t cookie, uint64_t offset, uint64_t size) "Send structured read hole reply: cookie = %" PRIu64 ", offset = %" PRIu64 ", len = %" PRIu64
>   nbd_co_send_extents(uint64_t cookie, unsigned int extents, uint32_t id, uint64_t length, int last) "Send block status reply: cookie = %" PRIu64 ", extents = %u, context = %d (extents cover %" PRIu64 " bytes, last chunk = %d)"
>   nbd_co_send_chunk_error(uint64_t cookie, int err, const char *errname, const char *msg) "Send structured error reply: cookie = %" PRIu64 ", error = %d (%s), msg = '%s'"
> -nbd_co_receive_block_status_payload_compliance(uint64_t from, int len) "client sent unusable block status payload: from=0x%" PRIx64 ", len=0x%x"
> +nbd_co_receive_block_status_payload_compliance(uint64_t from, uint64_t len) "client sent unusable block status payload: from=0x%" PRIx64 ", len=0x%" PRIx64
>   nbd_co_receive_request_decode_type(uint64_t cookie, uint16_t type, const char *name) "Decoding type: cookie = %" PRIu64 ", type = %" PRIu16 " (%s)"
>   nbd_co_receive_request_payload_received(uint64_t cookie, uint64_t len) "Payload received: cookie = %" PRIu64 ", len = %" PRIu64
>   nbd_co_receive_ext_payload_compliance(uint64_t from, uint64_t len) "client sent non-compliant write without payload flag: from=0x%" PRIx64 ", len=0x%" PRIx64
> 
> 
> 

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 12/12] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS
  2023-09-25 19:22 ` [PATCH v7 12/12] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS Eric Blake
  2023-09-30 13:24   ` Vladimir Sementsov-Ogievskiy
@ 2023-10-05 14:33   ` Vladimir Sementsov-Ogievskiy
  2023-10-05 15:22     ` [Libguestfs] " Eric Blake
  1 sibling, 1 reply; 30+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-10-05 14:33 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: libguestfs, qemu-block, Kevin Wolf, Hanna Reitz

On 25.09.23 22:22, Eric Blake wrote:
> Allow a client to request a subset of negotiated meta contexts.  For
> example, a client may ask to use a single connection to learn about
> both block status and dirty bitmaps, but where the dirty bitmap
> queries only need to be performed on a subset of the disk; forcing the
> server to compute that information on block status queries in the rest
> of the disk is wasted effort (both at the server, and on the amount of
> traffic sent over the wire to be parsed and ignored by the client).
> 
> Qemu as an NBD client never requests to use more than one meta
> context, so it has no need to use block status payloads.  Testing this
> instead requires support from libnbd, which CAN access multiple meta
> contexts in parallel from a single NBD connection; an interop test
> submitted to the libnbd project at the same time as this patch
> demonstrates the feature working, as well as testing some corner cases
> (for example, when the payload length is longer than the export
> length), although other corner cases (like passing the same id
> duplicated) requires a protocol fuzzer because libnbd is not wired up
> to break the protocol that badly.
> 
> This also includes tweaks to 'qemu-nbd --list' to show when a server
> is advertising the capability, and to the testsuite to reflect the
> addition to that output.
> 
> Of note: qemu will always advertise the new feature bit during
> NBD_OPT_INFO if extended headers have alreay been negotiated
> (regardless of whether any NBD_OPT_SET_META_CONTEXT negotiation has
> occurred); but for NBD_OPT_GO, qemu only advertises the feature if
> block status is also enabled (that is, if the client does not
> negotiate any contexts, then NBD_CMD_BLOCK_STATUS cannot be used, so
> the feature is not advertised).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> 
> v5: factor out 'id - NBD_MTA_ID_DIRTY_BITMAP' [Vladimir], rework logic
> on zero-length requests to be clearer [Vladimir], rebase to earlier
> changes

[..]

> +/*
> + * nbd_co_block_status_payload_read
> + * Called when a client wants a subset of negotiated contexts via a
> + * BLOCK_STATUS payload.  Check the payload for valid length and
> + * contents.  On success, return 0 with request updated to effective
> + * length.  If request was invalid but all payload consumed, return 0
> + * with request->len and request->contexts->count set to 0 (which will
> + * trigger an appropriate NBD_EINVAL response later on).  Return
> + * negative errno if the payload was not fully consumed.
> + */
> +static int
> +nbd_co_block_status_payload_read(NBDClient *client, NBDRequest *request,
> +                                 Error **errp)
> +{
> +    int payload_len = request->len;

payload_len should be uint64_t

> +    g_autofree char *buf = NULL;
> +    size_t count, i, nr_bitmaps;
> +    uint32_t id;
> +

otherwise, we may do something unexpected here, when reqeuest->len is too big for int:

> +    if (payload_len > NBD_MAX_BUFFER_SIZE) {
> +        error_setg(errp, "len (%" PRIu64 ") is larger than max len (%u)",
> +                   request->len, NBD_MAX_BUFFER_SIZE);
> +        return -EINVAL;
> +    }
> +
> +    assert(client->contexts.exp == client->exp);
> +    nr_bitmaps = client->exp->nr_export_bitmaps;
> +    request->contexts = g_new0(NBDMetaContexts, 1);
> +    request->contexts->exp = client->exp;
> +
> +    if (payload_len % sizeof(uint32_t) ||
> +        payload_len < sizeof(NBDBlockStatusPayload) ||
> +        payload_len > (sizeof(NBDBlockStatusPayload) +
> +                       sizeof(id) * client->contexts.count)) {
> +        goto skip;
> +    }

[..]

>    * connection right away, -EAGAIN to indicate we were interrupted and the
> @@ -2505,7 +2593,18 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
>           break;
> 
>       case NBD_CMD_BLOCK_STATUS:
> -        request->contexts = &client->contexts;
> +        if (extended_with_payload) {
> +            ret = nbd_co_block_status_payload_read(client, request, errp);
> +            if (ret < 0) {
> +                return ret;
> +            }
> +            /* payload now consumed */
> +            check_length = extended_with_payload = false;

why set extended_with_payload to false? it's a bit misleading. And you don't do this for WRITE request.

> +            payload_len = 0;
> +            valid_flags |= NBD_CMD_FLAG_PAYLOAD_LEN;
> +        } else {
> +            request->contexts = &client->contexts;
> +        }
>           valid_flags |= NBD_CMD_FLAG_REQ_ONE;
>           break;
> 

[..]


with payload_len changed to uint64_t, your squash-in applied and s/>/>=/ fixed:
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>


-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Libguestfs] [PATCH v7 12/12] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS
  2023-10-05 14:33   ` Vladimir Sementsov-Ogievskiy
@ 2023-10-05 15:22     ` Eric Blake
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Blake @ 2023-10-05 15:22 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, Kevin Wolf, Hanna Reitz, qemu-block, libguestfs

On Thu, Oct 05, 2023 at 05:33:26PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > +static int
> > +nbd_co_block_status_payload_read(NBDClient *client, NBDRequest *request,
> > +                                 Error **errp)
> > +{
> > +    int payload_len = request->len;
> 
> payload_len should be uint64_t
> 
> > +    g_autofree char *buf = NULL;
> > +    size_t count, i, nr_bitmaps;
> > +    uint32_t id;
> > +
> 
> otherwise, we may do something unexpected here, when reqeuest->len is too big for int:
> 
> > +    if (payload_len > NBD_MAX_BUFFER_SIZE) {
> > +        error_setg(errp, "len (%" PRIu64 ") is larger than max len (%u)",
> > +                   request->len, NBD_MAX_BUFFER_SIZE);
> > +        return -EINVAL;
> > +    }

Oh, it looks like I introduced that same type mismatch in commit
8db7e2d6 as well, although it appears to have a latent effect until
this series enables the ability for request->length to actually exceed
32 bits.  I'll reply on 1/12 with another squash I'm making there.

> > +
> > +    assert(client->contexts.exp == client->exp);
> > +    nr_bitmaps = client->exp->nr_export_bitmaps;
> > +    request->contexts = g_new0(NBDMetaContexts, 1);
> > +    request->contexts->exp = client->exp;
> > +
> > +    if (payload_len % sizeof(uint32_t) ||
> > +        payload_len < sizeof(NBDBlockStatusPayload) ||
> > +        payload_len > (sizeof(NBDBlockStatusPayload) +
> > +                       sizeof(id) * client->contexts.count)) {
> > +        goto skip;
> > +    }
> 
> [..]
> 
> >    * connection right away, -EAGAIN to indicate we were interrupted and the
> > @@ -2505,7 +2593,18 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
> >           break;
> > 
> >       case NBD_CMD_BLOCK_STATUS:
> > -        request->contexts = &client->contexts;
> > +        if (extended_with_payload) {
> > +            ret = nbd_co_block_status_payload_read(client, request, errp);
> > +            if (ret < 0) {
> > +                return ret;
> > +            }
> > +            /* payload now consumed */
> > +            check_length = extended_with_payload = false;
> 
> why set extended_with_payload to false? it's a bit misleading. And you don't do this for WRITE request.

Indeed; it doesn't make any different to later in the function.  Will drop.

> 
> > +            payload_len = 0;
> > +            valid_flags |= NBD_CMD_FLAG_PAYLOAD_LEN;
> > +        } else {
> > +            request->contexts = &client->contexts;
> > +        }
> >           valid_flags |= NBD_CMD_FLAG_REQ_ONE;
> >           break;
> > 
> 
> [..]
> 
> 
> with payload_len changed to uint64_t, your squash-in applied and s/>/>=/ fixed:
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

Thanks for the careful review.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Libguestfs] [PATCH v7 01/12] nbd/server: Support a request payload
  2023-09-25 19:22 ` [PATCH v7 01/12] nbd/server: Support a request payload Eric Blake
  2023-09-27  8:55   ` Vladimir Sementsov-Ogievskiy
  2023-09-30 13:34   ` Vladimir Sementsov-Ogievskiy
@ 2023-10-05 15:38   ` Eric Blake
  2023-10-05 16:02     ` Vladimir Sementsov-Ogievskiy
  2 siblings, 1 reply; 30+ messages in thread
From: Eric Blake @ 2023-10-05 15:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, libguestfs, vsementsov

On Mon, Sep 25, 2023 at 02:22:31PM -0500, Eric Blake wrote:
> Upcoming additions to support NBD 64-bit effect lengths allow for the
> possibility to distinguish between payload length (capped at 32M) and
> effect length (64 bits, although we generally assume 63 bits because
> of off_t limitations).
[...]

> +++ b/nbd/server.c
> @@ -2322,9 +2322,11 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
>                                                 Error **errp)
>  {
>      NBDClient *client = req->client;
> +    bool extended_with_payload;
>      bool check_length = false;
>      bool check_rofs = false;
>      bool allocate_buffer = false;
> +    bool payload_okay = false;
>      unsigned payload_len = 0;

Pre-existing type mismatch caught as a result of Vladimir's review of
12/12, but:

>      int valid_flags = NBD_CMD_FLAG_FUA;
>      int ret;
> @@ -2338,6 +2340,13 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
> 
>      trace_nbd_co_receive_request_decode_type(request->cookie, request->type,
>                                               nbd_cmd_lookup(request->type));
> +    extended_with_payload = client->mode >= NBD_MODE_EXTENDED &&
> +        request->flags & NBD_CMD_FLAG_PAYLOAD_LEN;
> +    if (extended_with_payload) {
> +        payload_len = request->len;

this can assign a 64-bit number into a 32-bit variable, which can
truncate to 0,...

> +        check_length = true;
> +    }
> +
>      switch (request->type) {
>      case NBD_CMD_DISC:
>          /* Special case: we're going to disconnect without a reply,
> @@ -2354,6 +2363,15 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
>          break;
> 
>      case NBD_CMD_WRITE:
> +        if (client->mode >= NBD_MODE_EXTENDED) {
> +            if (!extended_with_payload) {
> +                /* The client is noncompliant. Trace it, but proceed. */
> +                trace_nbd_co_receive_ext_payload_compliance(request->from,
> +                                                            request->len);
> +            }
> +            valid_flags |= NBD_CMD_FLAG_PAYLOAD_LEN;
> +        }
> +        payload_okay = true;
>          payload_len = request->len;

...the pre-existing code is safe only as long as request->len cannot
exceed 32 bytes (which it can't do until later in this series actually
enables extended requests).  Switching the type now is prudent...

>          check_length = true;
>          allocate_buffer = true;
> @@ -2395,6 +2413,14 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
>                     request->len, NBD_MAX_BUFFER_SIZE);
>          return -EINVAL;
>      }
> +    if (payload_len && !payload_okay) {
> +        /*
> +         * For now, we don't support payloads on other commands; but
> +         * we can keep the connection alive by ignoring the payload.
> +         */
> +        assert(request->type != NBD_CMD_WRITE);
> +        request->len = 0;

...otherwise, this check is bypassed for a request size of exactly 4G
if check_length is false and thus the previous conditional for
request->len vs. NBD_MAX_BUFFER_SIZE didn't trigger (prior to this
patch, payload_len was only set for CND_WRITE which also set
check_length).  Thus, I'm squashing in:

diff --git i/nbd/server.c w/nbd/server.c
index 5258064e5ac..1cb66e86a89 100644
--- i/nbd/server.c
+++ w/nbd/server.c
@@ -2327,7 +2327,7 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
     bool check_rofs = false;
     bool allocate_buffer = false;
     bool payload_okay = false;
-    unsigned payload_len = 0;
+    uint64_t payload_len = 0;
     int valid_flags = NBD_CMD_FLAG_FUA;
     int ret;



-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [Libguestfs] [PATCH v7 01/12] nbd/server: Support a request payload
  2023-10-05 15:38   ` [Libguestfs] " Eric Blake
@ 2023-10-05 16:02     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 30+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-10-05 16:02 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block, libguestfs

On 05.10.23 18:38, Eric Blake wrote:
> On Mon, Sep 25, 2023 at 02:22:31PM -0500, Eric Blake wrote:
>> Upcoming additions to support NBD 64-bit effect lengths allow for the
>> possibility to distinguish between payload length (capped at 32M) and
>> effect length (64 bits, although we generally assume 63 bits because
>> of off_t limitations).
> [...]
> 
>> +++ b/nbd/server.c
>> @@ -2322,9 +2322,11 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
>>                                                  Error **errp)
>>   {
>>       NBDClient *client = req->client;
>> +    bool extended_with_payload;
>>       bool check_length = false;
>>       bool check_rofs = false;
>>       bool allocate_buffer = false;
>> +    bool payload_okay = false;
>>       unsigned payload_len = 0;
> 
> Pre-existing type mismatch caught as a result of Vladimir's review of
> 12/12, but:
> 
>>       int valid_flags = NBD_CMD_FLAG_FUA;
>>       int ret;
>> @@ -2338,6 +2340,13 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
>>
>>       trace_nbd_co_receive_request_decode_type(request->cookie, request->type,
>>                                                nbd_cmd_lookup(request->type));
>> +    extended_with_payload = client->mode >= NBD_MODE_EXTENDED &&
>> +        request->flags & NBD_CMD_FLAG_PAYLOAD_LEN;
>> +    if (extended_with_payload) {
>> +        payload_len = request->len;
> 
> this can assign a 64-bit number into a 32-bit variable, which can
> truncate to 0,...
> 
>> +        check_length = true;
>> +    }
>> +
>>       switch (request->type) {
>>       case NBD_CMD_DISC:
>>           /* Special case: we're going to disconnect without a reply,
>> @@ -2354,6 +2363,15 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
>>           break;
>>
>>       case NBD_CMD_WRITE:
>> +        if (client->mode >= NBD_MODE_EXTENDED) {
>> +            if (!extended_with_payload) {
>> +                /* The client is noncompliant. Trace it, but proceed. */
>> +                trace_nbd_co_receive_ext_payload_compliance(request->from,
>> +                                                            request->len);
>> +            }
>> +            valid_flags |= NBD_CMD_FLAG_PAYLOAD_LEN;
>> +        }
>> +        payload_okay = true;
>>           payload_len = request->len;
> 
> ...the pre-existing code is safe only as long as request->len cannot
> exceed 32 bytes (which it can't do until later in this series actually
> enables extended requests).  Switching the type now is prudent...
> 
>>           check_length = true;
>>           allocate_buffer = true;
>> @@ -2395,6 +2413,14 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
>>                      request->len, NBD_MAX_BUFFER_SIZE);
>>           return -EINVAL;
>>       }
>> +    if (payload_len && !payload_okay) {
>> +        /*
>> +         * For now, we don't support payloads on other commands; but
>> +         * we can keep the connection alive by ignoring the payload.
>> +         */
>> +        assert(request->type != NBD_CMD_WRITE);
>> +        request->len = 0;
> 
> ...otherwise, this check is bypassed for a request size of exactly 4G
> if check_length is false and thus the previous conditional for
> request->len vs. NBD_MAX_BUFFER_SIZE didn't trigger (prior to this
> patch, payload_len was only set for CND_WRITE which also set
> check_length).  Thus, I'm squashing in:
> 
> diff --git i/nbd/server.c w/nbd/server.c
> index 5258064e5ac..1cb66e86a89 100644
> --- i/nbd/server.c
> +++ w/nbd/server.c
> @@ -2327,7 +2327,7 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
>       bool check_rofs = false;
>       bool allocate_buffer = false;
>       bool payload_okay = false;
> -    unsigned payload_len = 0;
> +    uint64_t payload_len = 0;
>       int valid_flags = NBD_CMD_FLAG_FUA;
>       int ret;
> 
> 

OK, agree

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2023-10-05 16:04 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-25 19:22 [PATCH v7 00/12] NBD 64-bit extensions for qemu Eric Blake
2023-09-25 19:22 ` [PATCH v7 01/12] nbd/server: Support a request payload Eric Blake
2023-09-27  8:55   ` Vladimir Sementsov-Ogievskiy
2023-09-27 15:59     ` Eric Blake
2023-09-28  9:09       ` Vladimir Sementsov-Ogievskiy
2023-09-28 14:33         ` Eric Blake
2023-09-28 20:52           ` Vladimir Sementsov-Ogievskiy
2023-09-30 13:34   ` Vladimir Sementsov-Ogievskiy
2023-10-05 15:38   ` [Libguestfs] " Eric Blake
2023-10-05 16:02     ` Vladimir Sementsov-Ogievskiy
2023-09-25 19:22 ` [PATCH v7 02/12] nbd/server: Prepare to receive extended header requests Eric Blake
2023-09-25 19:22 ` [PATCH v7 03/12] nbd/server: Prepare to send extended header replies Eric Blake
2023-09-25 19:22 ` [PATCH v7 04/12] nbd/server: Support 64-bit block status Eric Blake
2023-09-25 19:22 ` [PATCH v7 05/12] nbd/server: Enable initial support for extended headers Eric Blake
2023-09-25 19:22 ` [PATCH v7 06/12] nbd/client: Plumb errp through nbd_receive_replies Eric Blake
2023-09-27 11:29   ` Vladimir Sementsov-Ogievskiy
2023-09-25 19:22 ` [PATCH v7 07/12] nbd/client: Initial support for extended headers Eric Blake
2023-09-29  9:24   ` Vladimir Sementsov-Ogievskiy
2023-09-25 19:22 ` [PATCH v7 08/12] nbd/client: Accept 64-bit block status chunks Eric Blake
2023-09-25 19:22 ` [PATCH v7 09/12] nbd/client: Request extended headers during negotiation Eric Blake
2023-09-25 19:22 ` [PATCH v7 10/12] nbd/server: Refactor list of negotiated meta contexts Eric Blake
2023-09-25 19:22 ` [PATCH v7 11/12] nbd/server: Prepare for per-request filtering of BLOCK_STATUS Eric Blake
2023-09-29 11:54   ` Vladimir Sementsov-Ogievskiy
2023-09-25 19:22 ` [PATCH v7 12/12] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS Eric Blake
2023-09-30 13:24   ` Vladimir Sementsov-Ogievskiy
2023-10-04 21:55     ` Eric Blake
2023-10-05 13:49       ` [Libguestfs] " Eric Blake
2023-10-05 14:26         ` Vladimir Sementsov-Ogievskiy
2023-10-05 14:33   ` Vladimir Sementsov-Ogievskiy
2023-10-05 15:22     ` [Libguestfs] " Eric Blake

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).