From: Eric Blake <eblake@redhat.com>
To: qemu-devel@nongnu.org
Cc: Paolo Bonzini <pbonzini@redhat.com>,
"open list:Network Block Dev..." <qemu-block@nongnu.org>
Subject: [Qemu-devel] [PULL 1/3] nbd/server: Implement sparse reads atop structured reply
Date: Mon, 8 Jan 2018 09:31:35 -0600
Message-ID: <20180108153137.5195-2-eblake@redhat.com>
In-Reply-To: <20180108153137.5195-1-eblake@redhat.com>
NBD added structured replies in the first place to allow efficient
reads of sparse files: the reply can include chunks that communicate
holes to the client without sending long runs of zeroes over the
wire. Time to implement this in the server; our client can already
read such data.

We can only skip holes insofar as the block layer can report them,
and only if the client is okay with a fragmented reply (if a client
requests NBD_CMD_FLAG_DF and the entire read is a hole, we could
technically return a single NBD_REPLY_TYPE_OFFSET_HOLE, but that's
a fringe case not worth catering to here). Sadly, the control flow
is a bit wonkier than I would have preferred, but it was minimally
invasive to split the action between a fragmented read (handled
directly where we recognize NBD_CMD_READ with the right conditions,
sending multiple chunks) and a single read (handled at the end of
nbd_trip, for both simple and structured replies, when we know
there is only one thing being read). Likewise, I made no effort to
optimize the final chunk of a fragmented read to set
NBD_REPLY_FLAG_DONE; instead, we unconditionally send that flag in a
separate NBD_REPLY_TYPE_NONE chunk.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20171107030912.23930-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
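For readers who want to see the chunking described in the commit message in isolation, here is a minimal standalone sketch. It is not the code from this patch: Extent, Chunk, query_status() and plan_sparse_read() are hypothetical names invented for illustration, and query_status() merely stands in for a block-status query over an in-memory extent map. It only plans which chunks would be sent, always ending with a separate "done" entry, as the message above describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent map entry: [start, start + len) reads as zeroes or not. */
typedef struct Extent {
    uint64_t start;
    uint64_t len;
    bool zero;
} Extent;

typedef enum { CHUNK_DATA, CHUNK_HOLE, CHUNK_DONE } ChunkKind;

typedef struct Chunk {
    ChunkKind kind;
    uint64_t offset;
    uint64_t len;
} Chunk;

/*
 * Hypothetical stand-in for a block-status query: report whether the byte
 * at 'offset' is zero and how many bytes (at most 'max') share that status.
 * Offsets not covered by the map are conservatively treated as data.
 */
static uint64_t query_status(const Extent *map, size_t n, uint64_t offset,
                             uint64_t max, bool *zero)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t end = map[i].start + map[i].len;

        if (offset >= map[i].start && offset < end) {
            uint64_t avail = end - offset;

            *zero = map[i].zero;
            return avail < max ? avail : max;
        }
    }
    *zero = false;
    return max;
}

/*
 * Split the read [offset, offset + size) into data and hole chunks, then
 * append a final CHUNK_DONE entry. Returns the number of entries written.
 */
static size_t plan_sparse_read(const Extent *map, size_t n, uint64_t offset,
                               uint64_t size, Chunk *out, size_t out_cap)
{
    uint64_t progress = 0;
    size_t count = 0;

    while (progress < size) {
        bool zero;
        uint64_t pnum = query_status(map, n, offset + progress,
                                     size - progress, &zero);

        /* Each step must make progress and stay within the request. */
        assert(pnum && pnum <= size - progress);
        assert(count < out_cap);
        out[count++] = (Chunk){ zero ? CHUNK_HOLE : CHUNK_DATA,
                                offset + progress, pnum };
        progress += pnum;
    }
    assert(count < out_cap);
    out[count++] = (Chunk){ CHUNK_DONE, 0, 0 };
    return count;
}
```

With a map of 4 KiB data, 8 KiB hole, 4 KiB data, a 12 KiB read starting at offset 2048 would be planned as a data chunk, a hole chunk, a data chunk, and the done marker.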
nbd/server.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++++---
nbd/trace-events | 1 +
2 files changed, 76 insertions(+), 3 deletions(-)
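As a companion to the hole chunk built in the diff below, here is a hedged sketch of what such a chunk looks like on the wire, assuming the layout given in the NBD protocol document: a 20-byte structured reply header (magic, flags, type, handle, payload length) followed by a 12-byte payload (offset, hole length), all big-endian. The helper names put_be() and encode_hole_chunk() and the HOLE_CHUNK_LEN macro are made up for this illustration and do not exist in QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as given in the NBD protocol document. */
#define NBD_STRUCTURED_REPLY_MAGIC 0x668e33efu
#define NBD_REPLY_FLAG_DONE        (1u << 0)
#define NBD_REPLY_TYPE_OFFSET_HOLE 2u

/* Illustrative only: 20-byte header plus 12-byte hole payload. */
#define HOLE_CHUNK_LEN 32

/* Store the low 'nbytes' bytes of 'v' at 'p' in big-endian order. */
static void put_be(uint8_t *p, uint64_t v, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++) {
        p[nbytes - 1 - i] = (uint8_t)(v >> (8 * i));
    }
}

/*
 * Serialize one hole chunk into 'buf' and return the number of bytes
 * written. 'flags' is 0 for an intermediate chunk, or NBD_REPLY_FLAG_DONE
 * if this chunk ends the reply.
 */
static size_t encode_hole_chunk(uint8_t *buf, size_t buflen, uint16_t flags,
                                uint64_t handle, uint64_t offset,
                                uint32_t hole_len)
{
    assert(buflen >= HOLE_CHUNK_LEN);
    assert(hole_len > 0);

    put_be(buf + 0, NBD_STRUCTURED_REPLY_MAGIC, 4);  /* magic */
    put_be(buf + 4, flags, 2);                       /* flags */
    put_be(buf + 6, NBD_REPLY_TYPE_OFFSET_HOLE, 2);  /* type */
    put_be(buf + 8, handle, 8);                      /* handle */
    put_be(buf + 16, 12, 4);                         /* payload length */
    put_be(buf + 20, offset, 8);                     /* hole offset */
    put_be(buf + 28, hole_len, 4);                   /* hole length */
    return HOLE_CHUNK_LEN;
}
```

The point of the sketch is the size argument from the commit message: a hole of any length that fits in 32 bits costs a fixed 32 bytes on the wire, instead of that many bytes of zeroes.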
diff --git a/nbd/server.c b/nbd/server.c
index 92c0fdd03b..be7310cb41 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1303,6 +1303,7 @@ static int coroutine_fn nbd_co_send_structured_read(NBDClient *client,
uint64_t offset,
void *data,
size_t size,
+ bool final,
Error **errp)
{
NBDStructuredReadData chunk;
@@ -1313,13 +1314,73 @@ static int coroutine_fn nbd_co_send_structured_read(NBDClient *client,
assert(size);
trace_nbd_co_send_structured_read(handle, offset, data, size);
- set_be_chunk(&chunk.h, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_OFFSET_DATA,
- handle, sizeof(chunk) - sizeof(chunk.h) + size);
+ set_be_chunk(&chunk.h, final ? NBD_REPLY_FLAG_DONE : 0,
+ NBD_REPLY_TYPE_OFFSET_DATA, handle,
+ sizeof(chunk) - sizeof(chunk.h) + size);
stq_be_p(&chunk.offset, offset);
return nbd_co_send_iov(client, iov, 2, errp);
}
+static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client,
+ uint64_t handle,
+ uint64_t offset,
+ uint8_t *data,
+ size_t size,
+ Error **errp)
+{
+ int ret = 0;
+ NBDExport *exp = client->exp;
+ size_t progress = 0;
+
+ while (progress < size) {
+ int64_t pnum;
+ int status = bdrv_block_status_above(blk_bs(exp->blk), NULL,
+ offset + progress,
+ size - progress, &pnum, NULL,
+ NULL);
+
+ if (status < 0) {
+ error_setg_errno(errp, -status, "unable to check for holes");
+ return status;
+ }
+ assert(pnum && pnum <= size - progress);
+ if (status & BDRV_BLOCK_ZERO) {
+ NBDStructuredReadHole chunk;
+ struct iovec iov[] = {
+ {.iov_base = &chunk, .iov_len = sizeof(chunk)},
+ };
+
+ trace_nbd_co_send_structured_read_hole(handle, offset + progress,
+ pnum);
+ set_be_chunk(&chunk.h, 0, NBD_REPLY_TYPE_OFFSET_HOLE,
+ handle, sizeof(chunk) - sizeof(chunk.h));
+ stq_be_p(&chunk.offset, offset + progress);
+ stl_be_p(&chunk.length, pnum);
+ ret = nbd_co_send_iov(client, iov, 1, errp);
+ } else {
+ ret = blk_pread(exp->blk, offset + progress + exp->dev_offset,
+ data + progress, pnum);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret, "reading from file failed");
+ break;
+ }
+ ret = nbd_co_send_structured_read(client, handle, offset + progress,
+ data + progress, pnum, false,
+ errp);
+ }
+
+ if (ret < 0) {
+ break;
+ }
+ progress += pnum;
+ }
+ if (!ret) {
+ ret = nbd_co_send_structured_done(client, handle, errp);
+ }
+ return ret;
+}
+
static int coroutine_fn nbd_co_send_structured_error(NBDClient *client,
uint64_t handle,
uint32_t error,
@@ -1481,6 +1542,16 @@ static coroutine_fn void nbd_trip(void *opaque)
}
}
+ if (client->structured_reply && !(request.flags & NBD_CMD_FLAG_DF)) {
+ ret = nbd_co_send_sparse_read(req->client, request.handle,
+ request.from, req->data, request.len,
+ &local_err);
+ if (ret < 0) {
+ goto reply;
+ }
+ goto done;
+ }
+
ret = blk_pread(exp->blk, request.from + exp->dev_offset,
req->data, request.len);
if (ret < 0) {
@@ -1561,7 +1632,8 @@ reply:
} else if (reply_data_len) {
ret = nbd_co_send_structured_read(req->client, request.handle,
request.from, req->data,
- reply_data_len, &local_err);
+ reply_data_len, true,
+ &local_err);
} else {
ret = nbd_co_send_structured_done(req->client, request.handle,
&local_err);
diff --git a/nbd/trace-events b/nbd/trace-events
index 92568edce5..2b8268ce8c 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -57,6 +57,7 @@ nbd_blk_aio_detach(const char *name, void *ctx) "Export %s: Detaching clients fr
nbd_co_send_simple_reply(uint64_t handle, uint32_t error, const char *errname, int len) "Send simple reply: handle = %" PRIu64 ", error = %" PRIu32 " (%s), len = %d"
nbd_co_send_structured_done(uint64_t handle) "Send structured reply done: handle = %" PRIu64
nbd_co_send_structured_read(uint64_t handle, uint64_t offset, void *data, size_t size) "Send structured read data reply: handle = %" PRIu64 ", offset = %" PRIu64 ", data = %p, len = %zu"
+nbd_co_send_structured_read_hole(uint64_t handle, uint64_t offset, size_t size) "Send structured read hole reply: handle = %" PRIu64 ", offset = %" PRIu64 ", len = %zu"
nbd_co_send_structured_error(uint64_t handle, int err, const char *errname, const char *msg) "Send structured error reply: handle = %" PRIu64 ", error = %d (%s), msg = '%s'"
nbd_co_receive_request_decode_type(uint64_t handle, uint16_t type, const char *name) "Decoding type: handle = %" PRIu64 ", type = %" PRIu16 " (%s)"
nbd_co_receive_request_payload_received(uint64_t handle, uint32_t len) "Payload received: handle = %" PRIu64 ", len = %" PRIu32
--
2.14.3