From: Peter Lieven
Date: Thu, 04 Dec 2014 15:42:21 +0100
Message-ID: <548072CD.5000302@kamp.de>
References: <1417530788-15172-1-git-send-email-pl@kamp.de> <1417530788-15172-4-git-send-email-pl@kamp.de> <20141204141201.GC5129@noname.redhat.com>
In-Reply-To: <20141204141201.GC5129@noname.redhat.com>
Subject: Re: [Qemu-devel] [RFC PATCH 3/3] virtio-blk: introduce multiread
To: Kevin Wolf
Cc: famz@redhat.com, benoit@irqsave.net, ming.lei@canonical.com, armbru@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com, pbonzini@redhat.com, mreitz@redhat.com

On 04.12.2014 15:12, Kevin Wolf wrote:
> Am 02.12.2014 um 15:33 hat Peter Lieven geschrieben:
>> this patch finally introduces multiread support to virtio-blk. While
>> multiwrite support has been there for a long time, read support was missing.
>>
>> To achieve this the patch does several things which might need further
>> explanation:
>>
>> - the whole merge and multireq logic is moved from block.c into
>> virtio-blk. This move is a preparation for directly creating a
>> coroutine out of virtio-blk.
>>
>> - requests are only merged if they are strictly sequential, and are no
>> longer sorted.
>> This simplification decreases overhead and reduces
>> latency. It will also merge some requests which were unmergeable before.
>>
>> The old algorithm took up to 32 requests, sorted them and tried to merge
>> them. The outcome was anything between 1 and 32 requests. In the case of
>> 32 requests there were 31 requests unnecessarily delayed.
>>
>> On the other hand, let's imagine e.g. 16 unmergeable requests followed
>> by 32 mergeable requests. The latter 32 requests would have been split
>> into two merged requests of 16.
>>
>> Lastly, the simplified logic allows for a fast path if we have only a
>> single request in the multirequest. In this case the request is sent as
>> an ordinary request without multireq callbacks.
>>
>> As a first benchmark I installed Ubuntu 14.04.1 on a ramdisk. The number of
>> merged requests is in the same order while the latency is slightly decreased.
>> One should not read too much into the numbers because the number of wr_requests
>> is highly fluctuant. I hope the numbers show that this patch is at least
>> not causing too much harm:
>>
>> Cmdline:
>> qemu-system-x86_64 -m 1024 -smp 2 -enable-kvm -cdrom ubuntu-14.04.1-server-amd64.iso \
>>  -drive if=virtio,file=/tmp/ubuntu.raw -monitor stdio
>>
>> Before:
>> virtio0: rd_bytes=150979072 wr_bytes=2591138816 rd_operations=18475 wr_operations=69216
>> flush_operations=15343 wr_total_time_ns=26969283701 rd_total_time_ns=1018449432
>> flush_total_time_ns=562596819 rd_merged=0 wr_merged=10453
>>
>> After:
>> virtio0: rd_bytes=148112896 wr_bytes=2845834240 rd_operations=17535 wr_operations=72197
>> flush_operations=15760 wr_total_time_ns=26104971623 rd_total_time_ns=1012926283
>> flush_total_time_ns=564414752 rd_merged=517 wr_merged=9859
>>
>> Some first numbers of improved read performance while booting:
>>
>> The Ubuntu 14.04.1 vServer from above:
>> virtio0: rd_bytes=86158336 wr_bytes=688128 rd_operations=5851 wr_operations=70
>> flush_operations=4 wr_total_time_ns=10886705 rd_total_time_ns=416688595
>> flush_total_time_ns=288776 rd_merged=1297 wr_merged=2
>>
>> Windows 2012R2:
>> virtio0: rd_bytes=176559104 wr_bytes=61859840 rd_operations=7200 wr_operations=360
>> flush_operations=68 wr_total_time_ns=34344992718 rd_total_time_ns=134386844669
>> flush_total_time_ns=18115517 rd_merged=641 wr_merged=216
>>
>> Signed-off-by: Peter Lieven
>> ---
>>  hw/block/dataplane/virtio-blk.c |   10 +-
>>  hw/block/virtio-blk.c           |  222 ++++++++++++++++++++++++---------------
>>  include/hw/virtio/virtio-blk.h  |   23 ++--
>>  3 files changed, 156 insertions(+), 99 deletions(-)
>>
>> diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
>> index 1222a37..aa4ad91 100644
>> --- a/hw/block/dataplane/virtio-blk.c
>> +++ b/hw/block/dataplane/virtio-blk.c
>> @@ -96,9 +96,8 @@ static void handle_notify(EventNotifier *e)
>>      event_notifier_test_and_clear(&s->host_notifier);
>>      blk_io_plug(s->conf->conf.blk);
>>      for (;;) {
>> -        MultiReqBuffer mrb = {
>> -            .num_writes = 0,
>> -        };
>> +        MultiReqBuffer mrb_rd = {};
>> +        MultiReqBuffer mrb_wr = {.is_write = 1};
>>          int ret;
>>
>>          /* Disable guest->host notifies to avoid unnecessary vmexits */
>> @@ -117,10 +116,11 @@ static void handle_notify(EventNotifier *e)
>>                                                    req->elem.in_num,
>>                                                    req->elem.index);
>>
>> -            virtio_blk_handle_request(req, &mrb);
>> +            virtio_blk_handle_request(req, &mrb_wr, &mrb_rd);
>>          }
>>
>> -        virtio_submit_multiwrite(s->conf->conf.blk, &mrb);
>> +        virtio_submit_multireq(s->conf->conf.blk, &mrb_wr);
>> +        virtio_submit_multireq(s->conf->conf.blk, &mrb_rd);
>>
>>          if (likely(ret == -EAGAIN)) { /* vring emptied */
>>              /* Re-enable guest->host notifies and stop processing the vring.
>> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
>> index 490f961..f522bfc 100644
>> --- a/hw/block/virtio-blk.c
>> +++ b/hw/block/virtio-blk.c
>> @@ -22,12 +22,15 @@
>>  #include "dataplane/virtio-blk.h"
>>  #include "migration/migration.h"
>>  #include "block/scsi.h"
>> +#include "block/block_int.h"

> No.
:-)

> We could either expose the information that you need through
> BlockBackend, or wait until your automatic request splitting logic is
> implemented in block.c and the information isn't needed here any more.

I will add something to block.c and follow up with the request splitting
logic later. It will still make sense to know the limit here. Otherwise
we merge something that we split again afterwards.

>
>> #ifdef __linux__
>> # include <scsi/sg.h>
>> #endif
>> #include "hw/virtio/virtio-bus.h"
>> #include "hw/virtio/virtio-access.h"
>>
>> +/* #define DEBUG_MULTIREQ */
>> +
>> VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s)
>> {
>>     VirtIOBlockReq *req = g_slice_new(VirtIOBlockReq);
>> @@ -88,6 +91,11 @@ static void virtio_blk_rw_complete(void *opaque, int ret)
>>
>>     trace_virtio_blk_rw_complete(req, ret);
>>
>> +#ifdef DEBUG_MULTIREQ
>> +    printf("virtio_blk_rw_complete p %p ret %d\n",
>> +           req, ret);
>> +#endif

> In the final version, please use tracepoints instead of printfs.

Of course. This is just my debug code.
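To illustrate why knowing the limit matters for the merge decision, here is a rough standalone sketch (hypothetical `Batch` struct and `batch_can_merge()` helper, not the actual QEMU code; the real check would read bs->bl.max_transfer_length):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the state gathered per merged submission. */
typedef struct {
    int      num_reqs;    /* requests collected so far */
    int64_t  sector_num;  /* first sector of the batch */
    int      nb_sectors;  /* total length of the batch in sectors */
} Batch;

/* Decide whether a new request may be merged into the current batch.
 * Merging past the backend's maximum transfer length would only be
 * split again by the block layer afterwards, so refuse it here.
 * max_transfer_length == 0 means "no limit known". */
static int batch_can_merge(const Batch *b, int64_t sector_num,
                           int nb_sectors, int max_transfer_length)
{
    if (b->num_reqs == 0) {
        return 1;   /* an empty batch accepts the first request */
    }
    if (b->sector_num + b->nb_sectors != sector_num) {
        return 0;   /* not strictly sequential */
    }
    if (max_transfer_length &&
        b->nb_sectors + nb_sectors > max_transfer_length) {
        return 0;   /* would exceed the backend limit */
    }
    return 1;
}
```

An empty batch accepts the first request unconditionally; splitting a single oversized request is the separate follow-up mentioned above.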
>
>>     if (ret) {
>>         int p = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &req->out.type);
>>         bool is_read = !(p & VIRTIO_BLK_T_OUT);
>> @@ -257,24 +265,63 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
>>     virtio_blk_free_request(req);
>> }
>>
>> -void virtio_submit_multiwrite(BlockBackend *blk, MultiReqBuffer *mrb)
>> +static void virtio_multireq_cb(void *opaque, int ret)
>> +{
>> +    MultiReqBuffer *mrb = opaque;
>> +    int i;
>> +#ifdef DEBUG_MULTIREQ
>> +    printf("virtio_multireq_cb: p %p sector_num %ld nb_sectors %d is_write %d num_reqs %d\n",
>> +           mrb, mrb->sector_num, mrb->nb_sectors, mrb->is_write, mrb->num_reqs);
>> +#endif
>> +    for (i = 0; i < mrb->num_reqs; i++) {
>> +        virtio_blk_rw_complete(mrb->reqs[i], ret);
>> +    }
>> +
>> +    qemu_iovec_destroy(&mrb->qiov);
>> +    g_free(mrb);
>> +}
>> +
>> +void virtio_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb0)
>> {
>> -    int i, ret;
>> +    MultiReqBuffer *mrb = NULL;
>>
>> -    if (!mrb->num_writes) {
>> +    if (!mrb0->num_reqs) {
>>         return;
>>     }
>>
>> -    ret = blk_aio_multiwrite(blk, mrb->blkreq, mrb->num_writes);
>> -    if (ret != 0) {
>> -        for (i = 0; i < mrb->num_writes; i++) {
>> -            if (mrb->blkreq[i].error) {
>> -                virtio_blk_rw_complete(mrb->blkreq[i].opaque, -EIO);
>> -            }
>> +    if (mrb0->num_reqs == 1) {
>> +        if (mrb0->is_write) {
>> +            blk_aio_writev(blk, mrb0->sector_num, &mrb0->reqs[0]->qiov, mrb0->nb_sectors,
>> +                           virtio_blk_rw_complete, mrb0->reqs[0]);
>> +        } else {
>> +            blk_aio_readv(blk, mrb0->sector_num, &mrb0->reqs[0]->qiov, mrb0->nb_sectors,
>> +                          virtio_blk_rw_complete, mrb0->reqs[0]);
>>         }
>> +        qemu_iovec_destroy(&mrb0->qiov);
>> +        mrb0->num_reqs = 0;
>> +        return;
>> +    }
>> +
>> +    mrb = g_malloc(sizeof(MultiReqBuffer));
>> +    memcpy(mrb, mrb0, sizeof(MultiReqBuffer));
>> +    mrb0->num_reqs = 0;

> We probably want to avoid allocations as much as we can. If at all
> possible, we don't want to memcpy() either.
>
> Most parts of MultiReqBuffer don't seem to be actually needed during the
> request and in the callback any more. Essentially, all that is needed is
> the qiov and a way to find all the requests that belong to the same
> MultiReqBuffer, so that they can all be completed.
>
> We could create a linked list by having a VirtIOBlockReq *next in the
> request struct. For the qiov we can probably just use the field in the
> first request (i.e. we extend the qiov of the first request).

That's a good idea and should work. I will look into this. But where
would you store the merged qiov? It looks like I would need a small
struct which holds the qiov and the pointer to req[0]. This can then be
used as the opaque value for the callback.

> With these requirements removed, MultiReqBuffer can live on the stack
> and doesn't need to be copied to the heap.
>
> With the additional patch that you sent me, the next few lines read:
>
> +    qemu_iovec_init(&mrb->qiov, mrb->num_reqs);
> +
> +    for (i = 0; i < mrb->num_reqs; i++) {
> +        qemu_iovec_concat(&mrb->qiov, &mrb->reqs[i]->qiov, 0, mrb->reqs[i]->qiov.size);
> +    }
> +    assert(mrb->nb_sectors == mrb->qiov.size / BDRV_SECTOR_SIZE);
>
> The long line is more than 80 characters, but probably more importantly,
> mrb->num_reqs is a really bad estimation for niov. If the guest uses any
> vectored I/O, we're sure to get reallocations. What you really want to
> use here is the old mrb->qiov.niov (which is an abuse to store in the
> qiov, see below).

Of course, my fault. I was so happy to have an exact value for the number
of iovecs that I put it in the wrong variable ;-)

> (I also wonder whether it would make sense to keep qiovs allocated when
> putting VirtIOBlockReqs into the pool, but that's not for this series.)
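Coming back to the linked-list idea, here is a rough standalone sketch of how I understand the proposal (hypothetical `Req` type, not the actual VirtIOBlockReq; the qiov and error handling are left out). The head pointer is all the completion callback needs to walk the chain:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a request; the real struct would carry the
 * qiov of the first request and the usual virtio-blk fields. */
typedef struct Req {
    struct Req *next;   /* next request in the same merged submission */
    int         status; /* completion status, filled in by the callback */
} Req;

/* Append a request to the chain rooted at *head (O(1) via *tail). */
static void multireq_add(Req **head, Req **tail, Req *req)
{
    req->next = NULL;
    if (*head == NULL) {
        *head = req;
    } else {
        (*tail)->next = req;
    }
    *tail = req;
}

/* Single AIO callback: complete every request in the chain.  The next
 * pointer is fetched before completing, so the callback may free reqs. */
static void multireq_complete(Req *head, int ret)
{
    while (head) {
        Req *next = head->next;
        head->status = ret;
        head = next;
    }
}
```

With this shape, no heap-allocated MultiReqBuffer copy is needed; the buffer on the stack only gathers requests, and the chain plus the first request's qiov survive until completion.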
> >> + >> +#ifdef DEBUG_MULTIREQ >> + printf("virtio_submit_multireq: p %p sector_num %ld nb_sectors %d is_write %d num_reqs %d\n", >> + mrb, mrb->sector_num, mrb->nb_sectors, mrb->is_write, mrb->num_reqs); >> +#endif >> + >> + if (mrb->is_write) { >> + blk_aio_writev(blk, mrb->sector_num, &mrb->qiov, mrb->nb_sectors, >> + virtio_multireq_cb, mrb); >> + } else { >> + blk_aio_readv(blk, mrb->sector_num, &mrb->qiov, mrb->nb_sectors, >> + virtio_multireq_cb, mrb); >> } >> >> - mrb->num_writes = 0; >> + block_acct_merge_done(blk_get_stats(blk), >> + mrb->is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ, >> + mrb->num_reqs - 1); >> } >> >> static void virtio_blk_handle_flush(VirtIOBlockReq *req, MultiReqBuffer *mrb) >> @@ -283,9 +330,9 @@ static void virtio_blk_handle_flush(VirtIOBlockReq *req, MultiReqBuffer *mrb) >> BLOCK_ACCT_FLUSH); >> >> /* >> - * Make sure all outstanding writes are posted to the backing device. >> + * Make sure all outstanding requests are posted to the backing device. >> */ >> - virtio_submit_multiwrite(req->dev->blk, mrb); >> + virtio_submit_multireq(req->dev->blk, mrb); >> blk_aio_flush(req->dev->blk, virtio_blk_flush_complete, req); >> } >> >> @@ -308,61 +355,8 @@ static bool virtio_blk_sect_range_ok(VirtIOBlock *dev, >> return true; >> } >> >> -static void virtio_blk_handle_write(VirtIOBlockReq *req, MultiReqBuffer *mrb) >> -{ >> - BlockRequest *blkreq; >> - uint64_t sector; >> - >> - sector = virtio_ldq_p(VIRTIO_DEVICE(req->dev), &req->out.sector); >> - >> - trace_virtio_blk_handle_write(req, sector, req->qiov.size / 512); >> - >> - if (!virtio_blk_sect_range_ok(req->dev, sector, req->qiov.size)) { >> - virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR); >> - virtio_blk_free_request(req); >> - return; >> - } >> - >> - block_acct_start(blk_get_stats(req->dev->blk), &req->acct, req->qiov.size, >> - BLOCK_ACCT_WRITE); >> - >> - if (mrb->num_writes == VIRTIO_BLK_MAX_MERGE_REQS) { >> - virtio_submit_multiwrite(req->dev->blk, mrb); >> - } >> - >> - 
blkreq = &mrb->blkreq[mrb->num_writes]; >> - blkreq->sector = sector; >> - blkreq->nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE; >> - blkreq->qiov = &req->qiov; >> - blkreq->cb = virtio_blk_rw_complete; >> - blkreq->opaque = req; >> - blkreq->error = 0; >> - >> - mrb->num_writes++; >> -} >> - >> -static void virtio_blk_handle_read(VirtIOBlockReq *req) >> -{ >> - uint64_t sector; >> - >> - sector = virtio_ldq_p(VIRTIO_DEVICE(req->dev), &req->out.sector); >> - >> - trace_virtio_blk_handle_read(req, sector, req->qiov.size / 512); >> - >> - if (!virtio_blk_sect_range_ok(req->dev, sector, req->qiov.size)) { >> - virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR); >> - virtio_blk_free_request(req); >> - return; >> - } >> - >> - block_acct_start(blk_get_stats(req->dev->blk), &req->acct, req->qiov.size, >> - BLOCK_ACCT_READ); >> - blk_aio_readv(req->dev->blk, sector, &req->qiov, >> - req->qiov.size / BDRV_SECTOR_SIZE, >> - virtio_blk_rw_complete, req); >> -} > Handling reads and writes in a common place makes sense, but the code is > long enough that I would still keep it in a separate function, like > virtio_blk_handle_rw(). 
Sure.

>
>> -void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>> +void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb_wr,
>> +                               MultiReqBuffer *mrb_rd)
>> {
>>     uint32_t type;
>>     struct iovec *in_iov = req->elem.in_sg;
>> @@ -397,7 +391,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>>     type = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &req->out.type);
>>
>>     if (type & VIRTIO_BLK_T_FLUSH) {
>> -        virtio_blk_handle_flush(req, mrb);
>> +        virtio_blk_handle_flush(req, mrb_wr);
>>     } else if (type & VIRTIO_BLK_T_SCSI_CMD) {
>>         virtio_blk_handle_scsi(req);
>>     } else if (type & VIRTIO_BLK_T_GET_ID) {
>> @@ -414,13 +408,71 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>>         iov_from_buf(in_iov, in_num, 0, serial, size);
>>         virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
>>         virtio_blk_free_request(req);
>> -    } else if (type & VIRTIO_BLK_T_OUT) {
>> -        qemu_iovec_init_external(&req->qiov, iov, out_num);
>> -        virtio_blk_handle_write(req, mrb);
>> -    } else if (type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {
>> -        /* VIRTIO_BLK_T_IN is 0, so we can't just & it. */
>> -        qemu_iovec_init_external(&req->qiov, in_iov, in_num);
>> -        virtio_blk_handle_read(req);
>> +    } else if (type & VIRTIO_BLK_T_OUT || type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {
>> +        bool is_write = type & VIRTIO_BLK_T_OUT;
>> +        MultiReqBuffer *mrb = is_write ? mrb_wr : mrb_rd;
>> +        int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev), &req->out.sector);
>> +        BlockDriverState *bs = blk_bs(req->dev->blk);
>> +        int nb_sectors = 0;
>> +        int merge = 1;
>> +
>> +        if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
>> +            virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
>> +            virtio_blk_free_request(req);
>> +            return;
>> +        }
>> +
>> +        if (is_write) {
>> +            qemu_iovec_init_external(&req->qiov, iov, out_num);
>> +            trace_virtio_blk_handle_write(req, sector_num, req->qiov.size / BDRV_SECTOR_SIZE);
>> +        } else {
>> +            qemu_iovec_init_external(&req->qiov, in_iov, in_num);
>> +            trace_virtio_blk_handle_read(req, sector_num, req->qiov.size / BDRV_SECTOR_SIZE);
>> +        }
>> +
>> +        nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
>> +#ifdef DEBUG_MULTIREQ
>> +        printf("virtio_blk_handle_request p %p sector_num %ld nb_sectors %d is_write %d\n",
>> +               req, sector_num, nb_sectors, is_write);
>> +#endif
>> +
>> +        block_acct_start(blk_get_stats(req->dev->blk), &req->acct, req->qiov.size,
>> +                         is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ);
>> +
>> +        /* merge would exceed maximum number of requests or IOVs */
>> +        if (mrb->num_reqs == MAX_MERGE_REQS ||
>> +            mrb->qiov.niov + req->qiov.niov + 1 > IOV_MAX) {
>> +            merge = 0;
>> +        }

> Again with your latest patch applied on top:
>
> You're abusing mrb->qiov.niov while gathering requests. mrb->qiov is
> never initialised and you don't actually add iovs anywhere, but you just
> increase qiov.niov without the associated data.
>
> Why don't you make it a simple int instead of a QEMUIOVector?

I was trying to use the same struct for collecting the requests and
building the callback opaque. I will split that.
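For the split, something like this standalone sketch is what I have in mind (hypothetical `MergeState` and `SKETCH_IOV_MAX` stand-ins, not the actual patch): the iovec count lives in a plain int next to num_reqs instead of inside a dummy QEMUIOVector:

```c
#include <assert.h>

#define SKETCH_IOV_MAX 1024   /* stand-in for the system IOV_MAX */
#define MAX_MERGE_REQS 32

/* Hypothetical gathering state: count iovec entries in a plain int
 * instead of abusing an uninitialised QEMUIOVector's niov field. */
typedef struct {
    int num_reqs;  /* requests gathered so far */
    int niov;      /* total iovec entries gathered so far */
} MergeState;

/* Refuse to merge when the combined submission would exhaust the
 * request slots or exceed the iovec limit of the backend syscall. */
static int merge_fits(const MergeState *ms, int req_niov)
{
    if (ms->num_reqs == MAX_MERGE_REQS) {
        return 0;
    }
    if (ms->niov + req_niov > SKETCH_IOV_MAX) {
        return 0;
    }
    return 1;
}

/* Account a request into the gathering state. */
static void merge_add(MergeState *ms, int req_niov)
{
    ms->num_reqs++;
    ms->niov += req_niov;
}
```

The real qiov is then only initialised and concatenated at submission time, with the exact niov known in advance, so qemu_iovec_concat() never reallocates.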
>
>> +
>> +        /* merge would exceed maximum transfer length of backend device */
>> +        if (bs->bl.max_transfer_length &&
>> +            mrb->nb_sectors + nb_sectors > bs->bl.max_transfer_length) {
>> +            merge = 0;
>> +        }
>> +
>> +        /* requests are not sequential */
>> +        if (mrb->num_reqs && mrb->sector_num + mrb->nb_sectors != sector_num) {
>> +            merge = 0;
>> +        }
>> +
>> +        if (!merge) {
>> +            virtio_submit_multireq(req->dev->blk, mrb);
>> +        }
>> +
>> +        if (mrb->num_reqs == 0) {
>> +            qemu_iovec_init(&mrb->qiov, MAX_MERGE_REQS);
>> +            mrb->sector_num = sector_num;
>> +            mrb->nb_sectors = 0;
>> +        }
>> +
>> +        qemu_iovec_concat(&mrb->qiov, &req->qiov, 0, req->qiov.size);
>> +        mrb->nb_sectors += req->qiov.size / BDRV_SECTOR_SIZE;
>> +
>> +        assert(mrb->nb_sectors == mrb->qiov.size / BDRV_SECTOR_SIZE);
>> +
>> +        mrb->reqs[mrb->num_reqs] = req;
>> +        mrb->num_reqs++;
>>     } else {
>>         virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
>>         virtio_blk_free_request(req);

> Otherwise it looks as if it could work.

Thanks for your comments,
Peter