From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46390) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XwY40-0002x0-Pt for qemu-devel@nongnu.org; Thu, 04 Dec 2014 10:11:34 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XwY3u-0001G1-D3 for qemu-devel@nongnu.org; Thu, 04 Dec 2014 10:11:28 -0500 Received: from mx-v6.kamp.de ([2a02:248:0:51::16]:55849 helo=mx01.kamp.de) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XwY3t-0001Ei-Va for qemu-devel@nongnu.org; Thu, 04 Dec 2014 10:11:22 -0500 Message-ID: <54807991.3050007@kamp.de> Date: Thu, 04 Dec 2014 16:11:13 +0100 From: Peter Lieven MIME-Version: 1.0 References: <1417530788-15172-1-git-send-email-pl@kamp.de> <1417530788-15172-4-git-send-email-pl@kamp.de> <20141204141201.GC5129@noname.redhat.com> <548072CD.5000302@kamp.de> <20141204150302.GF5129@noname.redhat.com> In-Reply-To: <20141204150302.GF5129@noname.redhat.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 8bit Subject: Re: [Qemu-devel] [RFC PATCH 3/3] virtio-blk: introduce multiread List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Kevin Wolf Cc: famz@redhat.com, benoit@irqsave.net, ming.lei@canonical.com, armbru@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com, pbonzini@redhat.com, mreitz@redhat.com On 04.12.2014 16:03, Kevin Wolf wrote: > Am 04.12.2014 um 15:42 hat Peter Lieven geschrieben: >> On 04.12.2014 15:12, Kevin Wolf wrote: >>> Am 02.12.2014 um 15:33 hat Peter Lieven geschrieben: >>>> this patch finally introduce multiread support to virtio-blk while >>>> multiwrite support was there for a long time read support was missing. >>>> >>>> To achieve this the patch does serveral things which might need futher >>>> explaination: >>>> >>>> - the whole merge and multireq logic is moved from block.c into >>>> virtio-blk. This is move is a preparation for directly creating a >>>> coroutine out of virtio-blk. >>>> >>>> - requests are only merged if they are strictly sequential and no >>>> longer sorted. This simplification decreases overhead and reduces >>>> latency. It will also merge some requests which were unmergable before. >>>> >>>> The old algorithm took up to 32 requests sorted them and tried to merge >>>> them. The outcome was anything between 1 and 32 requests. In case of >>>> 32 requests there were 31 requests unnecessarily delayed. >>>> >>>> On the other hand lets imagine e.g. 16 unmergeable requests followed >>>> by 32 mergable requests. The latter 32 requests would have been split >>>> into two 16 byte requests. >>>> >>>> Last the simplified logic allows for a fast path if we have only a >>>> single request in the multirequest. In this case the request is sent as >>>> ordinary request without mulltireq callbacks. >>>> >>>> As a first benchmark I installed Ubuntu 14.04.1 on a ramdisk. The number of >>>> merged requests is in the same order while the latency is slightly decreased. >>>> One should not stick to much to the numbers because the number of wr_requests >>>> are highly fluctuant. I hope the numbers show that this patch is at least >>>> not causing to big harm: >>>> >>>> Cmdline: >>>> qemu-system-x86_64 -m 1024 -smp 2 -enable-kvm -cdrom ubuntu-14.04.1-server-amd64.iso \ >>>> -drive if=virtio,file=/tmp/ubuntu.raw -monitor stdio >>>> >>>> Before: >>>> virtio0: rd_bytes=150979072 wr_bytes=2591138816 rd_operations=18475 wr_operations=69216 >>>> flush_operations=15343 wr_total_time_ns=26969283701 rd_total_time_ns=1018449432 >>>> flush_total_time_ns=562596819 rd_merged=0 wr_merged=10453 >>>> >>>> After: >>>> virtio0: rd_bytes=148112896 wr_bytes=2845834240 rd_operations=17535 wr_operations=72197 >>>> flush_operations=15760 wr_total_time_ns=26104971623 rd_total_time_ns=1012926283 >>>> flush_total_time_ns=564414752 rd_merged=517 wr_merged=9859 >>>> >>>> Some first numbers of improved read performance while booting: >>>> >>>> The Ubuntu 14.04.1 vServer from above: >>>> virtio0: rd_bytes=86158336 wr_bytes=688128 rd_operations=5851 wr_operations=70 >>>> flush_operations=4 wr_total_time_ns=10886705 rd_total_time_ns=416688595 >>>> flush_total_time_ns=288776 rd_merged=1297 wr_merged=2 >>>> >>>> Windows 2012R2: >>>> virtio0: rd_bytes=176559104 wr_bytes=61859840 rd_operations=7200 wr_operations=360 >>>> flush_operations=68 wr_total_time_ns=34344992718 rd_total_time_ns=134386844669 >>>> flush_total_time_ns=18115517 rd_merged=641 wr_merged=216 >>>> >>>> Signed-off-by: Peter Lieven >>>> --- >>>> hw/block/dataplane/virtio-blk.c | 10 +- >>>> hw/block/virtio-blk.c | 222 ++++++++++++++++++++++++--------------- >>>> include/hw/virtio/virtio-blk.h | 23 ++-- >>>> 3 files changed, 156 insertions(+), 99 deletions(-) >>>> >>>> diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c >>>> index 1222a37..aa4ad91 100644 >>>> --- a/hw/block/dataplane/virtio-blk.c >>>> +++ b/hw/block/dataplane/virtio-blk.c >>>> @@ -96,9 +96,8 @@ static void handle_notify(EventNotifier *e) >>>> event_notifier_test_and_clear(&s->host_notifier); >>>> blk_io_plug(s->conf->conf.blk); >>>> for (;;) { >>>> - MultiReqBuffer mrb = { >>>> - .num_writes = 0, >>>> - }; >>>> + MultiReqBuffer mrb_rd = {}; >>>> + MultiReqBuffer mrb_wr = {.is_write = 1}; >>>> int ret; >>>> /* Disable guest->host notifies to avoid unnecessary vmexits */ >>>> @@ -117,10 +116,11 @@ static void handle_notify(EventNotifier *e) >>>> req->elem.in_num, >>>> req->elem.index); >>>> - virtio_blk_handle_request(req, &mrb); >>>> + virtio_blk_handle_request(req, &mrb_wr, &mrb_rd); >>>> } >>>> - virtio_submit_multiwrite(s->conf->conf.blk, &mrb); >>>> + virtio_submit_multireq(s->conf->conf.blk, &mrb_wr); >>>> + virtio_submit_multireq(s->conf->conf.blk, &mrb_rd); >>>> if (likely(ret == -EAGAIN)) { /* vring emptied */ >>>> /* Re-enable guest->host notifies and stop processing the vring. >>>> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c >>>> index 490f961..f522bfc 100644 >>>> --- a/hw/block/virtio-blk.c >>>> +++ b/hw/block/virtio-blk.c >>>> @@ -22,12 +22,15 @@ >>>> #include "dataplane/virtio-blk.h" >>>> #include "migration/migration.h" >>>> #include "block/scsi.h" >>>> +#include "block/block_int.h" >>> No. :-) >>> >>> We could either expose the information that you need through >>> BlockBackend, or wait until your automatic request splitting logic is >>> implemented in block.c and the information isn't needed here any more. >> I will add sth to block.c and follow-up with the request splitting logic >> later. It will still make sense to now the limit here. Otherwise we >> merge something that we split again afterwards. > Yup, but then make it an official block layer interface instead of > including block_int.h in the device emulation. > >>>> #ifdef __linux__ >>>> # include >>>> #endif >>>> #include "hw/virtio/virtio-bus.h" >>>> #include "hw/virtio/virtio-access.h" >>>> +/* #define DEBUG_MULTIREQ */ >>>> + >>>> VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s) >>>> { >>>> VirtIOBlockReq *req = g_slice_new(VirtIOBlockReq); >>>> @@ -88,6 +91,11 @@ static void virtio_blk_rw_complete(void *opaque, int ret) >>>> trace_virtio_blk_rw_complete(req, ret); >>>> +#ifdef DEBUG_MULTIREQ >>>> + printf("virtio_blk_rw_complete p %p ret %d\n", >>>> + req, ret); >>>> +#endif >>> In the final version, please use tracepoints instead of printfs. >> of course. This is just my debug code. >> >>>> if (ret) { >>>> int p = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &req->out.type); >>>> bool is_read = !(p & VIRTIO_BLK_T_OUT); >>>> @@ -257,24 +265,63 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq *req) >>>> virtio_blk_free_request(req); >>>> } >>>> -void virtio_submit_multiwrite(BlockBackend *blk, MultiReqBuffer *mrb) >>>> +static void virtio_multireq_cb(void *opaque, int ret) >>>> +{ >>>> + MultiReqBuffer *mrb = opaque; >>>> + int i; >>>> +#ifdef DEBUG_MULTIREQ >>>> + printf("virtio_multireq_cb: p %p sector_num %ld nb_sectors %d is_write %d num_reqs %d\n", >>>> + mrb, mrb->sector_num, mrb->nb_sectors, mrb->is_write, mrb->num_reqs); >>>> +#endif >>>> + for (i = 0; i < mrb->num_reqs; i++) { >>>> + virtio_blk_rw_complete(mrb->reqs[i], ret); >>>> + } >>>> + >>>> + qemu_iovec_destroy(&mrb->qiov); >>>> + g_free(mrb); >>>> +} >>>> + >>>> +void virtio_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb0) >>>> { >>>> - int i, ret; >>>> + MultiReqBuffer *mrb = NULL; >>>> - if (!mrb->num_writes) { >>>> + if (!mrb0->num_reqs) { >>>> return; >>>> } >>>> - ret = blk_aio_multiwrite(blk, mrb->blkreq, mrb->num_writes); >>>> - if (ret != 0) { >>>> - for (i = 0; i < mrb->num_writes; i++) { >>>> - if (mrb->blkreq[i].error) { >>>> - virtio_blk_rw_complete(mrb->blkreq[i].opaque, -EIO); >>>> - } >>>> + if (mrb0->num_reqs == 1) { >>>> + if (mrb0->is_write) { >>>> + blk_aio_writev(blk, mrb0->sector_num, &mrb0->reqs[0]->qiov, mrb0->nb_sectors, >>>> + virtio_blk_rw_complete, mrb0->reqs[0]); >>>> + } else { >>>> + blk_aio_readv(blk, mrb0->sector_num, &mrb0->reqs[0]->qiov, mrb0->nb_sectors, >>>> + virtio_blk_rw_complete, mrb0->reqs[0]); >>>> } >>>> + qemu_iovec_destroy(&mrb0->qiov); >>>> + mrb0->num_reqs = 0; >>>> + return; >>>> + } >>>> + >>>> + mrb = g_malloc(sizeof(MultiReqBuffer)); >>>> + memcpy(mrb, mrb0, sizeof(MultiReqBuffer)); >>>> + mrb0->num_reqs = 0; >>> We probably want to avoid allocations as much as we can. If at all >>> possible, we don't want to memcpy() either. >>> >>> Most parts of MultiReqBuffer don't seem to be actually needed during the >>> request and in the callback any more. Essentially, all that is needed is >>> the qiov and a way to find all the requests that belong to the same >>> MultiReqBuffer, so that they can all be completed. >>> >>> We could create a linked list by having a VirtIOBlockReq *next in the >>> request struct. For the qiov we can probably just use the field in the >>> first request (i.e. we extend the qiov of the first request). >> Thats a good idea and should work. I will look into this. >> But where would you store the merge qiov? >> >> It looks like I would need a small struct which holds the qiov and the pointer to req[0]. >> This can then be used as opaque value to the callback. > My thought was that you replace the original qiov of the first request > with the merged one. I don't think you need the original one any more > after merging the requests. Alternatively, you could use a separate > field in VirtIOBlockReq which stays unused in all but the first request. > > In any case the goal is to get rid of a dynamically allocated struct, no > matter how small, so opaque must be directly pointing to a request in > the end. Okay, that sounds doable. The old interface does this for any write even if it is not mergable at all. Peter > >>> With these requirements removed, MultiReqBuffer can live on the stack >>> and doesn't need to be copied to the heap. >>> >>> >>> With the additional patch that you send me, the next few lines read: >>> >>> + qemu_iovec_init(&mrb->qiov, mrb->num_reqs); >>> + >>> + for (i = 0; i < mrb->num_reqs; i++) { >>> + qemu_iovec_concat(&mrb->qiov, &mrb->reqs[i]->qiov, 0, mrb->reqs[i]->qiov.size); >>> + } >>> + assert(mrb->nb_sectors == mrb->qiov.size / BDRV_SECTOR_SIZE); >>> >>> The long line is more than 80 characters, but probably more importantly, >>> mrb->num_reqs is a really bad estimation for niov. If the guest uses any >>> vectored I/O, we're sure to get reallocations. What you really want to >>> use here is the old mrb->qiov.niov (which is an abuse to store in the >>> qiov, see below). >> Of course, my fault. I was so happy to have an exact value for the number >> of niov that I put in the wrong variable ;-) > Happens. :-) > >>> (I also wonder whether it would make sense to keep qiovs allocated when >>> putting VirtIOBlockReqs into the pool, but that's not for this series.) >>> >>>> + >>>> +#ifdef DEBUG_MULTIREQ >>>> + printf("virtio_submit_multireq: p %p sector_num %ld nb_sectors %d is_write %d num_reqs %d\n", >>>> + mrb, mrb->sector_num, mrb->nb_sectors, mrb->is_write, mrb->num_reqs); >>>> +#endif >>>> + >>>> + if (mrb->is_write) { >>>> + blk_aio_writev(blk, mrb->sector_num, &mrb->qiov, mrb->nb_sectors, >>>> + virtio_multireq_cb, mrb); >>>> + } else { >>>> + blk_aio_readv(blk, mrb->sector_num, &mrb->qiov, mrb->nb_sectors, >>>> + virtio_multireq_cb, mrb); >>>> } >>>> - mrb->num_writes = 0; >>>> + block_acct_merge_done(blk_get_stats(blk), >>>> + mrb->is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ, >>>> + mrb->num_reqs - 1); >>>> } >>>> static void virtio_blk_handle_flush(VirtIOBlockReq *req, MultiReqBuffer *mrb) > Kevin -- Mit freundlichen Grüßen Peter Lieven ........................................................... KAMP Netzwerkdienste GmbH Vestische Str. 89-91 | 46117 Oberhausen Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40 pl@kamp.de | http://www.kamp.de Geschäftsführer: Heiner Lante | Michael Lante Amtsgericht Duisburg | HRB Nr. 12154 USt-Id-Nr.: DE 120607556 ...........................................................