qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Lieven <pl@kamp.de>
To: Kevin Wolf <kwolf@redhat.com>
Cc: famz@redhat.com, benoit@irqsave.net, ming.lei@canonical.com,
	armbru@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com,
	pbonzini@redhat.com, mreitz@redhat.com
Subject: Re: [Qemu-devel] [RFC PATCH 3/3] virtio-blk: introduce multiread
Date: Thu, 04 Dec 2014 16:11:13 +0100	[thread overview]
Message-ID: <54807991.3050007@kamp.de> (raw)
In-Reply-To: <20141204150302.GF5129@noname.redhat.com>

On 04.12.2014 16:03, Kevin Wolf wrote:
> Am 04.12.2014 um 15:42 hat Peter Lieven geschrieben:
>> On 04.12.2014 15:12, Kevin Wolf wrote:
>>> Am 02.12.2014 um 15:33 hat Peter Lieven geschrieben:
>>>> this patch finally introduce multiread support to virtio-blk while
>>>> multiwrite support was there for a long time read support was missing.
>>>>
>>>> To achieve this the patch does serveral things which might need futher
>>>> explaination:
>>>>
>>>>   - the whole merge and multireq logic is moved from block.c into
>>>>     virtio-blk. This is move is a preparation for directly creating a
>>>>     coroutine out of virtio-blk.
>>>>
>>>>   - requests are only merged if they are strictly sequential and no
>>>>     longer sorted. This simplification decreases overhead and reduces
>>>>     latency. It will also merge some requests which were unmergable before.
>>>>
>>>>     The old algorithm took up to 32 requests sorted them and tried to merge
>>>>     them. The outcome was anything between 1 and 32 requests. In case of
>>>>     32 requests there were 31 requests unnecessarily delayed.
>>>>
>>>>     On the other hand lets imagine e.g. 16 unmergeable requests followed
>>>>     by 32 mergable requests. The latter 32 requests would have been split
>>>>     into two 16 byte requests.
>>>>
>>>>     Last the simplified logic allows for a fast path if we have only a
>>>>     single request in the multirequest. In this case the request is sent as
>>>>     ordinary request without mulltireq callbacks.
>>>>
>>>> As a first benchmark I installed Ubuntu 14.04.1 on a ramdisk. The number of
>>>> merged requests is in the same order while the latency is slightly decreased.
>>>> One should not stick to much to the numbers because the number of wr_requests
>>>> are highly fluctuant. I hope the numbers show that this patch is at least
>>>> not causing to big harm:
>>>>
>>>> Cmdline:
>>>> qemu-system-x86_64 -m 1024 -smp 2 -enable-kvm -cdrom ubuntu-14.04.1-server-amd64.iso \
>>>>   -drive if=virtio,file=/tmp/ubuntu.raw -monitor stdio
>>>>
>>>> Before:
>>>> virtio0: rd_bytes=150979072 wr_bytes=2591138816 rd_operations=18475 wr_operations=69216
>>>>           flush_operations=15343 wr_total_time_ns=26969283701 rd_total_time_ns=1018449432
>>>>           flush_total_time_ns=562596819 rd_merged=0 wr_merged=10453
>>>>
>>>> After:
>>>> virtio0: rd_bytes=148112896 wr_bytes=2845834240 rd_operations=17535 wr_operations=72197
>>>>           flush_operations=15760 wr_total_time_ns=26104971623 rd_total_time_ns=1012926283
>>>>           flush_total_time_ns=564414752 rd_merged=517 wr_merged=9859
>>>>
>>>> Some first numbers of improved read performance while booting:
>>>>
>>>> The Ubuntu 14.04.1 vServer from above:
>>>> virtio0: rd_bytes=86158336 wr_bytes=688128 rd_operations=5851 wr_operations=70
>>>>           flush_operations=4 wr_total_time_ns=10886705 rd_total_time_ns=416688595
>>>>           flush_total_time_ns=288776 rd_merged=1297 wr_merged=2
>>>>
>>>> Windows 2012R2:
>>>> virtio0: rd_bytes=176559104 wr_bytes=61859840 rd_operations=7200 wr_operations=360
>>>>           flush_operations=68 wr_total_time_ns=34344992718 rd_total_time_ns=134386844669
>>>>           flush_total_time_ns=18115517 rd_merged=641 wr_merged=216
>>>>
>>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>>> ---
>>>>   hw/block/dataplane/virtio-blk.c |   10 +-
>>>>   hw/block/virtio-blk.c           |  222 ++++++++++++++++++++++++---------------
>>>>   include/hw/virtio/virtio-blk.h  |   23 ++--
>>>>   3 files changed, 156 insertions(+), 99 deletions(-)
>>>>
>>>> diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
>>>> index 1222a37..aa4ad91 100644
>>>> --- a/hw/block/dataplane/virtio-blk.c
>>>> +++ b/hw/block/dataplane/virtio-blk.c
>>>> @@ -96,9 +96,8 @@ static void handle_notify(EventNotifier *e)
>>>>       event_notifier_test_and_clear(&s->host_notifier);
>>>>       blk_io_plug(s->conf->conf.blk);
>>>>       for (;;) {
>>>> -        MultiReqBuffer mrb = {
>>>> -            .num_writes = 0,
>>>> -        };
>>>> +        MultiReqBuffer mrb_rd = {};
>>>> +        MultiReqBuffer mrb_wr = {.is_write = 1};
>>>>           int ret;
>>>>           /* Disable guest->host notifies to avoid unnecessary vmexits */
>>>> @@ -117,10 +116,11 @@ static void handle_notify(EventNotifier *e)
>>>>                                                           req->elem.in_num,
>>>>                                                           req->elem.index);
>>>> -            virtio_blk_handle_request(req, &mrb);
>>>> +            virtio_blk_handle_request(req, &mrb_wr, &mrb_rd);
>>>>           }
>>>> -        virtio_submit_multiwrite(s->conf->conf.blk, &mrb);
>>>> +        virtio_submit_multireq(s->conf->conf.blk, &mrb_wr);
>>>> +        virtio_submit_multireq(s->conf->conf.blk, &mrb_rd);
>>>>           if (likely(ret == -EAGAIN)) { /* vring emptied */
>>>>               /* Re-enable guest->host notifies and stop processing the vring.
>>>> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
>>>> index 490f961..f522bfc 100644
>>>> --- a/hw/block/virtio-blk.c
>>>> +++ b/hw/block/virtio-blk.c
>>>> @@ -22,12 +22,15 @@
>>>>   #include "dataplane/virtio-blk.h"
>>>>   #include "migration/migration.h"
>>>>   #include "block/scsi.h"
>>>> +#include "block/block_int.h"
>>> No. :-)
>>>
>>> We could either expose the information that you need through
>>> BlockBackend, or wait until your automatic request splitting logic is
>>> implemented in block.c and the information isn't needed here any more.
>> I will add sth to block.c and follow-up with the request splitting logic
>> later. It will still make sense to now the limit here. Otherwise we
>> merge something that we split again afterwards.
> Yup, but then make it an official block layer interface instead of
> including block_int.h in the device emulation.
>
>>>>   #ifdef __linux__
>>>>   # include <scsi/sg.h>
>>>>   #endif
>>>>   #include "hw/virtio/virtio-bus.h"
>>>>   #include "hw/virtio/virtio-access.h"
>>>> +/* #define DEBUG_MULTIREQ */
>>>> +
>>>>   VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s)
>>>>   {
>>>>       VirtIOBlockReq *req = g_slice_new(VirtIOBlockReq);
>>>> @@ -88,6 +91,11 @@ static void virtio_blk_rw_complete(void *opaque, int ret)
>>>>       trace_virtio_blk_rw_complete(req, ret);
>>>> +#ifdef DEBUG_MULTIREQ
>>>> +    printf("virtio_blk_rw_complete p %p ret %d\n",
>>>> +           req, ret);
>>>> +#endif
>>> In the final version, please use tracepoints instead of printfs.
>> of course. This is just my debug code.
>>
>>>>       if (ret) {
>>>>           int p = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &req->out.type);
>>>>           bool is_read = !(p & VIRTIO_BLK_T_OUT);
>>>> @@ -257,24 +265,63 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
>>>>       virtio_blk_free_request(req);
>>>>   }
>>>> -void virtio_submit_multiwrite(BlockBackend *blk, MultiReqBuffer *mrb)
>>>> +static void virtio_multireq_cb(void *opaque, int ret)
>>>> +{
>>>> +    MultiReqBuffer *mrb = opaque;
>>>> +    int i;
>>>> +#ifdef DEBUG_MULTIREQ
>>>> +    printf("virtio_multireq_cb: p %p sector_num %ld nb_sectors %d is_write %d num_reqs %d\n",
>>>> +           mrb, mrb->sector_num, mrb->nb_sectors, mrb->is_write, mrb->num_reqs);
>>>> +#endif
>>>> +    for (i = 0; i < mrb->num_reqs; i++) {
>>>> +        virtio_blk_rw_complete(mrb->reqs[i], ret);
>>>> +    }
>>>> +
>>>> +    qemu_iovec_destroy(&mrb->qiov);
>>>> +    g_free(mrb);
>>>> +}
>>>> +
>>>> +void virtio_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb0)
>>>>   {
>>>> -    int i, ret;
>>>> +    MultiReqBuffer *mrb = NULL;
>>>> -    if (!mrb->num_writes) {
>>>> +    if (!mrb0->num_reqs) {
>>>>           return;
>>>>       }
>>>> -    ret = blk_aio_multiwrite(blk, mrb->blkreq, mrb->num_writes);
>>>> -    if (ret != 0) {
>>>> -        for (i = 0; i < mrb->num_writes; i++) {
>>>> -            if (mrb->blkreq[i].error) {
>>>> -                virtio_blk_rw_complete(mrb->blkreq[i].opaque, -EIO);
>>>> -            }
>>>> +    if (mrb0->num_reqs == 1) {
>>>> +        if (mrb0->is_write) {
>>>> +            blk_aio_writev(blk, mrb0->sector_num, &mrb0->reqs[0]->qiov, mrb0->nb_sectors,
>>>> +                           virtio_blk_rw_complete, mrb0->reqs[0]);
>>>> +        } else {
>>>> +            blk_aio_readv(blk, mrb0->sector_num, &mrb0->reqs[0]->qiov, mrb0->nb_sectors,
>>>> +                          virtio_blk_rw_complete, mrb0->reqs[0]);
>>>>           }
>>>> +        qemu_iovec_destroy(&mrb0->qiov);
>>>> +        mrb0->num_reqs = 0;
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    mrb = g_malloc(sizeof(MultiReqBuffer));
>>>> +    memcpy(mrb, mrb0, sizeof(MultiReqBuffer));
>>>> +    mrb0->num_reqs = 0;
>>> We probably want to avoid allocations as much as we can. If at all
>>> possible, we don't want to memcpy() either.
>>>
>>> Most parts of MultiReqBuffer don't seem to be actually needed during the
>>> request and in the callback any more. Essentially, all that is needed is
>>> the qiov and a way to find all the requests that belong to the same
>>> MultiReqBuffer, so that they can all be completed.
>>>
>>> We could create a linked list by having a VirtIOBlockReq *next in the
>>> request struct. For the qiov we can probably just use the field in the
>>> first request (i.e. we extend the qiov of the first request).
>> Thats a good idea and should work. I will look into this.
>> But where would you store the merge qiov?
>>
>> It looks like I would need a small struct which holds the qiov and the pointer to req[0].
>> This can then be used as opaque value to the callback.
> My thought was that you replace the original qiov of the first request
> with the merged one. I don't think you need the original one any more
> after merging the requests.  Alternatively, you could use a separate
> field in VirtIOBlockReq which stays unused in all but the first request.
>
> In any case the goal is to get rid of a dynamically allocated struct, no
> matter how small, so opaque must be directly pointing to a request in
> the end.

Okay, that sounds doable. The old interface does this for any write
even if it is not mergable at all.

Peter

>
>>> With these requirements removed, MultiReqBuffer can live on the stack
>>> and doesn't need to be copied to the heap.
>>>
>>>
>>> With the additional patch that you send me, the next few lines read:
>>>
>>> +    qemu_iovec_init(&mrb->qiov, mrb->num_reqs);
>>> +
>>> +    for (i = 0; i < mrb->num_reqs; i++) {
>>> +        qemu_iovec_concat(&mrb->qiov, &mrb->reqs[i]->qiov, 0, mrb->reqs[i]->qiov.size);
>>> +    }
>>> +    assert(mrb->nb_sectors == mrb->qiov.size / BDRV_SECTOR_SIZE);
>>>
>>> The long line is more than 80 characters, but probably more importantly,
>>> mrb->num_reqs is a really bad estimation for niov. If the guest uses any
>>> vectored I/O, we're sure to get reallocations. What you really want to
>>> use here is the old mrb->qiov.niov (which is an abuse to store in the
>>> qiov, see below).
>> Of course, my fault. I was so happy to have an exact value for the number
>> of niov that I put in the wrong variable ;-)
> Happens. :-)
>
>>> (I also wonder whether it would make sense to keep qiovs allocated when
>>> putting VirtIOBlockReqs into the pool, but that's not for this series.)
>>>
>>>> +
>>>> +#ifdef DEBUG_MULTIREQ
>>>> +    printf("virtio_submit_multireq: p %p sector_num %ld nb_sectors %d is_write %d num_reqs %d\n",
>>>> +           mrb, mrb->sector_num, mrb->nb_sectors, mrb->is_write, mrb->num_reqs);
>>>> +#endif
>>>> +
>>>> +    if (mrb->is_write) {
>>>> +        blk_aio_writev(blk, mrb->sector_num, &mrb->qiov, mrb->nb_sectors,
>>>> +                       virtio_multireq_cb, mrb);
>>>> +    } else {
>>>> +        blk_aio_readv(blk, mrb->sector_num, &mrb->qiov, mrb->nb_sectors,
>>>> +                      virtio_multireq_cb, mrb);
>>>>       }
>>>> -    mrb->num_writes = 0;
>>>> +    block_acct_merge_done(blk_get_stats(blk),
>>>> +                          mrb->is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ,
>>>> +                          mrb->num_reqs - 1);
>>>>   }
>>>>   static void virtio_blk_handle_flush(VirtIOBlockReq *req, MultiReqBuffer *mrb)
> Kevin


-- 

Mit freundlichen Grüßen

Peter Lieven

...........................................................

   KAMP Netzwerkdienste GmbH
   Vestische Str. 89-91 | 46117 Oberhausen
   Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40
   pl@kamp.de | http://www.kamp.de

   Geschäftsführer: Heiner Lante | Michael Lante
   Amtsgericht Duisburg | HRB Nr. 12154
   USt-Id-Nr.: DE 120607556

...........................................................

      reply	other threads:[~2014-12-04 15:11 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-02 14:33 [Qemu-devel] [RFC PATCH 0/3] virtio-blk: add multiread support Peter Lieven
2014-12-02 14:33 ` [Qemu-devel] [RFC PATCH 1/3] block: add accounting for merged requests Peter Lieven
2014-12-03 17:39   ` Eric Blake
2014-12-02 14:33 ` [Qemu-devel] [RFC PATCH 2/3] hw/virtio-blk: add a constant for max number of " Peter Lieven
2014-12-02 14:33 ` [Qemu-devel] [RFC PATCH 3/3] virtio-blk: introduce multiread Peter Lieven
2014-12-03 17:25   ` Max Reitz
2014-12-04  7:16     ` Peter Lieven
2014-12-04 14:12   ` Kevin Wolf
2014-12-04 14:42     ` Peter Lieven
2014-12-04 15:03       ` Kevin Wolf
2014-12-04 15:11         ` Peter Lieven [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54807991.3050007@kamp.de \
    --to=pl@kamp.de \
    --cc=armbru@redhat.com \
    --cc=benoit@irqsave.net \
    --cc=famz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=ming.lei@canonical.com \
    --cc=mreitz@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).