qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Max Reitz <mreitz@redhat.com>
To: Peter Lieven <pl@kamp.de>, qemu-devel@nongnu.org
Cc: kwolf@redhat.com, famz@redhat.com, benoit@irqsave.net,
	ming.lei@canonical.com, armbru@redhat.com, stefanha@redhat.com,
	pbonzini@redhat.com
Subject: Re: [Qemu-devel] [RFC PATCH 3/3] virtio-blk: introduce multiread
Date: Wed, 03 Dec 2014 18:25:20 +0100	[thread overview]
Message-ID: <547F4780.60405@redhat.com> (raw)
In-Reply-To: <1417530788-15172-4-git-send-email-pl@kamp.de>

On 2014-12-02 at 15:33, Peter Lieven wrote:
> this patch finally introduce multiread support to virtio-blk while
> multiwrite support was there for a long time read support was missing.
>
> To achieve this the patch does serveral things which might need futher
> explaination:
>
>   - the whole merge and multireq logic is moved from block.c into
>     virtio-blk. This is move is a preparation for directly creating a
>     coroutine out of virtio-blk.
>
>   - requests are only merged if they are strictly sequential and no
>     longer sorted. This simplification decreases overhead and reduces
>     latency. It will also merge some requests which were unmergable before.
>
>     The old algorithm took up to 32 requests sorted them and tried to merge
>     them. The outcome was anything between 1 and 32 requests. In case of
>     32 requests there were 31 requests unnecessarily delayed.
>
>     On the other hand lets imagine e.g. 16 unmergeable requests followed
>     by 32 mergable requests. The latter 32 requests would have been split
>     into two 16 byte requests.
>
>     Last the simplified logic allows for a fast path if we have only a
>     single request in the multirequest. In this case the request is sent as
>     ordinary request without mulltireq callbacks.
>
> As a first benchmark I installed Ubuntu 14.04.1 on a ramdisk. The number of
> merged requests is in the same order while the latency is slightly decreased.
> One should not stick to much to the numbers because the number of wr_requests
> are highly fluctuant. I hope the numbers show that this patch is at least
> not causing to big harm:
>
> Cmdline:
> qemu-system-x86_64 -m 1024 -smp 2 -enable-kvm -cdrom ubuntu-14.04.1-server-amd64.iso \
>   -drive if=virtio,file=/tmp/ubuntu.raw -monitor stdio
>
> Before:
> virtio0: rd_bytes=150979072 wr_bytes=2591138816 rd_operations=18475 wr_operations=69216
>           flush_operations=15343 wr_total_time_ns=26969283701 rd_total_time_ns=1018449432
>           flush_total_time_ns=562596819 rd_merged=0 wr_merged=10453
>
> After:
> virtio0: rd_bytes=148112896 wr_bytes=2845834240 rd_operations=17535 wr_operations=72197
>           flush_operations=15760 wr_total_time_ns=26104971623 rd_total_time_ns=1012926283
>           flush_total_time_ns=564414752 rd_merged=517 wr_merged=9859
>
> Some first numbers of improved read performance while booting:
>
> The Ubuntu 14.04.1 vServer from above:
> virtio0: rd_bytes=86158336 wr_bytes=688128 rd_operations=5851 wr_operations=70
>           flush_operations=4 wr_total_time_ns=10886705 rd_total_time_ns=416688595
>           flush_total_time_ns=288776 rd_merged=1297 wr_merged=2
>
> Windows 2012R2:
> virtio0: rd_bytes=176559104 wr_bytes=61859840 rd_operations=7200 wr_operations=360
>           flush_operations=68 wr_total_time_ns=34344992718 rd_total_time_ns=134386844669
>           flush_total_time_ns=18115517 rd_merged=641 wr_merged=216
>
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>   hw/block/dataplane/virtio-blk.c |   10 +-
>   hw/block/virtio-blk.c           |  222 ++++++++++++++++++++++++---------------
>   include/hw/virtio/virtio-blk.h  |   23 ++--
>   3 files changed, 156 insertions(+), 99 deletions(-)

So, since you CC-ed me, I guess you're expecting a reply from me. I feel 
like I'm saying this more often recently (about anything, not just about 
virtio), but I don't know anything about virtio, so you should take 
anything from me with a not just a grain but more of a pack of salt.

> diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
> index 1222a37..aa4ad91 100644
> --- a/hw/block/dataplane/virtio-blk.c
> +++ b/hw/block/dataplane/virtio-blk.c
> @@ -96,9 +96,8 @@ static void handle_notify(EventNotifier *e)
>       event_notifier_test_and_clear(&s->host_notifier);
>       blk_io_plug(s->conf->conf.blk);
>       for (;;) {
> -        MultiReqBuffer mrb = {
> -            .num_writes = 0,
> -        };
> +        MultiReqBuffer mrb_rd = {};
> +        MultiReqBuffer mrb_wr = {.is_write = 1};
>           int ret;
>   
>           /* Disable guest->host notifies to avoid unnecessary vmexits */
> @@ -117,10 +116,11 @@ static void handle_notify(EventNotifier *e)
>                                                           req->elem.in_num,
>                                                           req->elem.index);
>   
> -            virtio_blk_handle_request(req, &mrb);
> +            virtio_blk_handle_request(req, &mrb_wr, &mrb_rd);
>           }
>   
> -        virtio_submit_multiwrite(s->conf->conf.blk, &mrb);
> +        virtio_submit_multireq(s->conf->conf.blk, &mrb_wr);
> +        virtio_submit_multireq(s->conf->conf.blk, &mrb_rd);
>   
>           if (likely(ret == -EAGAIN)) { /* vring emptied */
>               /* Re-enable guest->host notifies and stop processing the vring.
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index 490f961..f522bfc 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -22,12 +22,15 @@
>   #include "dataplane/virtio-blk.h"
>   #include "migration/migration.h"
>   #include "block/scsi.h"
> +#include "block/block_int.h"
>   #ifdef __linux__
>   # include <scsi/sg.h>
>   #endif
>   #include "hw/virtio/virtio-bus.h"
>   #include "hw/virtio/virtio-access.h"
>   
> +/* #define DEBUG_MULTIREQ */
> +
>   VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s)
>   {
>       VirtIOBlockReq *req = g_slice_new(VirtIOBlockReq);
> @@ -88,6 +91,11 @@ static void virtio_blk_rw_complete(void *opaque, int ret)
>   
>       trace_virtio_blk_rw_complete(req, ret);
>   
> +#ifdef DEBUG_MULTIREQ
> +    printf("virtio_blk_rw_complete p %p ret %d\n",
> +           req, ret);
> +#endif
> +
>       if (ret) {
>           int p = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &req->out.type);
>           bool is_read = !(p & VIRTIO_BLK_T_OUT);
> @@ -257,24 +265,63 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
>       virtio_blk_free_request(req);
>   }
>   
> -void virtio_submit_multiwrite(BlockBackend *blk, MultiReqBuffer *mrb)
> +static void virtio_multireq_cb(void *opaque, int ret)
> +{
> +    MultiReqBuffer *mrb = opaque;
> +    int i;
> +#ifdef DEBUG_MULTIREQ
> +    printf("virtio_multireq_cb: p %p sector_num %ld nb_sectors %d is_write %d num_reqs %d\n",
> +           mrb, mrb->sector_num, mrb->nb_sectors, mrb->is_write, mrb->num_reqs);
> +#endif
> +    for (i = 0; i < mrb->num_reqs; i++) {
> +        virtio_blk_rw_complete(mrb->reqs[i], ret);
> +    }
> +
> +    qemu_iovec_destroy(&mrb->qiov);
> +    g_free(mrb);
> +}
> +
> +void virtio_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb0)
>   {
> -    int i, ret;
> +    MultiReqBuffer *mrb = NULL;
>   
> -    if (!mrb->num_writes) {
> +    if (!mrb0->num_reqs) {
>           return;
>       }
>   
> -    ret = blk_aio_multiwrite(blk, mrb->blkreq, mrb->num_writes);
> -    if (ret != 0) {
> -        for (i = 0; i < mrb->num_writes; i++) {
> -            if (mrb->blkreq[i].error) {
> -                virtio_blk_rw_complete(mrb->blkreq[i].opaque, -EIO);
> -            }
> +    if (mrb0->num_reqs == 1) {
> +        if (mrb0->is_write) {
> +            blk_aio_writev(blk, mrb0->sector_num, &mrb0->reqs[0]->qiov, mrb0->nb_sectors,
> +                           virtio_blk_rw_complete, mrb0->reqs[0]);
> +        } else {
> +            blk_aio_readv(blk, mrb0->sector_num, &mrb0->reqs[0]->qiov, mrb0->nb_sectors,
> +                          virtio_blk_rw_complete, mrb0->reqs[0]);
>           }
> +        qemu_iovec_destroy(&mrb0->qiov);
> +        mrb0->num_reqs = 0;
> +        return;
> +    }
> +
> +    mrb = g_malloc(sizeof(MultiReqBuffer));
> +    memcpy(mrb, mrb0, sizeof(MultiReqBuffer));
> +    mrb0->num_reqs = 0;

I guess you're making mrb0->num_reqs == 1 a special case so you don't 
have to duplicate mrb0 in that case. But why are you duplicating it at 
all? And you're just moving over the IO vector without duplicating 
mrb->qiov.iov, which seems risky to me considering the 
qemu_iovec_destroy() in virtio_multireq_cb().

I could imagine you're duplicating it so the request can run in the 
background while you're again filling up the MultiReqBuffer, but that 
doesn't fit together with not duplicating it in the case of 
mrb0->num_reqs == 1 which can run in the background just the same.

> +
> +#ifdef DEBUG_MULTIREQ
> +    printf("virtio_submit_multireq: p %p sector_num %ld nb_sectors %d is_write %d num_reqs %d\n",
> +           mrb, mrb->sector_num, mrb->nb_sectors, mrb->is_write, mrb->num_reqs);
> +#endif
> +
> +    if (mrb->is_write) {
> +        blk_aio_writev(blk, mrb->sector_num, &mrb->qiov, mrb->nb_sectors,
> +                       virtio_multireq_cb, mrb);
> +    } else {
> +        blk_aio_readv(blk, mrb->sector_num, &mrb->qiov, mrb->nb_sectors,
> +                      virtio_multireq_cb, mrb);
>       }
>   
> -    mrb->num_writes = 0;
> +    block_acct_merge_done(blk_get_stats(blk),
> +                          mrb->is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ,
> +                          mrb->num_reqs - 1);
>   }
>   
>   static void virtio_blk_handle_flush(VirtIOBlockReq *req, MultiReqBuffer *mrb)
> @@ -283,9 +330,9 @@ static void virtio_blk_handle_flush(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>                        BLOCK_ACCT_FLUSH);
>   
>       /*
> -     * Make sure all outstanding writes are posted to the backing device.
> +     * Make sure all outstanding requests are posted to the backing device.
>        */
> -    virtio_submit_multiwrite(req->dev->blk, mrb);
> +    virtio_submit_multireq(req->dev->blk, mrb);
>       blk_aio_flush(req->dev->blk, virtio_blk_flush_complete, req);
>   }
>   
> @@ -308,61 +355,8 @@ static bool virtio_blk_sect_range_ok(VirtIOBlock *dev,
>       return true;
>   }
>   
> -static void virtio_blk_handle_write(VirtIOBlockReq *req, MultiReqBuffer *mrb)
> -{
> -    BlockRequest *blkreq;
> -    uint64_t sector;
> -
> -    sector = virtio_ldq_p(VIRTIO_DEVICE(req->dev), &req->out.sector);
> -
> -    trace_virtio_blk_handle_write(req, sector, req->qiov.size / 512);
> -
> -    if (!virtio_blk_sect_range_ok(req->dev, sector, req->qiov.size)) {
> -        virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
> -        virtio_blk_free_request(req);
> -        return;
> -    }
> -
> -    block_acct_start(blk_get_stats(req->dev->blk), &req->acct, req->qiov.size,
> -                     BLOCK_ACCT_WRITE);
> -
> -    if (mrb->num_writes == VIRTIO_BLK_MAX_MERGE_REQS) {
> -        virtio_submit_multiwrite(req->dev->blk, mrb);
> -    }
> -
> -    blkreq = &mrb->blkreq[mrb->num_writes];
> -    blkreq->sector = sector;
> -    blkreq->nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
> -    blkreq->qiov = &req->qiov;
> -    blkreq->cb = virtio_blk_rw_complete;
> -    blkreq->opaque = req;
> -    blkreq->error = 0;
> -
> -    mrb->num_writes++;
> -}
> -
> -static void virtio_blk_handle_read(VirtIOBlockReq *req)
> -{
> -    uint64_t sector;
> -
> -    sector = virtio_ldq_p(VIRTIO_DEVICE(req->dev), &req->out.sector);
> -
> -    trace_virtio_blk_handle_read(req, sector, req->qiov.size / 512);
> -
> -    if (!virtio_blk_sect_range_ok(req->dev, sector, req->qiov.size)) {
> -        virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
> -        virtio_blk_free_request(req);
> -        return;
> -    }
> -
> -    block_acct_start(blk_get_stats(req->dev->blk), &req->acct, req->qiov.size,
> -                     BLOCK_ACCT_READ);
> -    blk_aio_readv(req->dev->blk, sector, &req->qiov,
> -                  req->qiov.size / BDRV_SECTOR_SIZE,
> -                  virtio_blk_rw_complete, req);
> -}
> -
> -void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
> +void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb_wr,
> +                               MultiReqBuffer *mrb_rd)
>   {
>       uint32_t type;
>       struct iovec *in_iov = req->elem.in_sg;
> @@ -397,7 +391,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>       type = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &req->out.type);
>   
>       if (type & VIRTIO_BLK_T_FLUSH) {
> -        virtio_blk_handle_flush(req, mrb);
> +        virtio_blk_handle_flush(req, mrb_wr);
>       } else if (type & VIRTIO_BLK_T_SCSI_CMD) {
>           virtio_blk_handle_scsi(req);
>       } else if (type & VIRTIO_BLK_T_GET_ID) {
> @@ -414,13 +408,71 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>           iov_from_buf(in_iov, in_num, 0, serial, size);
>           virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
>           virtio_blk_free_request(req);
> -    } else if (type & VIRTIO_BLK_T_OUT) {
> -        qemu_iovec_init_external(&req->qiov, iov, out_num);
> -        virtio_blk_handle_write(req, mrb);
> -    } else if (type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {
> -        /* VIRTIO_BLK_T_IN is 0, so we can't just & it. */
> -        qemu_iovec_init_external(&req->qiov, in_iov, in_num);
> -        virtio_blk_handle_read(req);
> +    } else if (type & VIRTIO_BLK_T_OUT || type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {

Please don't omit the comment about VIRTIO_BLK_T_IN being 0, I spend 
quite a while thinking about why VIRTIO_BLK_T_BARRIER is seemingly 
handled just the same as VIRTIO_BLK_T_IN. I still don't know what 
VIRTIO_BLK_T_BARRIER is supposed to do (and I don't really want to grab 
the virtio spec right now, although I should do it some time...), but 
now it at least seems to me like it doesn't really matter.

> +        bool is_write = type & VIRTIO_BLK_T_OUT;
> +        MultiReqBuffer *mrb = is_write ? mrb_wr : mrb_rd;
> +        int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev), &req->out.sector);
> +        BlockDriverState *bs = blk_bs(req->dev->blk);
> +        int nb_sectors = 0;
> +        int merge = 1;

I shouldn't be criticizing such small issues in an RFC, but it's the 
most I can do for a virtio-blk series: I'd like this to be a bool 
instead of an int, please.

Judging from the not-so-quick-in-quantity-but-pretty-brief-in-quality 
scan through this RFC, it looks OK to me. There are some minor things 
(e.g. "too" instead of "to" in the commit message) which should be fixed 
eventually, but logic-wise, I didn't spot anything major.

Perhaps we could put the merge logic into an own function, though, just 
like bdrv_aio_multiwrite() did with multiwrite_merge().

I trust you that overlapping requests are not an issue, at least when 
not sorting the requests. Other than that, nothing bad from me. The 
patch looks good, but I don't want to be too optimistic, as I don't want 
to definitely judge anything related to virtio-blk. :-)

(Apart from the first two patches of this series, they were completely 
fine, but I seem to remember having them reviewed once already...)

Max

  reply	other threads:[~2014-12-03 17:25 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-02 14:33 [Qemu-devel] [RFC PATCH 0/3] virtio-blk: add multiread support Peter Lieven
2014-12-02 14:33 ` [Qemu-devel] [RFC PATCH 1/3] block: add accounting for merged requests Peter Lieven
2014-12-03 17:39   ` Eric Blake
2014-12-02 14:33 ` [Qemu-devel] [RFC PATCH 2/3] hw/virtio-blk: add a constant for max number of " Peter Lieven
2014-12-02 14:33 ` [Qemu-devel] [RFC PATCH 3/3] virtio-blk: introduce multiread Peter Lieven
2014-12-03 17:25   ` Max Reitz [this message]
2014-12-04  7:16     ` Peter Lieven
2014-12-04 14:12   ` Kevin Wolf
2014-12-04 14:42     ` Peter Lieven
2014-12-04 15:03       ` Kevin Wolf
2014-12-04 15:11         ` Peter Lieven

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=547F4780.60405@redhat.com \
    --to=mreitz@redhat.com \
    --cc=armbru@redhat.com \
    --cc=benoit@irqsave.net \
    --cc=famz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=ming.lei@canonical.com \
    --cc=pbonzini@redhat.com \
    --cc=pl@kamp.de \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).