All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Lieven <pl@kamp.de>
To: Kevin Wolf <kwolf@redhat.com>
Cc: famz@redhat.com, benoit@irqsave.net, ming.lei@canonical.com,
	armbru@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com,
	pbonzini@redhat.com, mreitz@redhat.com
Subject: Re: [Qemu-devel] [PATCH 4/4] virtio-blk: introduce multiread
Date: Mon, 15 Dec 2014 16:52:16 +0100	[thread overview]
Message-ID: <548F03B0.5000206@kamp.de> (raw)
In-Reply-To: <548F01A7.2020907@kamp.de>

On 15.12.2014 16:43, Peter Lieven wrote:
> On 15.12.2014 16:01, Kevin Wolf wrote:
>> Am 09.12.2014 um 17:26 hat Peter Lieven geschrieben:
>>> this patch finally introduces multiread support to virtio-blk. While
>>> multiwrite support was there for a long time, read support was missing.
>>>
>>> To achieve this the patch does several things which might need further
>>> explanation:
>>>
>>>   - the whole merge and multireq logic is moved from block.c into
>>>     virtio-blk. This is move is a preparation for directly creating a
>>>     coroutine out of virtio-blk.
>>>
>>>   - requests are only merged if they are strictly sequential, and no
>>>     longer sorted. This simplification decreases overhead and reduces
>>>     latency. It will also merge some requests which were unmergable before.
>>>
>>>     The old algorithm took up to 32 requests, sorted them and tried to merge
>>>     them. The outcome was anything between 1 and 32 requests. In case of
>>>     32 requests there were 31 requests unnecessarily delayed.
>>>
>>>     On the other hand let's imagine e.g. 16 unmergeable requests followed
>>>     by 32 mergable requests. The latter 32 requests would have been split
>>>     into two 16 byte requests.
>>>
>>>     Last the simplified logic allows for a fast path if we have only a
>>>     single request in the multirequest. In this case the request is sent as
>>>     ordinary request without multireq callbacks.
>>>
>>> As a first benchmark I installed Ubuntu 14.04.1 on a local SSD. The number of
>>> merged requests is in the same order while the write latency is obviously
>>> decreased by several percent.
>>>
>>> cmdline:
>>> qemu-system-x86_64 -m 1024 -smp 2 -enable-kvm -cdrom ubuntu-14.04.1-server-amd64.iso \
>>>   -drive if=virtio,file=/dev/ssd/ubuntu1404,aio=native,cache=none -monitor stdio
>>>
>>> Before:
>>> virtio0:
>>>   rd_bytes=151056896 wr_bytes=2683947008 rd_operations=18614 wr_operations=67979
>>>   flush_operations=15335 wr_total_time_ns=540428034217 rd_total_time_ns=11110520068
>>>   flush_total_time_ns=40673685006 rd_merged=0 wr_merged=15531
>>>
>>> After:
>>> virtio0:
>>>   rd_bytes=149487104 wr_bytes=2701344768 rd_operations=18148 wr_operations=68578
>>>   flush_operations=15368 wr_total_time_ns=437030089565 rd_total_time_ns=9836288815
>>>   flush_total_time_ns=40597981121 rd_merged=690 wr_merged=14615
>>>
>>> Some first numbers of improved read performance while booting:
>>>
>>> The Ubuntu 14.04.1 vServer from above:
>>> virtio0:
>>>   rd_bytes=97545216 wr_bytes=119808 rd_operations=5071 wr_operations=26
>>>   flush_operations=2 wr_total_time_ns=8847669 rd_total_time_ns=13952575478
>>>   flush_total_time_ns=3075496 rd_merged=742 wr_merged=0
>>>
>>> Windows 2012R2 (booted from iSCSI):
>>> virtio0: rd_bytes=176559104 wr_bytes=61859840 rd_operations=7200 wr_operations=360
>>>   flush_operations=68 wr_total_time_ns=34344992718 rd_total_time_ns=134386844669
>>>   flush_total_time_ns=18115517 rd_merged=641 wr_merged=216
>>>
>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> Looks pretty good. The only thing I'm still unsure about are possible
>> integer overflows in the merging logic. Maybe you can have another look
>> there (ideally not only the places I commented on below, but the whole
>> function).
>>
>>> @@ -414,14 +402,81 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>>>           iov_from_buf(in_iov, in_num, 0, serial, size);
>>>           virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
>>>           virtio_blk_free_request(req);
>>> -    } else if (type & VIRTIO_BLK_T_OUT) {
>>> -        qemu_iovec_init_external(&req->qiov, iov, out_num);
>>> -        virtio_blk_handle_write(req, mrb);
>>> -    } else if (type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {
>>> -        /* VIRTIO_BLK_T_IN is 0, so we can't just & it. */
>>> -        qemu_iovec_init_external(&req->qiov, in_iov, in_num);
>>> -        virtio_blk_handle_read(req);
>>> -    } else {
>>> +        break;
>>> +    }
>>> +    case VIRTIO_BLK_T_IN:
>>> +    case VIRTIO_BLK_T_OUT:
>>> +    {
>>> +        bool is_write = type & VIRTIO_BLK_T_OUT;
>>> +        int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
>>> + &req->out.sector);
>>> +        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
>>> +        int nb_sectors = 0;
>>> +        bool merge = true;
>>> +
>>> +        if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
>>> +            virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
>>> +            virtio_blk_free_request(req);
>>> +            return;
>>> +        }
>>> +
>>> +        if (is_write) {
>>> +            qemu_iovec_init_external(&req->qiov, iov, out_num);
>>> +            trace_virtio_blk_handle_write(req, sector_num,
>>> +                                          req->qiov.size / BDRV_SECTOR_SIZE);
>>> +        } else {
>>> +            qemu_iovec_init_external(&req->qiov, in_iov, in_num);
>>> +            trace_virtio_blk_handle_read(req, sector_num,
>>> +                                         req->qiov.size / BDRV_SECTOR_SIZE);
>>> +        }
>>> +
>>> +        nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
>> qiov.size is controlled by the guest, and nb_sectors is only an int. Are
>> you sure that this can't overflow?
>
> In theory, yes. For this to happen in_iov or iov needs to contain
> 2TB of data on 32-bit systems. But theoretically there could
> also be already an overflow in qemu_iovec_init_external where
> multiple size_t are summed up in a size_t.
>
> There has been no overflow checking in the merge routine in
> the past, but if you feel better, we could add sth like this:
>
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index cc0076a..e9236da 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -410,8 +410,8 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>          bool is_write = type & VIRTIO_BLK_T_OUT;
>          int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
> &req->out.sector);
> -        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
> -        int nb_sectors = 0;
> +        int64_t max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
> +        int64_t nb_sectors = 0;
>          bool merge = true;
>
>          if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
> @@ -431,6 +431,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>          }
>
>          nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
> +        max_transfer_length = MIN_NON_ZERO(max_transfer_length, INT_MAX);
>
>          block_acct_start(blk_get_stats(req->dev->blk),
>                           &req->acct, req->qiov.size,
> @@ -443,8 +444,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>          }
>
>          /* merge would exceed maximum transfer length of backend device */
> -        if (max_transfer_length &&
> -            mrb->nb_sectors + nb_sectors > max_transfer_length) {
> +        if (nb_sectors + mrb->nb_sectors > max_transfer_length) {
>              merge = false;
>          }
>

May also this here:

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index cc0076a..fa647b6 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -333,6 +333,9 @@ static bool virtio_blk_sect_range_ok(VirtIOBlock *dev,
      uint64_t nb_sectors = size >> BDRV_SECTOR_BITS;
      uint64_t total_sectors;

+    if (nb_sectors > INT_MAX) {
+        return false;
+    }
      if (sector & dev->sector_mask) {
          return false;
      }


Thats something that has not been checked for ages as well.

Peter

  reply	other threads:[~2014-12-15 15:52 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-09 16:26 [Qemu-devel] [PATCH 0/4] virtio-blk: add multiread support Peter Lieven
2014-12-09 16:26 ` [Qemu-devel] [PATCH 1/4] block: add accounting for merged requests Peter Lieven
2014-12-09 16:26 ` [Qemu-devel] [PATCH 2/4] hw/virtio-blk: add a constant for max number of " Peter Lieven
2014-12-10  4:54   ` Fam Zheng
2014-12-09 16:26 ` [Qemu-devel] [PATCH 3/4] block-backend: expose bs->bl.max_transfer_length Peter Lieven
2014-12-09 16:26 ` [Qemu-devel] [PATCH 4/4] virtio-blk: introduce multiread Peter Lieven
2014-12-10  7:48   ` Fam Zheng
2014-12-11 13:07     ` Peter Lieven
2014-12-15 15:01   ` Kevin Wolf
2014-12-15 15:43     ` Peter Lieven
2014-12-15 15:52       ` Peter Lieven [this message]
2014-12-15 16:00         ` Kevin Wolf
2014-12-15 16:02           ` Peter Lieven
2014-12-15 15:57       ` Kevin Wolf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=548F03B0.5000206@kamp.de \
    --to=pl@kamp.de \
    --cc=armbru@redhat.com \
    --cc=benoit@irqsave.net \
    --cc=famz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=ming.lei@canonical.com \
    --cc=mreitz@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.