From: Peter Lieven
Date: Mon, 15 Dec 2014 16:52:16 +0100
Message-ID: <548F03B0.5000206@kamp.de>
In-Reply-To: <548F01A7.2020907@kamp.de>
Subject: Re: [Qemu-devel] [PATCH 4/4] virtio-blk: introduce multiread
To: Kevin Wolf
Cc: famz@redhat.com, benoit@irqsave.net, ming.lei@canonical.com, armbru@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com, pbonzini@redhat.com, mreitz@redhat.com

On 15.12.2014 16:43, Peter Lieven wrote:
> On 15.12.2014 16:01, Kevin Wolf wrote:
>> On 09.12.2014 at 17:26, Peter Lieven wrote:
>>> This patch finally introduces multiread support to virtio-blk. While
>>> multiwrite support has been there for a long time, read support was
>>> missing.
>>>
>>> To achieve this, the patch does several things which might need
>>> further explanation:
>>>
>>> - The whole merge and multireq logic is moved from block.c into
>>>   virtio-blk. This move is a preparation for directly creating a
>>>   coroutine out of virtio-blk.
>>>
>>> - Requests are only merged if they are strictly sequential, and they
>>>   are no longer sorted first. This simplification decreases overhead
>>>   and reduces latency. It will also merge some requests which were
>>>   unmergeable before (see the sketch below).
>>>
>>> The old algorithm took up to 32 requests, sorted them and tried to
>>> merge them. The outcome was anything between 1 and 32 requests. In
>>> the case of 32 requests, 31 of them were unnecessarily delayed.
>>>
>>> On the other hand, imagine e.g. 16 unmergeable requests followed by
>>> 32 mergeable requests. The latter 32 requests would have been split
>>> into two multireqs of 16 requests each.
>>>
>>> Lastly, the simplified logic allows for a fast path if there is only
>>> a single request in the multirequest. In this case the request is
>>> sent as an ordinary request without multireq callbacks.
>>>
>>> As a first benchmark, I installed Ubuntu 14.04.1 on a local SSD. The
>>> number of merged requests is of the same order as before, while the
>>> write latency decreases by several percent.
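[Aside, to make the strictly-sequential merge rule quoted above concrete:
a minimal standalone sketch of the decision. The struct, the helper and
the MAX_MERGE_REQS constant are illustrative only, not patch code; the
patch keeps the equivalent state in its MultiReqBuffer.]

#include <stdbool.h>
#include <stdint.h>

/* Illustrative bookkeeping for the current batch; not the patch code. */
typedef struct {
    int     num_reqs;    /* requests collected so far */
    bool    is_write;    /* direction of the batch */
    int64_t next_sector; /* first sector after the last merged request */
    int64_t nb_sectors;  /* total sectors collected so far */
} MergeState;

#define MAX_MERGE_REQS 32 /* assumed batch limit, as in the old code */

/* A request may join the batch iff it has the same direction, starts
 * exactly where the previous one ended (strictly sequential), the batch
 * is not full, and the merged size stays within the backend limit. */
static bool can_merge(const MergeState *mrb, bool is_write,
                      int64_t sector_num, int64_t nb_sectors,
                      int64_t max_transfer_length)
{
    if (mrb->num_reqs == 0) {
        return true; /* nothing to merge with yet */
    }
    if (mrb->is_write != is_write) {
        return false; /* direction changed */
    }
    if (sector_num != mrb->next_sector) {
        return false; /* not strictly sequential */
    }
    if (mrb->num_reqs >= MAX_MERGE_REQS) {
        return false; /* batch full */
    }
    if (max_transfer_length &&
        mrb->nb_sectors + nb_sectors > max_transfer_length) {
        return false; /* would exceed backend limit */
    }
    return true;
}

[There is no sorting and no reordering anymore: a request either continues
the current run, or it forces the batch to be submitted and starts a new
one.]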
>>>
>>> cmdline:
>>> qemu-system-x86_64 -m 1024 -smp 2 -enable-kvm -cdrom ubuntu-14.04.1-server-amd64.iso \
>>>   -drive if=virtio,file=/dev/ssd/ubuntu1404,aio=native,cache=none -monitor stdio
>>>
>>> Before:
>>> virtio0:
>>> rd_bytes=151056896 wr_bytes=2683947008 rd_operations=18614 wr_operations=67979
>>> flush_operations=15335 wr_total_time_ns=540428034217 rd_total_time_ns=11110520068
>>> flush_total_time_ns=40673685006 rd_merged=0 wr_merged=15531
>>>
>>> After:
>>> virtio0:
>>> rd_bytes=149487104 wr_bytes=2701344768 rd_operations=18148 wr_operations=68578
>>> flush_operations=15368 wr_total_time_ns=437030089565 rd_total_time_ns=9836288815
>>> flush_total_time_ns=40597981121 rd_merged=690 wr_merged=14615
>>>
>>> Some first numbers showing the improved read performance while booting:
>>>
>>> The Ubuntu 14.04.1 vServer from above:
>>> virtio0:
>>> rd_bytes=97545216 wr_bytes=119808 rd_operations=5071 wr_operations=26
>>> flush_operations=2 wr_total_time_ns=8847669 rd_total_time_ns=13952575478
>>> flush_total_time_ns=3075496 rd_merged=742 wr_merged=0
>>>
>>> Windows 2012R2 (booted from iSCSI):
>>> virtio0: rd_bytes=176559104 wr_bytes=61859840 rd_operations=7200 wr_operations=360
>>> flush_operations=68 wr_total_time_ns=34344992718 rd_total_time_ns=134386844669
>>> flush_total_time_ns=18115517 rd_merged=641 wr_merged=216
>>>
>>> Signed-off-by: Peter Lieven
>>
>> Looks pretty good. The only thing I'm still unsure about is possible
>> integer overflows in the merging logic. Maybe you can have another look
>> there (ideally not only at the places I commented on below, but at the
>> whole function).
>>
>>> @@ -414,14 +402,81 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>>>          iov_from_buf(in_iov, in_num, 0, serial, size);
>>>          virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
>>>          virtio_blk_free_request(req);
>>> -    } else if (type & VIRTIO_BLK_T_OUT) {
>>> -        qemu_iovec_init_external(&req->qiov, iov, out_num);
>>> -        virtio_blk_handle_write(req, mrb);
>>> -    } else if (type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {
>>> -        /* VIRTIO_BLK_T_IN is 0, so we can't just & it. */
>>> -        qemu_iovec_init_external(&req->qiov, in_iov, in_num);
>>> -        virtio_blk_handle_read(req);
>>> -    } else {
>>> +        break;
>>> +    }
>>> +    case VIRTIO_BLK_T_IN:
>>> +    case VIRTIO_BLK_T_OUT:
>>> +    {
>>> +        bool is_write = type & VIRTIO_BLK_T_OUT;
>>> +        int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
>>> +                                          &req->out.sector);
>>> +        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
>>> +        int nb_sectors = 0;
>>> +        bool merge = true;
>>> +
>>> +        if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
>>> +            virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
>>> +            virtio_blk_free_request(req);
>>> +            return;
>>> +        }
>>> +
>>> +        if (is_write) {
>>> +            qemu_iovec_init_external(&req->qiov, iov, out_num);
>>> +            trace_virtio_blk_handle_write(req, sector_num,
>>> +                                          req->qiov.size / BDRV_SECTOR_SIZE);
>>> +        } else {
>>> +            qemu_iovec_init_external(&req->qiov, in_iov, in_num);
>>> +            trace_virtio_blk_handle_read(req, sector_num,
>>> +                                         req->qiov.size / BDRV_SECTOR_SIZE);
>>> +        }
>>> +
>>> +        nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
>>
>> qiov.size is controlled by the guest, and nb_sectors is only an int. Are
>> you sure that this can't overflow?
>
> In theory, yes. For this to happen, in_iov or iov would need to contain
> about 2 TB of data. But theoretically there could also already be an
> overflow in qemu_iovec_init_external, where multiple size_t values are
> summed up in a size_t; on 32-bit systems that sum wraps much earlier.
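(To see the truncation in isolation, here is a minimal standalone
example. This is not QEMU code; it only assumes the 512-byte
BDRV_SECTOR_SIZE and a made-up oversized request size:)

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t qiov_size = 3ULL << 39;           /* hypothetical 1.5 TiB request */
    int      nb_sectors    = qiov_size / 512;  /* truncated into a 32-bit int */
    int64_t  nb_sectors_64 = qiov_size / 512;  /* the proposed widened type */

    /* On the usual two's-complement targets the int comes out as
     * -1073741824, while the int64_t holds the real 3221225472. */
    printf("int:     %d\n", nb_sectors);
    printf("int64_t: %" PRId64 "\n", nb_sectors_64);
    return 0;
}

(With the int, the later range and merge checks would operate on a
negative sector count.)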
>
> There has been no overflow checking in the merge routine in the past,
> but if it makes you feel better, we could add something like this:
>
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index cc0076a..e9236da 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -410,8 +410,8 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>          bool is_write = type & VIRTIO_BLK_T_OUT;
>          int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
>                                            &req->out.sector);
> -        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
> -        int nb_sectors = 0;
> +        int64_t max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
> +        int64_t nb_sectors = 0;
>          bool merge = true;
>
>          if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
> @@ -431,6 +431,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>          }
>
>          nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
> +        max_transfer_length = MIN_NON_ZERO(max_transfer_length, INT_MAX);
>
>          block_acct_start(blk_get_stats(req->dev->blk),
>                           &req->acct, req->qiov.size,
> @@ -443,8 +444,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>          }
>
>          /* merge would exceed maximum transfer length of backend device */
> -        if (max_transfer_length &&
> -            mrb->nb_sectors + nb_sectors > max_transfer_length) {
> +        if (nb_sectors + mrb->nb_sectors > max_transfer_length) {
>              merge = false;
>          }
>

(The MIN_NON_ZERO clamp is what allows dropping the zero-check from the
merge condition; see the note at the end of this mail.)

Maybe we should also add this:

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index cc0076a..fa647b6 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -333,6 +333,9 @@ static bool virtio_blk_sect_range_ok(VirtIOBlock *dev,
     uint64_t nb_sectors = size >> BDRV_SECTOR_BITS;
     uint64_t total_sectors;

+    if (nb_sectors > INT_MAX) {
+        return false;
+    }
     if (sector & dev->sector_mask) {
         return false;
     }

That's something that has not been checked for ages either.
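Note on why the zero-check can go away: MIN_NON_ZERO treats 0 as "no
limit", and the clamp replaces that 0 with INT_MAX. A standalone
illustration (the macro is copied here with the same semantics as QEMU's
MIN_NON_ZERO so the snippet compiles on its own; the values are made up):

#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Same semantics as QEMU's MIN_NON_ZERO: zero means "no limit" and
 * loses against any non-zero value. */
#define MIN_NON_ZERO(a, b) ((a) == 0 ? (b) : \
                            ((b) == 0 || (a) < (b) ? (a) : (b)))

int main(void)
{
    int64_t no_limit = 0;   /* backend reports no transfer limit */
    int64_t limit = 65536;  /* hypothetical backend limit in sectors */

    /* "no limit" becomes INT_MAX and a real limit is kept unchanged,
     * so the merge condition can compare against max_transfer_length
     * unconditionally. */
    assert(MIN_NON_ZERO(no_limit, (int64_t)INT_MAX) == INT_MAX);
    assert(MIN_NON_ZERO(limit, (int64_t)INT_MAX) == 65536);
    return 0;
}

Peter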