From: Peter Lieven
Date: Mon, 15 Dec 2014 16:52:16 +0100
Message-ID: <548F03B0.5000206@kamp.de>
In-Reply-To: <548F01A7.2020907@kamp.de>
Subject: Re: [Qemu-devel] [PATCH 4/4] virtio-blk: introduce multiread
To: Kevin Wolf
Cc: famz@redhat.com, benoit@irqsave.net, ming.lei@canonical.com, armbru@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com, pbonzini@redhat.com, mreitz@redhat.com

On 15.12.2014 16:43, Peter Lieven wrote:
> On 15.12.2014 16:01, Kevin Wolf wrote:
>> On 09.12.2014 at 17:26, Peter Lieven wrote:
>>> This patch finally introduces multiread support to virtio-blk. While
>>> multiwrite support has been there for a long time, read support was
>>> missing.
>>>
>>> To achieve this, the patch does several things which might need
>>> further explanation:
>>>
>>> - The whole merge and multireq logic is moved from block.c into
>>>   virtio-blk. This move is a preparation for directly creating a
>>>   coroutine out of virtio-blk.
>>>
>>> - Requests are only merged if they are strictly sequential, and they
>>>   are no longer sorted first. This simplification decreases overhead
>>>   and reduces latency. It will also merge some requests which were
>>>   unmergeable before (see the sketch below).
>>>
>>> The old algorithm took up to 32 requests, sorted them and tried to
>>> merge them. The outcome was anything between 1 and 32 requests. In
>>> the case of 32 requests, 31 of them were unnecessarily delayed.
>>>
>>> On the other hand, imagine e.g. 16 unmergeable requests followed by
>>> 32 mergeable requests. The latter 32 requests would have been split
>>> into two multireqs of 16 requests each.
>>>
>>> Lastly, the simplified logic allows for a fast path if there is only
>>> a single request in the multirequest. In this case the request is
>>> sent as an ordinary request without multireq callbacks.
>>>
>>> As a first benchmark, I installed Ubuntu 14.04.1 on a local SSD. The
>>> number of merged requests is of the same order as before, while the
>>> write latency decreases by several percent.
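[Aside, to make the strictly-sequential merge rule quoted above concrete:
a minimal standalone sketch of the decision. The struct, the helper and
the MAX_MERGE_REQS constant are illustrative only, not patch code; the
patch keeps the equivalent state in its MultiReqBuffer.]

#include <stdbool.h>
#include <stdint.h>

/* Illustrative bookkeeping for the current batch; not the patch code. */
typedef struct {
    int     num_reqs;    /* requests collected so far */
    bool    is_write;    /* direction of the batch */
    int64_t next_sector; /* first sector after the last merged request */
    int64_t nb_sectors;  /* total sectors collected so far */
} MergeState;

#define MAX_MERGE_REQS 32 /* assumed batch limit, as in the old code */

/* A request may join the batch iff it has the same direction, starts
 * exactly where the previous one ended (strictly sequential), the batch
 * is not full, and the merged size stays within the backend limit. */
static bool can_merge(const MergeState *mrb, bool is_write,
                      int64_t sector_num, int64_t nb_sectors,
                      int64_t max_transfer_length)
{
    if (mrb->num_reqs == 0) {
        return true; /* nothing to merge with yet */
    }
    if (mrb->is_write != is_write) {
        return false; /* direction changed */
    }
    if (sector_num != mrb->next_sector) {
        return false; /* not strictly sequential */
    }
    if (mrb->num_reqs >= MAX_MERGE_REQS) {
        return false; /* batch full */
    }
    if (max_transfer_length &&
        mrb->nb_sectors + nb_sectors > max_transfer_length) {
        return false; /* would exceed backend limit */
    }
    return true;
}

[There is no sorting and no reordering anymore: a request either continues
the current run, or it forces the batch to be submitted and starts a new
one.]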
>>>
>>> cmdline:
>>> qemu-system-x86_64 -m 1024 -smp 2 -enable-kvm -cdrom ubuntu-14.04.1-server-amd64.iso \
>>>   -drive if=virtio,file=/dev/ssd/ubuntu1404,aio=native,cache=none -monitor stdio
>>>
>>> Before:
>>> virtio0:
>>> rd_bytes=151056896 wr_bytes=2683947008 rd_operations=18614 wr_operations=67979
>>> flush_operations=15335 wr_total_time_ns=540428034217 rd_total_time_ns=11110520068
>>> flush_total_time_ns=40673685006 rd_merged=0 wr_merged=15531
>>>
>>> After:
>>> virtio0:
>>> rd_bytes=149487104 wr_bytes=2701344768 rd_operations=18148 wr_operations=68578
>>> flush_operations=15368 wr_total_time_ns=437030089565 rd_total_time_ns=9836288815
>>> flush_total_time_ns=40597981121 rd_merged=690 wr_merged=14615
>>>
>>> Some first numbers showing the improved read performance while booting:
>>>
>>> The Ubuntu 14.04.1 vServer from above:
>>> virtio0:
>>> rd_bytes=97545216 wr_bytes=119808 rd_operations=5071 wr_operations=26
>>> flush_operations=2 wr_total_time_ns=8847669 rd_total_time_ns=13952575478
>>> flush_total_time_ns=3075496 rd_merged=742 wr_merged=0
>>>
>>> Windows 2012R2 (booted from iSCSI):
>>> virtio0: rd_bytes=176559104 wr_bytes=61859840 rd_operations=7200 wr_operations=360
>>> flush_operations=68 wr_total_time_ns=34344992718 rd_total_time_ns=134386844669
>>> flush_total_time_ns=18115517 rd_merged=641 wr_merged=216
>>>
>>> Signed-off-by: Peter Lieven
>>
>> Looks pretty good. The only thing I'm still unsure about is possible
>> integer overflows in the merging logic. Maybe you can have another look
>> there (ideally not only at the places I commented on below, but at the
>> whole function).
>>
>>> @@ -414,14 +402,81 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>>>          iov_from_buf(in_iov, in_num, 0, serial, size);
>>>          virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
>>>          virtio_blk_free_request(req);
>>> -    } else if (type & VIRTIO_BLK_T_OUT) {
>>> -        qemu_iovec_init_external(&req->qiov, iov, out_num);
>>> -        virtio_blk_handle_write(req, mrb);
>>> -    } else if (type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {
>>> -        /* VIRTIO_BLK_T_IN is 0, so we can't just & it. */
>>> -        qemu_iovec_init_external(&req->qiov, in_iov, in_num);
>>> -        virtio_blk_handle_read(req);
>>> -    } else {
>>> +        break;
>>> +    }
>>> +    case VIRTIO_BLK_T_IN:
>>> +    case VIRTIO_BLK_T_OUT:
>>> +    {
>>> +        bool is_write = type & VIRTIO_BLK_T_OUT;
>>> +        int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
>>> +                                          &req->out.sector);
>>> +        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
>>> +        int nb_sectors = 0;
>>> +        bool merge = true;
>>> +
>>> +        if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
>>> +            virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
>>> +            virtio_blk_free_request(req);
>>> +            return;
>>> +        }
>>> +
>>> +        if (is_write) {
>>> +            qemu_iovec_init_external(&req->qiov, iov, out_num);
>>> +            trace_virtio_blk_handle_write(req, sector_num,
>>> +                                          req->qiov.size / BDRV_SECTOR_SIZE);
>>> +        } else {
>>> +            qemu_iovec_init_external(&req->qiov, in_iov, in_num);
>>> +            trace_virtio_blk_handle_read(req, sector_num,
>>> +                                         req->qiov.size / BDRV_SECTOR_SIZE);
>>> +        }
>>> +
>>> +        nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
>>
>> qiov.size is controlled by the guest, and nb_sectors is only an int. Are
>> you sure that this can't overflow?
>
> In theory, yes. For this to happen, in_iov or iov would need to contain
> about 2 TB of data. But theoretically there could also already be an
> overflow in qemu_iovec_init_external, where multiple size_t values are
> summed up in a size_t; on 32-bit systems that sum wraps much earlier.
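(To see the truncation in isolation, here is a minimal standalone
example. This is not QEMU code; it only assumes the 512-byte
BDRV_SECTOR_SIZE and a made-up oversized request size:)

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t qiov_size = 3ULL << 39;           /* hypothetical 1.5 TiB request */
    int      nb_sectors    = qiov_size / 512;  /* truncated into a 32-bit int */
    int64_t  nb_sectors_64 = qiov_size / 512;  /* the proposed widened type */

    /* On the usual two's-complement targets the int comes out as
     * -1073741824, while the int64_t holds the real 3221225472. */
    printf("int:     %d\n", nb_sectors);
    printf("int64_t: %" PRId64 "\n", nb_sectors_64);
    return 0;
}

(With the int, the later range and merge checks would operate on a
negative sector count.)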
>
> There has been no overflow checking in the merge routine in the past,
> but if it makes you feel better, we could add something like this:
>
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index cc0076a..e9236da 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -410,8 +410,8 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>          bool is_write = type & VIRTIO_BLK_T_OUT;
>          int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
>                                            &req->out.sector);
> -        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
> -        int nb_sectors = 0;
> +        int64_t max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
> +        int64_t nb_sectors = 0;
>          bool merge = true;
>
>          if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
> @@ -431,6 +431,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>          }
>
>          nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
> +        max_transfer_length = MIN_NON_ZERO(max_transfer_length, INT_MAX);
>
>          block_acct_start(blk_get_stats(req->dev->blk),
>                           &req->acct, req->qiov.size,
> @@ -443,8 +444,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>          }
>
>          /* merge would exceed maximum transfer length of backend device */
> -        if (max_transfer_length &&
> -            mrb->nb_sectors + nb_sectors > max_transfer_length) {
> +        if (nb_sectors + mrb->nb_sectors > max_transfer_length) {
>              merge = false;
>          }
>

(The MIN_NON_ZERO clamp is what allows dropping the zero-check from the
merge condition; see the note at the end of this mail.)

Maybe we should also add this:

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index cc0076a..fa647b6 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -333,6 +333,9 @@ static bool virtio_blk_sect_range_ok(VirtIOBlock *dev,
     uint64_t nb_sectors = size >> BDRV_SECTOR_BITS;
     uint64_t total_sectors;

+    if (nb_sectors > INT_MAX) {
+        return false;
+    }
     if (sector & dev->sector_mask) {
         return false;
     }

That's something that has not been checked for ages either.
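Note on why the zero-check can go away: MIN_NON_ZERO treats 0 as "no
limit", and the clamp replaces that 0 with INT_MAX. A standalone
illustration (the macro is copied here with the same semantics as QEMU's
MIN_NON_ZERO so the snippet compiles on its own; the values are made up):

#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Same semantics as QEMU's MIN_NON_ZERO: zero means "no limit" and
 * loses against any non-zero value. */
#define MIN_NON_ZERO(a, b) ((a) == 0 ? (b) : \
                            ((b) == 0 || (a) < (b) ? (a) : (b)))

int main(void)
{
    int64_t no_limit = 0;   /* backend reports no transfer limit */
    int64_t limit = 65536;  /* hypothetical backend limit in sectors */

    /* "no limit" becomes INT_MAX and a real limit is kept unchanged,
     * so the merge condition can compare against max_transfer_length
     * unconditionally. */
    assert(MIN_NON_ZERO(no_limit, (int64_t)INT_MAX) == INT_MAX);
    assert(MIN_NON_ZERO(limit, (int64_t)INT_MAX) == 65536);
    return 0;
}

Peter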