From: Peter Lieven <pl@kamp.de>
To: Kevin Wolf <kwolf@redhat.com>
Cc: famz@redhat.com, benoit@irqsave.net, ming.lei@canonical.com,
armbru@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com,
pbonzini@redhat.com, mreitz@redhat.com
Subject: Re: [Qemu-devel] [PATCH 4/4] virtio-blk: introduce multiread
Date: Mon, 15 Dec 2014 16:43:35 +0100 [thread overview]
Message-ID: <548F01A7.2020907@kamp.de> (raw)
In-Reply-To: <20141215150107.GK4411@noname.str.redhat.com>
On 15.12.2014 16:01, Kevin Wolf wrote:
> Am 09.12.2014 um 17:26 hat Peter Lieven geschrieben:
>> this patch finally introduces multiread support to virtio-blk. While
>> multiwrite support was there for a long time, read support was missing.
>>
>> To achieve this the patch does several things which might need further
>> explanation:
>>
>> - the whole merge and multireq logic is moved from block.c into
>> virtio-blk. This is move is a preparation for directly creating a
>> coroutine out of virtio-blk.
>>
>> - requests are only merged if they are strictly sequential, and no
>> longer sorted. This simplification decreases overhead and reduces
>> latency. It will also merge some requests which were unmergable before.
>>
>> The old algorithm took up to 32 requests, sorted them and tried to merge
>> them. The outcome was anything between 1 and 32 requests. In case of
>> 32 requests there were 31 requests unnecessarily delayed.
>>
>> On the other hand let's imagine e.g. 16 unmergeable requests followed
>> by 32 mergable requests. The latter 32 requests would have been split
>> into two 16 byte requests.
>>
>> Last the simplified logic allows for a fast path if we have only a
>> single request in the multirequest. In this case the request is sent as
>> ordinary request without multireq callbacks.
>>
>> As a first benchmark I installed Ubuntu 14.04.1 on a local SSD. The number of
>> merged requests is in the same order while the write latency is obviously
>> decreased by several percent.
>>
>> cmdline:
>> qemu-system-x86_64 -m 1024 -smp 2 -enable-kvm -cdrom ubuntu-14.04.1-server-amd64.iso \
>> -drive if=virtio,file=/dev/ssd/ubuntu1404,aio=native,cache=none -monitor stdio
>>
>> Before:
>> virtio0:
>> rd_bytes=151056896 wr_bytes=2683947008 rd_operations=18614 wr_operations=67979
>> flush_operations=15335 wr_total_time_ns=540428034217 rd_total_time_ns=11110520068
>> flush_total_time_ns=40673685006 rd_merged=0 wr_merged=15531
>>
>> After:
>> virtio0:
>> rd_bytes=149487104 wr_bytes=2701344768 rd_operations=18148 wr_operations=68578
>> flush_operations=15368 wr_total_time_ns=437030089565 rd_total_time_ns=9836288815
>> flush_total_time_ns=40597981121 rd_merged=690 wr_merged=14615
>>
>> Some first numbers of improved read performance while booting:
>>
>> The Ubuntu 14.04.1 vServer from above:
>> virtio0:
>> rd_bytes=97545216 wr_bytes=119808 rd_operations=5071 wr_operations=26
>> flush_operations=2 wr_total_time_ns=8847669 rd_total_time_ns=13952575478
>> flush_total_time_ns=3075496 rd_merged=742 wr_merged=0
>>
>> Windows 2012R2 (booted from iSCSI):
>> virtio0: rd_bytes=176559104 wr_bytes=61859840 rd_operations=7200 wr_operations=360
>> flush_operations=68 wr_total_time_ns=34344992718 rd_total_time_ns=134386844669
>> flush_total_time_ns=18115517 rd_merged=641 wr_merged=216
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
> Looks pretty good. The only thing I'm still unsure about are possible
> integer overflows in the merging logic. Maybe you can have another look
> there (ideally not only the places I commented on below, but the whole
> function).
>
>> @@ -414,14 +402,81 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>> iov_from_buf(in_iov, in_num, 0, serial, size);
>> virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
>> virtio_blk_free_request(req);
>> - } else if (type & VIRTIO_BLK_T_OUT) {
>> - qemu_iovec_init_external(&req->qiov, iov, out_num);
>> - virtio_blk_handle_write(req, mrb);
>> - } else if (type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {
>> - /* VIRTIO_BLK_T_IN is 0, so we can't just & it. */
>> - qemu_iovec_init_external(&req->qiov, in_iov, in_num);
>> - virtio_blk_handle_read(req);
>> - } else {
>> + break;
>> + }
>> + case VIRTIO_BLK_T_IN:
>> + case VIRTIO_BLK_T_OUT:
>> + {
>> + bool is_write = type & VIRTIO_BLK_T_OUT;
>> + int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
>> + &req->out.sector);
>> + int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
>> + int nb_sectors = 0;
>> + bool merge = true;
>> +
>> + if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
>> + virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
>> + virtio_blk_free_request(req);
>> + return;
>> + }
>> +
>> + if (is_write) {
>> + qemu_iovec_init_external(&req->qiov, iov, out_num);
>> + trace_virtio_blk_handle_write(req, sector_num,
>> + req->qiov.size / BDRV_SECTOR_SIZE);
>> + } else {
>> + qemu_iovec_init_external(&req->qiov, in_iov, in_num);
>> + trace_virtio_blk_handle_read(req, sector_num,
>> + req->qiov.size / BDRV_SECTOR_SIZE);
>> + }
>> +
>> + nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
> qiov.size is controlled by the guest, and nb_sectors is only an int. Are
> you sure that this can't overflow?
In theory, yes. For this to happen in_iov or iov needs to contain
2TB of data on 32-bit systems. But theoretically there could
also be already an overflow in qemu_iovec_init_external where
multiple size_t are summed up in a size_t.
There has been no overflow checking in the merge routine in
the past, but if you feel better, we could add sth like this:
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index cc0076a..e9236da 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -410,8 +410,8 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
bool is_write = type & VIRTIO_BLK_T_OUT;
int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
&req->out.sector);
- int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
- int nb_sectors = 0;
+ int64_t max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
+ int64_t nb_sectors = 0;
bool merge = true;
if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
@@ -431,6 +431,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
}
nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
+ max_transfer_length = MIN_NON_ZERO(max_transfer_length, INT_MAX);
block_acct_start(blk_get_stats(req->dev->blk),
&req->acct, req->qiov.size,
@@ -443,8 +444,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
}
/* merge would exceed maximum transfer length of backend device */
- if (max_transfer_length &&
- mrb->nb_sectors + nb_sectors > max_transfer_length) {
+ if (nb_sectors + mrb->nb_sectors > max_transfer_length) {
merge = false;
}
Peter
next prev parent reply other threads:[~2014-12-15 15:43 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-12-09 16:26 [Qemu-devel] [PATCH 0/4] virtio-blk: add multiread support Peter Lieven
2014-12-09 16:26 ` [Qemu-devel] [PATCH 1/4] block: add accounting for merged requests Peter Lieven
2014-12-09 16:26 ` [Qemu-devel] [PATCH 2/4] hw/virtio-blk: add a constant for max number of " Peter Lieven
2014-12-10 4:54 ` Fam Zheng
2014-12-09 16:26 ` [Qemu-devel] [PATCH 3/4] block-backend: expose bs->bl.max_transfer_length Peter Lieven
2014-12-09 16:26 ` [Qemu-devel] [PATCH 4/4] virtio-blk: introduce multiread Peter Lieven
2014-12-10 7:48 ` Fam Zheng
2014-12-11 13:07 ` Peter Lieven
2014-12-15 15:01 ` Kevin Wolf
2014-12-15 15:43 ` Peter Lieven [this message]
2014-12-15 15:52 ` Peter Lieven
2014-12-15 16:00 ` Kevin Wolf
2014-12-15 16:02 ` Peter Lieven
2014-12-15 15:57 ` Kevin Wolf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=548F01A7.2020907@kamp.de \
--to=pl@kamp.de \
--cc=armbru@redhat.com \
--cc=benoit@irqsave.net \
--cc=famz@redhat.com \
--cc=kwolf@redhat.com \
--cc=ming.lei@canonical.com \
--cc=mreitz@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).