qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH 0/4] virtio-blk: add multiread support
@ 2014-12-09 16:26 Peter Lieven
  2014-12-09 16:26 ` [Qemu-devel] [PATCH 1/4] block: add accounting for merged requests Peter Lieven
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Peter Lieven @ 2014-12-09 16:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, famz, benoit, ming.lei, Peter Lieven, armbru, mreitz,
	stefanha, pbonzini

This series adds the long-missing multiread support to virtio-blk.

Some remarks:
 - I introduced rd_merged and wr_merged block accounting stats in
   blockstats as a generic interface which can be set by any
   driver that introduces multirequest merging in the future.
 - The knob to disable request merging is not there yet. I would
   add it to the device properties, also as a generic interface,
   so the same switch is available for any driver that might
   introduce request merging in the future. As there has been no
   knob in the past, I would post this as a separate series, since
   it needs some mangling in parameter parsing which might lead
   to further discussions.
 - The old multiwrite interface is still there and might be removed.

RFC v2->v1:
 - added Eric's grammar corrections to Patch 4
 - Patch 4
  - reworked the if tree into a switch statement to make
    virtio_blk_handle_request easier to understand
  - stop merging if requests switch from read to write or vice
    versa; otherwise we could introduce big delays, e.g. when a small
    read request is followed by a lot of write requests and the
    read request sits there until the queue is empty.

RFC v1->RFC v2:
 - completed Patch 1 by the example in qmp-commands.hx [Eric]
 - used bool for merge in Patch 4 [Max]
 - fixed a few typos in the commit msg of Patch 4 [Max]
 - do not start merging and directly pass req[0]->qiov in case of num_reqs == 1
 - avoid allocating memory for the multireq [Kevin]
 - do not import block_int.h and add an appropriate interface to block-backend [Kevin]
 - removed debug output and added trace event for multireq [Kevin]
 - fixed alloc hint for the merge qiov [Kevin]
 - did not split virtio_submit_multireq into separate read/write code for now,
   since the redundant code would be a much bigger part than in the original patch.
 - added a merge_qiov to VirtioBlockRequest. Abusing the qiov was not possible
   because it is initialized externally with guest memory [Kevin]
 - added a pointer to VirtioBlockRequest to create a linked list
   of VirtioBlockBlockRequests. This list is used to complete all
   requests belonging to a multireq [Kevin]

Peter Lieven (4):
  block: add accounting for merged requests
  hw/virtio-blk: add a constant for max number of merged requests
  block-backend: expose bs->bl.max_transfer_length
  virtio-blk: introduce multiread

 block.c                         |    2 +
 block/accounting.c              |    7 ++
 block/block-backend.c           |    5 +
 block/qapi.c                    |    2 +
 hmp.c                           |    6 +-
 hw/block/dataplane/virtio-blk.c |    6 +-
 hw/block/virtio-blk.c           |  239 ++++++++++++++++++++++++---------------
 include/block/accounting.h      |    3 +
 include/hw/virtio/virtio-blk.h  |   20 +++-
 include/sysemu/block-backend.h  |    1 +
 qapi/block-core.json            |    9 +-
 qmp-commands.hx                 |   22 +++-
 trace-events                    |    1 +
 13 files changed, 213 insertions(+), 110 deletions(-)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH 1/4] block: add accounting for merged requests
  2014-12-09 16:26 [Qemu-devel] [PATCH 0/4] virtio-blk: add multiread support Peter Lieven
@ 2014-12-09 16:26 ` Peter Lieven
  2014-12-09 16:26 ` [Qemu-devel] [PATCH 2/4] hw/virtio-blk: add a constant for max number of " Peter Lieven
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Peter Lieven @ 2014-12-09 16:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, famz, benoit, ming.lei, Peter Lieven, armbru, mreitz,
	stefanha, pbonzini

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block.c                    |    2 ++
 block/accounting.c         |    7 +++++++
 block/qapi.c               |    2 ++
 hmp.c                      |    6 +++++-
 include/block/accounting.h |    3 +++
 qapi/block-core.json       |    9 ++++++++-
 qmp-commands.hx            |   22 ++++++++++++++++++----
 7 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index a612594..75450e3 100644
--- a/block.c
+++ b/block.c
@@ -4508,6 +4508,8 @@ static int multiwrite_merge(BlockDriverState *bs, BlockRequest *reqs,
         }
     }
 
+    block_acct_merge_done(&bs->stats, BLOCK_ACCT_WRITE, num_reqs - outidx - 1);
+
     return outidx + 1;
 }
 
diff --git a/block/accounting.c b/block/accounting.c
index edbb1cc..d1afcd9 100644
--- a/block/accounting.c
+++ b/block/accounting.c
@@ -52,3 +52,10 @@ void block_acct_highest_sector(BlockAcctStats *stats, int64_t sector_num,
         stats->wr_highest_sector = sector_num + nb_sectors - 1;
     }
 }
+
+void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type,
+                      int num_requests)
+{
+    assert(type < BLOCK_MAX_IOTYPE);
+    stats->merged[type] += num_requests;
+}
diff --git a/block/qapi.c b/block/qapi.c
index a87a34a..ec28240 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -316,6 +316,8 @@ static BlockStats *bdrv_query_stats(const BlockDriverState *bs)
     s->stats->wr_bytes = bs->stats.nr_bytes[BLOCK_ACCT_WRITE];
     s->stats->rd_operations = bs->stats.nr_ops[BLOCK_ACCT_READ];
     s->stats->wr_operations = bs->stats.nr_ops[BLOCK_ACCT_WRITE];
+    s->stats->rd_merged = bs->stats.merged[BLOCK_ACCT_READ];
+    s->stats->wr_merged = bs->stats.merged[BLOCK_ACCT_WRITE];
     s->stats->wr_highest_offset =
         bs->stats.wr_highest_sector * BDRV_SECTOR_SIZE;
     s->stats->flush_operations = bs->stats.nr_ops[BLOCK_ACCT_FLUSH];
diff --git a/hmp.c b/hmp.c
index 63d7686..baaa9e5 100644
--- a/hmp.c
+++ b/hmp.c
@@ -419,6 +419,8 @@ void hmp_info_blockstats(Monitor *mon, const QDict *qdict)
                        " wr_total_time_ns=%" PRId64
                        " rd_total_time_ns=%" PRId64
                        " flush_total_time_ns=%" PRId64
+                       " rd_merged=%" PRId64
+                       " wr_merged=%" PRId64
                        "\n",
                        stats->value->stats->rd_bytes,
                        stats->value->stats->wr_bytes,
@@ -427,7 +429,9 @@ void hmp_info_blockstats(Monitor *mon, const QDict *qdict)
                        stats->value->stats->flush_operations,
                        stats->value->stats->wr_total_time_ns,
                        stats->value->stats->rd_total_time_ns,
-                       stats->value->stats->flush_total_time_ns);
+                       stats->value->stats->flush_total_time_ns,
+                       stats->value->stats->rd_merged,
+                       stats->value->stats->wr_merged);
     }
 
     qapi_free_BlockStatsList(stats_list);
diff --git a/include/block/accounting.h b/include/block/accounting.h
index 50b42b3..4c406cf 100644
--- a/include/block/accounting.h
+++ b/include/block/accounting.h
@@ -39,6 +39,7 @@ typedef struct BlockAcctStats {
     uint64_t nr_bytes[BLOCK_MAX_IOTYPE];
     uint64_t nr_ops[BLOCK_MAX_IOTYPE];
     uint64_t total_time_ns[BLOCK_MAX_IOTYPE];
+    uint64_t merged[BLOCK_MAX_IOTYPE];
     uint64_t wr_highest_sector;
 } BlockAcctStats;
 
@@ -53,5 +54,7 @@ void block_acct_start(BlockAcctStats *stats, BlockAcctCookie *cookie,
 void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie);
 void block_acct_highest_sector(BlockAcctStats *stats, int64_t sector_num,
                                unsigned int nb_sectors);
+void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type,
+                           int num_requests);
 
 #endif
diff --git a/qapi/block-core.json b/qapi/block-core.json
index a14e6ab..ead6d5b 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -389,13 +389,20 @@
 #                     growable sparse files (like qcow2) that are used on top
 #                     of a physical device.
 #
+# @rd_merged: Number of read requests that have been merged into another
+#             request (Since 2.3).
+#
+# @wr_merged: Number of write requests that have been merged into another
+#             request (Since 2.3).
+#
 # Since: 0.14.0
 ##
 { 'type': 'BlockDeviceStats',
   'data': {'rd_bytes': 'int', 'wr_bytes': 'int', 'rd_operations': 'int',
            'wr_operations': 'int', 'flush_operations': 'int',
            'flush_total_time_ns': 'int', 'wr_total_time_ns': 'int',
-           'rd_total_time_ns': 'int', 'wr_highest_offset': 'int' } }
+           'rd_total_time_ns': 'int', 'wr_highest_offset': 'int',
+           'rd_merged': 'int', 'wr_merged': 'int' } }
 
 ##
 # @BlockStats:
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 718dd92..42f1e59 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -2261,6 +2261,10 @@ Each json-object contain the following:
     - "flush_total_time_ns": total time spend on cache flushes in nano-seconds (json-int)
     - "wr_highest_offset": Highest offset of a sector written since the
                            BlockDriverState has been opened (json-int)
+    - "rd_merged": number of read requests that have been merged into
+                   another request (json-int)
+    - "wr_merged": number of write requests that have been merged into
+                   another request (json-int)
 - "parent": Contains recursively the statistics of the underlying
             protocol (e.g. the host file for a qcow2 image). If there is
             no underlying protocol, this field is omitted
@@ -2284,6 +2288,8 @@ Example:
                   "rd_total_times_ns":3465673657
                   "flush_total_times_ns":49653
                   "flush_operations":61,
+                  "rd_merged":0,
+                  "wr_merged":0
                }
             },
             "stats":{
@@ -2295,7 +2301,9 @@ Example:
                "flush_operations":51,
                "wr_total_times_ns":313253456
                "rd_total_times_ns":3465673657
-               "flush_total_times_ns":49653
+               "flush_total_times_ns":49653,
+               "rd_merged":0,
+               "wr_merged":0
             }
          },
          {
@@ -2309,7 +2317,9 @@ Example:
                "flush_operations":0,
                "wr_total_times_ns":0
                "rd_total_times_ns":0
-               "flush_total_times_ns":0
+               "flush_total_times_ns":0,
+               "rd_merged":0,
+               "wr_merged":0
             }
          },
          {
@@ -2323,7 +2333,9 @@ Example:
                "flush_operations":0,
                "wr_total_times_ns":0
                "rd_total_times_ns":0
-               "flush_total_times_ns":0
+               "flush_total_times_ns":0,
+               "rd_merged":0,
+               "wr_merged":0
             }
          },
          {
@@ -2337,7 +2349,9 @@ Example:
                "flush_operations":0,
                "wr_total_times_ns":0
                "rd_total_times_ns":0
-               "flush_total_times_ns":0
+               "flush_total_times_ns":0,
+               "rd_merged":0,
+               "wr_merged":0
             }
          }
       ]
-- 
1.7.9.5


* [Qemu-devel] [PATCH 2/4] hw/virtio-blk: add a constant for max number of merged requests
  2014-12-09 16:26 [Qemu-devel] [PATCH 0/4] virtio-blk: add multiread support Peter Lieven
  2014-12-09 16:26 ` [Qemu-devel] [PATCH 1/4] block: add accounting for merged requests Peter Lieven
@ 2014-12-09 16:26 ` Peter Lieven
  2014-12-10  4:54   ` Fam Zheng
  2014-12-09 16:26 ` [Qemu-devel] [PATCH 3/4] block-backend: expose bs->bl.max_transfer_length Peter Lieven
  2014-12-09 16:26 ` [Qemu-devel] [PATCH 4/4] virtio-blk: introduce multiread Peter Lieven
  3 siblings, 1 reply; 14+ messages in thread
From: Peter Lieven @ 2014-12-09 16:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, famz, benoit, ming.lei, Peter Lieven, armbru, mreitz,
	stefanha, pbonzini

As it was not obvious (at least to me) where the 32 comes from,
add a constant for it.

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 hw/block/virtio-blk.c          |    2 +-
 include/hw/virtio/virtio-blk.h |    4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index b19b102..490f961 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -326,7 +326,7 @@ static void virtio_blk_handle_write(VirtIOBlockReq *req, MultiReqBuffer *mrb)
     block_acct_start(blk_get_stats(req->dev->blk), &req->acct, req->qiov.size,
                      BLOCK_ACCT_WRITE);
 
-    if (mrb->num_writes == 32) {
+    if (mrb->num_writes == VIRTIO_BLK_MAX_MERGE_REQS) {
         virtio_submit_multiwrite(req->dev->blk, mrb);
     }
 
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 3979dc4..3f2652f 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -134,8 +134,10 @@ typedef struct VirtIOBlock {
     struct VirtIOBlockDataPlane *dataplane;
 } VirtIOBlock;
 
+#define VIRTIO_BLK_MAX_MERGE_REQS 32
+
 typedef struct MultiReqBuffer {
-    BlockRequest        blkreq[32];
+    BlockRequest        blkreq[VIRTIO_BLK_MAX_MERGE_REQS];
     unsigned int        num_writes;
 } MultiReqBuffer;
 
-- 
1.7.9.5


* [Qemu-devel] [PATCH 3/4] block-backend: expose bs->bl.max_transfer_length
  2014-12-09 16:26 [Qemu-devel] [PATCH 0/4] virtio-blk: add multiread support Peter Lieven
  2014-12-09 16:26 ` [Qemu-devel] [PATCH 1/4] block: add accounting for merged requests Peter Lieven
  2014-12-09 16:26 ` [Qemu-devel] [PATCH 2/4] hw/virtio-blk: add a constant for max number of " Peter Lieven
@ 2014-12-09 16:26 ` Peter Lieven
  2014-12-09 16:26 ` [Qemu-devel] [PATCH 4/4] virtio-blk: introduce multiread Peter Lieven
  3 siblings, 0 replies; 14+ messages in thread
From: Peter Lieven @ 2014-12-09 16:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, famz, benoit, ming.lei, Peter Lieven, armbru, mreitz,
	stefanha, pbonzini

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 block/block-backend.c          |    5 +++++
 include/sysemu/block-backend.h |    1 +
 2 files changed, 6 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index d0692b1..545580f 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -569,6 +569,11 @@ int blk_get_flags(BlockBackend *blk)
     return bdrv_get_flags(blk->bs);
 }
 
+int blk_get_max_transfer_length(BlockBackend *blk)
+{
+    return blk->bs->bl.max_transfer_length;
+}
+
 void blk_set_guest_block_size(BlockBackend *blk, int align)
 {
     bdrv_set_guest_block_size(blk->bs, align);
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 52d13c1..b3243cb 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -124,6 +124,7 @@ int blk_is_inserted(BlockBackend *blk);
 void blk_lock_medium(BlockBackend *blk, bool locked);
 void blk_eject(BlockBackend *blk, bool eject_flag);
 int blk_get_flags(BlockBackend *blk);
+int blk_get_max_transfer_length(BlockBackend *blk);
 void blk_set_guest_block_size(BlockBackend *blk, int align);
 void *blk_blockalign(BlockBackend *blk, size_t size);
 bool blk_op_is_blocked(BlockBackend *blk, BlockOpType op, Error **errp);
-- 
1.7.9.5


* [Qemu-devel] [PATCH 4/4] virtio-blk: introduce multiread
  2014-12-09 16:26 [Qemu-devel] [PATCH 0/4] virtio-blk: add multiread support Peter Lieven
                   ` (2 preceding siblings ...)
  2014-12-09 16:26 ` [Qemu-devel] [PATCH 3/4] block-backend: expose bs->bl.max_transfer_length Peter Lieven
@ 2014-12-09 16:26 ` Peter Lieven
  2014-12-10  7:48   ` Fam Zheng
  2014-12-15 15:01   ` Kevin Wolf
  3 siblings, 2 replies; 14+ messages in thread
From: Peter Lieven @ 2014-12-09 16:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, famz, benoit, ming.lei, Peter Lieven, armbru, mreitz,
	stefanha, pbonzini

This patch finally introduces multiread support to virtio-blk. While
multiwrite support has been there for a long time, read support was missing.

To achieve this, the patch does several things which might need further
explanation:

 - the whole merge and multireq logic is moved from block.c into
   virtio-blk. This move is a preparation for directly creating a
   coroutine out of virtio-blk.

 - requests are only merged if they are strictly sequential, and they are
   no longer sorted. This simplification decreases overhead and reduces
   latency. It will also merge some requests which were unmergeable before.

   The old algorithm took up to 32 requests, sorted them and tried to merge
   them. The outcome was anything between 1 and 32 requests. In the case of
   32 requests, 31 requests were unnecessarily delayed.

   On the other hand, imagine e.g. 16 unmergeable requests followed
   by 32 mergeable requests. The latter 32 requests would have been split
   into two merged requests of 16 requests each.

   Lastly, the simplified logic allows for a fast path if we have only a
   single request in the multirequest. In this case the request is sent as
   an ordinary request without multireq callbacks.

As a first benchmark, I installed Ubuntu 14.04.1 on a local SSD. The number of
merged requests is of the same order, while the write latency is decreased
by several percent.

cmdline:
qemu-system-x86_64 -m 1024 -smp 2 -enable-kvm -cdrom ubuntu-14.04.1-server-amd64.iso \
 -drive if=virtio,file=/dev/ssd/ubuntu1404,aio=native,cache=none -monitor stdio

Before:
virtio0:
 rd_bytes=151056896 wr_bytes=2683947008 rd_operations=18614 wr_operations=67979
 flush_operations=15335 wr_total_time_ns=540428034217 rd_total_time_ns=11110520068
 flush_total_time_ns=40673685006 rd_merged=0 wr_merged=15531

After:
virtio0:
 rd_bytes=149487104 wr_bytes=2701344768 rd_operations=18148 wr_operations=68578
 flush_operations=15368 wr_total_time_ns=437030089565 rd_total_time_ns=9836288815
 flush_total_time_ns=40597981121 rd_merged=690 wr_merged=14615

Some first numbers showing improved read performance while booting:

The Ubuntu 14.04.1 vServer from above:
virtio0:
 rd_bytes=97545216 wr_bytes=119808 rd_operations=5071 wr_operations=26
 flush_operations=2 wr_total_time_ns=8847669 rd_total_time_ns=13952575478
 flush_total_time_ns=3075496 rd_merged=742 wr_merged=0

Windows 2012R2 (booted from iSCSI):
virtio0: rd_bytes=176559104 wr_bytes=61859840 rd_operations=7200 wr_operations=360
 flush_operations=68 wr_total_time_ns=34344992718 rd_total_time_ns=134386844669
 flush_total_time_ns=18115517 rd_merged=641 wr_merged=216

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 hw/block/dataplane/virtio-blk.c |    6 +-
 hw/block/virtio-blk.c           |  239 ++++++++++++++++++++++++---------------
 include/hw/virtio/virtio-blk.h  |   22 ++--
 trace-events                    |    1 +
 4 files changed, 162 insertions(+), 106 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 1222a37..30bf2e7 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -96,9 +96,7 @@ static void handle_notify(EventNotifier *e)
     event_notifier_test_and_clear(&s->host_notifier);
     blk_io_plug(s->conf->conf.blk);
     for (;;) {
-        MultiReqBuffer mrb = {
-            .num_writes = 0,
-        };
+        MultiReqBuffer mrb = {};
         int ret;
 
         /* Disable guest->host notifies to avoid unnecessary vmexits */
@@ -120,7 +118,7 @@ static void handle_notify(EventNotifier *e)
             virtio_blk_handle_request(req, &mrb);
         }
 
-        virtio_submit_multiwrite(s->conf->conf.blk, &mrb);
+        virtio_submit_multireq(s->conf->conf.blk, &mrb);
 
         if (likely(ret == -EAGAIN)) { /* vring emptied */
             /* Re-enable guest->host notifies and stop processing the vring.
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 490f961..cc0076a 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -34,6 +34,8 @@ VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s)
     req->dev = s;
     req->qiov.size = 0;
     req->next = NULL;
+    req->mr_next = NULL;
+    req->mr_qiov.nalloc = 0;
     return req;
 }
 
@@ -84,20 +86,29 @@ static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, int error,
 
 static void virtio_blk_rw_complete(void *opaque, int ret)
 {
-    VirtIOBlockReq *req = opaque;
+    VirtIOBlockReq *next = opaque;
 
-    trace_virtio_blk_rw_complete(req, ret);
+    while (next) {
+        VirtIOBlockReq *req = next;
+        next = req->mr_next;
+        trace_virtio_blk_rw_complete(req, ret);
 
-    if (ret) {
-        int p = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &req->out.type);
-        bool is_read = !(p & VIRTIO_BLK_T_OUT);
-        if (virtio_blk_handle_rw_error(req, -ret, is_read))
-            return;
-    }
+        if (req->mr_qiov.nalloc) {
+            qemu_iovec_destroy(&req->mr_qiov);
+        }
 
-    virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
-    block_acct_done(blk_get_stats(req->dev->blk), &req->acct);
-    virtio_blk_free_request(req);
+        if (ret) {
+            int p = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &req->out.type);
+            bool is_read = !(p & VIRTIO_BLK_T_OUT);
+            if (virtio_blk_handle_rw_error(req, -ret, is_read)) {
+                continue;
+            }
+        }
+
+        virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
+        block_acct_done(blk_get_stats(req->dev->blk), &req->acct);
+        virtio_blk_free_request(req);
+    }
 }
 
 static void virtio_blk_flush_complete(void *opaque, int ret)
@@ -257,24 +268,49 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
     virtio_blk_free_request(req);
 }
 
-void virtio_submit_multiwrite(BlockBackend *blk, MultiReqBuffer *mrb)
+void virtio_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb)
 {
-    int i, ret;
+    QEMUIOVector *qiov = &mrb->reqs[0]->qiov;
+    bool is_write = mrb->is_write;
 
-    if (!mrb->num_writes) {
+    if (!mrb->num_reqs) {
         return;
     }
 
-    ret = blk_aio_multiwrite(blk, mrb->blkreq, mrb->num_writes);
-    if (ret != 0) {
-        for (i = 0; i < mrb->num_writes; i++) {
-            if (mrb->blkreq[i].error) {
-                virtio_blk_rw_complete(mrb->blkreq[i].opaque, -EIO);
+    if (mrb->num_reqs > 1) {
+        int i;
+
+        trace_virtio_blk_submit_multireq(mrb, mrb->num_reqs, mrb->sector_num,
+                                         mrb->nb_sectors, is_write);
+
+        qiov = &mrb->reqs[0]->mr_qiov;
+        qemu_iovec_init(qiov, mrb->niov);
+
+        for (i = 0; i < mrb->num_reqs; i++) {
+            qemu_iovec_concat(qiov, &mrb->reqs[i]->qiov, 0,
+                              mrb->reqs[i]->qiov.size);
+            if (i) {
+                mrb->reqs[i - 1]->mr_next = mrb->reqs[i];
             }
         }
+        assert(mrb->nb_sectors == qiov->size / BDRV_SECTOR_SIZE);
+
+        block_acct_merge_done(blk_get_stats(blk),
+                              is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ,
+                              mrb->num_reqs - 1);
     }
 
-    mrb->num_writes = 0;
+    if (is_write) {
+        blk_aio_writev(blk, mrb->sector_num, qiov, mrb->nb_sectors,
+                       virtio_blk_rw_complete, mrb->reqs[0]);
+    } else {
+        blk_aio_readv(blk, mrb->sector_num, qiov, mrb->nb_sectors,
+                      virtio_blk_rw_complete, mrb->reqs[0]);
+    }
+
+    mrb->num_reqs = 0;
+    mrb->nb_sectors = 0;
+    mrb->niov = 0;
 }
 
 static void virtio_blk_handle_flush(VirtIOBlockReq *req, MultiReqBuffer *mrb)
@@ -285,7 +321,9 @@ static void virtio_blk_handle_flush(VirtIOBlockReq *req, MultiReqBuffer *mrb)
     /*
      * Make sure all outstanding writes are posted to the backing device.
      */
-    virtio_submit_multiwrite(req->dev->blk, mrb);
+    if (mrb->is_write) {
+        virtio_submit_multireq(req->dev->blk, mrb);
+    }
     blk_aio_flush(req->dev->blk, virtio_blk_flush_complete, req);
 }
 
@@ -308,60 +346,6 @@ static bool virtio_blk_sect_range_ok(VirtIOBlock *dev,
     return true;
 }
 
-static void virtio_blk_handle_write(VirtIOBlockReq *req, MultiReqBuffer *mrb)
-{
-    BlockRequest *blkreq;
-    uint64_t sector;
-
-    sector = virtio_ldq_p(VIRTIO_DEVICE(req->dev), &req->out.sector);
-
-    trace_virtio_blk_handle_write(req, sector, req->qiov.size / 512);
-
-    if (!virtio_blk_sect_range_ok(req->dev, sector, req->qiov.size)) {
-        virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
-        virtio_blk_free_request(req);
-        return;
-    }
-
-    block_acct_start(blk_get_stats(req->dev->blk), &req->acct, req->qiov.size,
-                     BLOCK_ACCT_WRITE);
-
-    if (mrb->num_writes == VIRTIO_BLK_MAX_MERGE_REQS) {
-        virtio_submit_multiwrite(req->dev->blk, mrb);
-    }
-
-    blkreq = &mrb->blkreq[mrb->num_writes];
-    blkreq->sector = sector;
-    blkreq->nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
-    blkreq->qiov = &req->qiov;
-    blkreq->cb = virtio_blk_rw_complete;
-    blkreq->opaque = req;
-    blkreq->error = 0;
-
-    mrb->num_writes++;
-}
-
-static void virtio_blk_handle_read(VirtIOBlockReq *req)
-{
-    uint64_t sector;
-
-    sector = virtio_ldq_p(VIRTIO_DEVICE(req->dev), &req->out.sector);
-
-    trace_virtio_blk_handle_read(req, sector, req->qiov.size / 512);
-
-    if (!virtio_blk_sect_range_ok(req->dev, sector, req->qiov.size)) {
-        virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
-        virtio_blk_free_request(req);
-        return;
-    }
-
-    block_acct_start(blk_get_stats(req->dev->blk), &req->acct, req->qiov.size,
-                     BLOCK_ACCT_READ);
-    blk_aio_readv(req->dev->blk, sector, &req->qiov,
-                  req->qiov.size / BDRV_SECTOR_SIZE,
-                  virtio_blk_rw_complete, req);
-}
-
 void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
 {
     uint32_t type;
@@ -396,11 +380,15 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
 
     type = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &req->out.type);
 
-    if (type & VIRTIO_BLK_T_FLUSH) {
+    switch (type & 0xff) {
+    case VIRTIO_BLK_T_FLUSH:
         virtio_blk_handle_flush(req, mrb);
-    } else if (type & VIRTIO_BLK_T_SCSI_CMD) {
+        break;
+    case VIRTIO_BLK_T_SCSI_CMD:
         virtio_blk_handle_scsi(req);
-    } else if (type & VIRTIO_BLK_T_GET_ID) {
+        break;
+    case VIRTIO_BLK_T_GET_ID:
+    {
         VirtIOBlock *s = req->dev;
 
         /*
@@ -414,14 +402,81 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
         iov_from_buf(in_iov, in_num, 0, serial, size);
         virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
         virtio_blk_free_request(req);
-    } else if (type & VIRTIO_BLK_T_OUT) {
-        qemu_iovec_init_external(&req->qiov, iov, out_num);
-        virtio_blk_handle_write(req, mrb);
-    } else if (type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {
-        /* VIRTIO_BLK_T_IN is 0, so we can't just & it. */
-        qemu_iovec_init_external(&req->qiov, in_iov, in_num);
-        virtio_blk_handle_read(req);
-    } else {
+        break;
+    }
+    case VIRTIO_BLK_T_IN:
+    case VIRTIO_BLK_T_OUT:
+    {
+        bool is_write = type & VIRTIO_BLK_T_OUT;
+        int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
+                                          &req->out.sector);
+        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
+        int nb_sectors = 0;
+        bool merge = true;
+
+        if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
+            virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
+            virtio_blk_free_request(req);
+            return;
+        }
+
+        if (is_write) {
+            qemu_iovec_init_external(&req->qiov, iov, out_num);
+            trace_virtio_blk_handle_write(req, sector_num,
+                                          req->qiov.size / BDRV_SECTOR_SIZE);
+        } else {
+            qemu_iovec_init_external(&req->qiov, in_iov, in_num);
+            trace_virtio_blk_handle_read(req, sector_num,
+                                         req->qiov.size / BDRV_SECTOR_SIZE);
+        }
+
+        nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
+
+        block_acct_start(blk_get_stats(req->dev->blk),
+                         &req->acct, req->qiov.size,
+                         is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ);
+
+        /* merge would exceed maximum number of requests or IOVs */
+        if (mrb->num_reqs == MAX_MERGE_REQS ||
+            mrb->niov + req->qiov.niov + 1 > IOV_MAX) {
+            merge = false;
+        }
+
+        /* merge would exceed maximum transfer length of backend device */
+        if (max_transfer_length &&
+            mrb->nb_sectors + nb_sectors > max_transfer_length) {
+            merge = false;
+        }
+
+        /* requests are not sequential */
+        if (mrb->num_reqs && mrb->sector_num + mrb->nb_sectors != sector_num) {
+            merge = false;
+        }
+
+        /* if we switch from read to write or vice versa we should submit
+         * outstanding requests to avoid unnecessary and potentially long
+         * delays. Furthermore we share the same struct for read and write
+         * merging so submission is a must here. */
+        if (is_write != mrb->is_write) {
+            merge = false;
+        }
+
+        if (!merge) {
+            virtio_submit_multireq(req->dev->blk, mrb);
+        }
+
+        if (mrb->num_reqs == 0) {
+            mrb->sector_num = sector_num;
+            mrb->is_write = is_write;
+        }
+
+        mrb->nb_sectors += req->qiov.size / BDRV_SECTOR_SIZE;
+        mrb->reqs[mrb->num_reqs] = req;
+        mrb->niov += req->qiov.niov;
+        mrb->num_reqs++;
+        break;
+    }
+    default:
         virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
         virtio_blk_free_request(req);
     }
@@ -431,9 +486,7 @@ static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIOBlock *s = VIRTIO_BLK(vdev);
     VirtIOBlockReq *req;
-    MultiReqBuffer mrb = {
-        .num_writes = 0,
-    };
+    MultiReqBuffer mrb = {};
 
     /* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
      * dataplane here instead of waiting for .set_status().
@@ -447,7 +500,7 @@ static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
         virtio_blk_handle_request(req, &mrb);
     }
 
-    virtio_submit_multiwrite(s->blk, &mrb);
+    virtio_submit_multireq(s->blk, &mrb);
 
     /*
      * FIXME: Want to check for completions before returning to guest mode,
@@ -460,9 +513,7 @@ static void virtio_blk_dma_restart_bh(void *opaque)
 {
     VirtIOBlock *s = opaque;
     VirtIOBlockReq *req = s->rq;
-    MultiReqBuffer mrb = {
-        .num_writes = 0,
-    };
+    MultiReqBuffer mrb = {};
 
     qemu_bh_delete(s->bh);
     s->bh = NULL;
@@ -475,7 +526,7 @@ static void virtio_blk_dma_restart_bh(void *opaque)
         req = next;
     }
 
-    virtio_submit_multiwrite(s->blk, &mrb);
+    virtio_submit_multireq(s->blk, &mrb);
 }
 
 static void virtio_blk_dma_restart_cb(void *opaque, int running,
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 3f2652f..0ee9582 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -134,13 +134,6 @@ typedef struct VirtIOBlock {
     struct VirtIOBlockDataPlane *dataplane;
 } VirtIOBlock;
 
-#define VIRTIO_BLK_MAX_MERGE_REQS 32
-
-typedef struct MultiReqBuffer {
-    BlockRequest        blkreq[VIRTIO_BLK_MAX_MERGE_REQS];
-    unsigned int        num_writes;
-} MultiReqBuffer;
-
 typedef struct VirtIOBlockReq {
     VirtIOBlock *dev;
     VirtQueueElement elem;
@@ -149,8 +142,21 @@ typedef struct VirtIOBlockReq {
     QEMUIOVector qiov;
     struct VirtIOBlockReq *next;
     BlockAcctCookie acct;
+    QEMUIOVector mr_qiov;
+    struct VirtIOBlockReq *mr_next;
 } VirtIOBlockReq;
 
+#define MAX_MERGE_REQS 32
+
+typedef struct MultiReqBuffer {
+    VirtIOBlockReq *reqs[MAX_MERGE_REQS];
+    unsigned int num_reqs;
+    bool is_write;
+    int niov;
+    int64_t sector_num;
+    int nb_sectors;
+} MultiReqBuffer;
+
 VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s);
 
 void virtio_blk_free_request(VirtIOBlockReq *req);
@@ -160,6 +166,6 @@ int virtio_blk_handle_scsi_req(VirtIOBlock *blk,
 
 void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb);
 
-void virtio_submit_multiwrite(BlockBackend *blk, MultiReqBuffer *mrb);
+void virtio_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb);
 
 #endif
diff --git a/trace-events b/trace-events
index b5722ea..1b2fbf3 100644
--- a/trace-events
+++ b/trace-events
@@ -116,6 +116,7 @@ virtio_blk_req_complete(void *req, int status) "req %p status %d"
 virtio_blk_rw_complete(void *req, int ret) "req %p ret %d"
 virtio_blk_handle_write(void *req, uint64_t sector, size_t nsectors) "req %p sector %"PRIu64" nsectors %zu"
 virtio_blk_handle_read(void *req, uint64_t sector, size_t nsectors) "req %p sector %"PRIu64" nsectors %zu"
+virtio_blk_submit_multireq(void *mrb, int num_reqs, uint64_t sector, size_t nsectors, bool is_write) "mrb %p num_reqs %d sector %"PRIu64" nsectors %zu is_write %d"
 
 # hw/block/dataplane/virtio-blk.c
 virtio_blk_data_plane_start(void *s) "dataplane %p"
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 2/4] hw/virtio-blk: add a constant for max number of merged requests
  2014-12-09 16:26 ` [Qemu-devel] [PATCH 2/4] hw/virtio-blk: add a constant for max number of " Peter Lieven
@ 2014-12-10  4:54   ` Fam Zheng
  0 siblings, 0 replies; 14+ messages in thread
From: Fam Zheng @ 2014-12-10  4:54 UTC (permalink / raw)
  To: Peter Lieven
  Cc: kwolf, benoit, qemu-devel, ming.lei, armbru, mreitz, stefanha,
	pbonzini

On Tue, 12/09 17:26, Peter Lieven wrote:
> As it was not obvious (at least to me) where the 32 comes from,
> add a constant for it.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> ---
>  hw/block/virtio-blk.c          |    2 +-
>  include/hw/virtio/virtio-blk.h |    4 +++-
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index b19b102..490f961 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -326,7 +326,7 @@ static void virtio_blk_handle_write(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>      block_acct_start(blk_get_stats(req->dev->blk), &req->acct, req->qiov.size,
>                       BLOCK_ACCT_WRITE);
>  
> -    if (mrb->num_writes == 32) {
> +    if (mrb->num_writes == VIRTIO_BLK_MAX_MERGE_REQS) {
>          virtio_submit_multiwrite(req->dev->blk, mrb);
>      }
>  
> diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
> index 3979dc4..3f2652f 100644
> --- a/include/hw/virtio/virtio-blk.h
> +++ b/include/hw/virtio/virtio-blk.h
> @@ -134,8 +134,10 @@ typedef struct VirtIOBlock {
>      struct VirtIOBlockDataPlane *dataplane;
>  } VirtIOBlock;
>  
> +#define VIRTIO_BLK_MAX_MERGE_REQS 32
> +
>  typedef struct MultiReqBuffer {
> -    BlockRequest        blkreq[32];
> +    BlockRequest        blkreq[VIRTIO_BLK_MAX_MERGE_REQS];
>      unsigned int        num_writes;
>  } MultiReqBuffer;
>  
> -- 
> 1.7.9.5
> 
> 

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] virtio-blk: introduce multiread
  2014-12-09 16:26 ` [Qemu-devel] [PATCH 4/4] virtio-blk: introduce multiread Peter Lieven
@ 2014-12-10  7:48   ` Fam Zheng
  2014-12-11 13:07     ` Peter Lieven
  2014-12-15 15:01   ` Kevin Wolf
  1 sibling, 1 reply; 14+ messages in thread
From: Fam Zheng @ 2014-12-10  7:48 UTC (permalink / raw)
  To: Peter Lieven
  Cc: kwolf, benoit, ming.lei, armbru, qemu-devel, stefanha, pbonzini,
	mreitz

On Tue, 12/09 17:26, Peter Lieven wrote:
> diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
> index 3f2652f..0ee9582 100644
> --- a/include/hw/virtio/virtio-blk.h
> +++ b/include/hw/virtio/virtio-blk.h
> @@ -134,13 +134,6 @@ typedef struct VirtIOBlock {
>      struct VirtIOBlockDataPlane *dataplane;
>  } VirtIOBlock;
>  
> -#define VIRTIO_BLK_MAX_MERGE_REQS 32
> -
> -typedef struct MultiReqBuffer {
> -    BlockRequest        blkreq[VIRTIO_BLK_MAX_MERGE_REQS];
> -    unsigned int        num_writes;
> -} MultiReqBuffer;
> -
>  typedef struct VirtIOBlockReq {
>      VirtIOBlock *dev;
>      VirtQueueElement elem;
> @@ -149,8 +142,21 @@ typedef struct VirtIOBlockReq {
>      QEMUIOVector qiov;
>      struct VirtIOBlockReq *next;
>      BlockAcctCookie acct;
> +    QEMUIOVector mr_qiov;
> +    struct VirtIOBlockReq *mr_next;
>  } VirtIOBlockReq;
>  
> +#define MAX_MERGE_REQS 32

Why do you need to rename this macro after introducing it in previous patch?

> +
> +typedef struct MultiReqBuffer {
> +    VirtIOBlockReq *reqs[MAX_MERGE_REQS];
> +    unsigned int num_reqs;
> +    bool is_write;
> +    int niov;
> +    int64_t sector_num;
> +    int nb_sectors;
> +} MultiReqBuffer;
> +
>  VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s);
>  
>  void virtio_blk_free_request(VirtIOBlockReq *req);
> @@ -160,6 +166,6 @@ int virtio_blk_handle_scsi_req(VirtIOBlock *blk,
>  
>  void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb);
>  
> -void virtio_submit_multiwrite(BlockBackend *blk, MultiReqBuffer *mrb);
> +void virtio_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb);
>  
>  #endif

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] virtio-blk: introduce multiread
  2014-12-10  7:48   ` Fam Zheng
@ 2014-12-11 13:07     ` Peter Lieven
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Lieven @ 2014-12-11 13:07 UTC (permalink / raw)
  To: Fam Zheng
  Cc: kwolf, benoit, ming.lei, armbru, qemu-devel, stefanha, pbonzini,
	mreitz

On 10.12.2014 08:48, Fam Zheng wrote:
> On Tue, 12/09 17:26, Peter Lieven wrote:
>> diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
>> index 3f2652f..0ee9582 100644
>> --- a/include/hw/virtio/virtio-blk.h
>> +++ b/include/hw/virtio/virtio-blk.h
>> @@ -134,13 +134,6 @@ typedef struct VirtIOBlock {
>>       struct VirtIOBlockDataPlane *dataplane;
>>   } VirtIOBlock;
>>   
>> -#define VIRTIO_BLK_MAX_MERGE_REQS 32
>> -
>> -typedef struct MultiReqBuffer {
>> -    BlockRequest        blkreq[VIRTIO_BLK_MAX_MERGE_REQS];
>> -    unsigned int        num_writes;
>> -} MultiReqBuffer;
>> -
>>   typedef struct VirtIOBlockReq {
>>       VirtIOBlock *dev;
>>       VirtQueueElement elem;
>> @@ -149,8 +142,21 @@ typedef struct VirtIOBlockReq {
>>       QEMUIOVector qiov;
>>       struct VirtIOBlockReq *next;
>>       BlockAcctCookie acct;
>> +    QEMUIOVector mr_qiov;
>> +    struct VirtIOBlockReq *mr_next;
>>   } VirtIOBlockReq;
>>   
>> +#define MAX_MERGE_REQS 32
> Why do you need to rename this macro after introducing it in previous patch?

Right ;-)

This was the original name; I changed it in Patch 2 at a reviewer's request.

I will wait for other feedback and change it in the next revision.

Peter

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] virtio-blk: introduce multiread
  2014-12-09 16:26 ` [Qemu-devel] [PATCH 4/4] virtio-blk: introduce multiread Peter Lieven
  2014-12-10  7:48   ` Fam Zheng
@ 2014-12-15 15:01   ` Kevin Wolf
  2014-12-15 15:43     ` Peter Lieven
  1 sibling, 1 reply; 14+ messages in thread
From: Kevin Wolf @ 2014-12-15 15:01 UTC (permalink / raw)
  To: Peter Lieven
  Cc: famz, benoit, ming.lei, armbru, qemu-devel, stefanha, pbonzini,
	mreitz

Am 09.12.2014 um 17:26 hat Peter Lieven geschrieben:
> this patch finally introduces multiread support to virtio-blk. While
> multiwrite support was there for a long time, read support was missing.
> 
> To achieve this the patch does several things which might need further
> explanation:
> 
>  - the whole merge and multireq logic is moved from block.c into
>    virtio-blk. This move is a preparation for directly creating a
>    coroutine out of virtio-blk.
> 
>  - requests are only merged if they are strictly sequential, and no
>    longer sorted. This simplification decreases overhead and reduces
>    latency. It will also merge some requests which were unmergeable before.
> 
>    The old algorithm took up to 32 requests, sorted them and tried to merge
>    them. The outcome was anything between 1 and 32 requests. In case of
>    32 requests there were 31 requests unnecessarily delayed.
> 
>    On the other hand let's imagine e.g. 16 unmergeable requests followed
>    by 32 mergeable requests. The latter 32 requests would have been split
>    into two 16-request batches.
> 
>    Last the simplified logic allows for a fast path if we have only a
>    single request in the multirequest. In this case the request is sent as
>    ordinary request without multireq callbacks.
> 
> As a first benchmark I installed Ubuntu 14.04.1 on a local SSD. The number of
> merged requests is in the same order while the write latency is obviously
> decreased by several percent.
> 
> cmdline:
> qemu-system-x86_64 -m 1024 -smp 2 -enable-kvm -cdrom ubuntu-14.04.1-server-amd64.iso \
>  -drive if=virtio,file=/dev/ssd/ubuntu1404,aio=native,cache=none -monitor stdio
> 
> Before:
> virtio0:
>  rd_bytes=151056896 wr_bytes=2683947008 rd_operations=18614 wr_operations=67979
>  flush_operations=15335 wr_total_time_ns=540428034217 rd_total_time_ns=11110520068
>  flush_total_time_ns=40673685006 rd_merged=0 wr_merged=15531
> 
> After:
> virtio0:
>  rd_bytes=149487104 wr_bytes=2701344768 rd_operations=18148 wr_operations=68578
>  flush_operations=15368 wr_total_time_ns=437030089565 rd_total_time_ns=9836288815
>  flush_total_time_ns=40597981121 rd_merged=690 wr_merged=14615
> 
> Some first numbers of improved read performance while booting:
> 
> The Ubuntu 14.04.1 vServer from above:
> virtio0:
>  rd_bytes=97545216 wr_bytes=119808 rd_operations=5071 wr_operations=26
>  flush_operations=2 wr_total_time_ns=8847669 rd_total_time_ns=13952575478
>  flush_total_time_ns=3075496 rd_merged=742 wr_merged=0
> 
> Windows 2012R2 (booted from iSCSI):
> virtio0: rd_bytes=176559104 wr_bytes=61859840 rd_operations=7200 wr_operations=360
>  flush_operations=68 wr_total_time_ns=34344992718 rd_total_time_ns=134386844669
>  flush_total_time_ns=18115517 rd_merged=641 wr_merged=216
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>

Looks pretty good. The only thing I'm still unsure about is possible
integer overflows in the merging logic. Maybe you can have another look
there (ideally not only the places I commented on below, but the whole
function).

> @@ -414,14 +402,81 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>          iov_from_buf(in_iov, in_num, 0, serial, size);
>          virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
>          virtio_blk_free_request(req);
> -    } else if (type & VIRTIO_BLK_T_OUT) {
> -        qemu_iovec_init_external(&req->qiov, iov, out_num);
> -        virtio_blk_handle_write(req, mrb);
> -    } else if (type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {
> -        /* VIRTIO_BLK_T_IN is 0, so we can't just & it. */
> -        qemu_iovec_init_external(&req->qiov, in_iov, in_num);
> -        virtio_blk_handle_read(req);
> -    } else {
> +        break;
> +    }
> +    case VIRTIO_BLK_T_IN:
> +    case VIRTIO_BLK_T_OUT:
> +    {
> +        bool is_write = type & VIRTIO_BLK_T_OUT;
> +        int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
> +                                          &req->out.sector);
> +        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
> +        int nb_sectors = 0;
> +        bool merge = true;
> +
> +        if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
> +            virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
> +            virtio_blk_free_request(req);
> +            return;
> +        }
> +
> +        if (is_write) {
> +            qemu_iovec_init_external(&req->qiov, iov, out_num);
> +            trace_virtio_blk_handle_write(req, sector_num,
> +                                          req->qiov.size / BDRV_SECTOR_SIZE);
> +        } else {
> +            qemu_iovec_init_external(&req->qiov, in_iov, in_num);
> +            trace_virtio_blk_handle_read(req, sector_num,
> +                                         req->qiov.size / BDRV_SECTOR_SIZE);
> +        }
> +
> +        nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;

qiov.size is controlled by the guest, and nb_sectors is only an int. Are
you sure that this can't overflow?

> +        block_acct_start(blk_get_stats(req->dev->blk),
> +                         &req->acct, req->qiov.size,
> +                         is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ);
> +
> +        /* merge would exceed maximum number of requests or IOVs */
> +        if (mrb->num_reqs == MAX_MERGE_REQS ||
> +            mrb->niov + req->qiov.niov + 1 > IOV_MAX) {
> +            merge = false;
> +        }
> +
> +        /* merge would exceed maximum transfer length of backend device */
> +        if (max_transfer_length &&
> +            mrb->nb_sectors + nb_sectors > max_transfer_length) {
> +            merge = false;
> +        }
> +
> +        /* requests are not sequential */
> +        if (mrb->num_reqs && mrb->sector_num + mrb->nb_sectors != sector_num) {
> +            merge = false;
> +        }
> +
> +        /* if we switch from read to write or vice versa we should submit
> +         * outstanding requests to avoid unnecessary and potentially long delays.
> +         * Furthermore we share the same struct for read and write merging so
> +         * submission is a must here. */
> +        if (is_write != mrb->is_write) {
> +            merge = false;
> +        }
> +
> +        if (!merge) {
> +            virtio_submit_multireq(req->dev->blk, mrb);
> +        }
> +
> +        if (mrb->num_reqs == 0) {
> +            mrb->sector_num = sector_num;
> +            mrb->is_write = is_write;
> +        }
> +
> +        mrb->nb_sectors += req->qiov.size / BDRV_SECTOR_SIZE;

This one could also be problematic with respect to overflows.

> +        mrb->reqs[mrb->num_reqs] = req;
> +        mrb->niov += req->qiov.niov;
> +        mrb->num_reqs++;
> +        break;
> +    }
> +    default:
>          virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
>          virtio_blk_free_request(req);
>      }

Kevin

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] virtio-blk: introduce multiread
  2014-12-15 15:01   ` Kevin Wolf
@ 2014-12-15 15:43     ` Peter Lieven
  2014-12-15 15:52       ` Peter Lieven
  2014-12-15 15:57       ` Kevin Wolf
  0 siblings, 2 replies; 14+ messages in thread
From: Peter Lieven @ 2014-12-15 15:43 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: famz, benoit, ming.lei, armbru, qemu-devel, stefanha, pbonzini,
	mreitz

On 15.12.2014 16:01, Kevin Wolf wrote:
> Am 09.12.2014 um 17:26 hat Peter Lieven geschrieben:
>> this patch finally introduces multiread support to virtio-blk. While
>> multiwrite support was there for a long time, read support was missing.
>>
>> To achieve this the patch does several things which might need further
>> explanation:
>>
>>   - the whole merge and multireq logic is moved from block.c into
>>     virtio-blk. This move is a preparation for directly creating a
>>     coroutine out of virtio-blk.
>>
>>   - requests are only merged if they are strictly sequential, and no
>>     longer sorted. This simplification decreases overhead and reduces
>>     latency. It will also merge some requests which were unmergeable before.
>>
>>     The old algorithm took up to 32 requests, sorted them and tried to merge
>>     them. The outcome was anything between 1 and 32 requests. In case of
>>     32 requests there were 31 requests unnecessarily delayed.
>>
>>     On the other hand let's imagine e.g. 16 unmergeable requests followed
>>     by 32 mergeable requests. The latter 32 requests would have been split
>>     into two 16-request batches.
>>
>>     Last the simplified logic allows for a fast path if we have only a
>>     single request in the multirequest. In this case the request is sent as
>>     ordinary request without multireq callbacks.
>>
>> As a first benchmark I installed Ubuntu 14.04.1 on a local SSD. The number of
>> merged requests is in the same order while the write latency is obviously
>> decreased by several percent.
>>
>> cmdline:
>> qemu-system-x86_64 -m 1024 -smp 2 -enable-kvm -cdrom ubuntu-14.04.1-server-amd64.iso \
>>   -drive if=virtio,file=/dev/ssd/ubuntu1404,aio=native,cache=none -monitor stdio
>>
>> Before:
>> virtio0:
>>   rd_bytes=151056896 wr_bytes=2683947008 rd_operations=18614 wr_operations=67979
>>   flush_operations=15335 wr_total_time_ns=540428034217 rd_total_time_ns=11110520068
>>   flush_total_time_ns=40673685006 rd_merged=0 wr_merged=15531
>>
>> After:
>> virtio0:
>>   rd_bytes=149487104 wr_bytes=2701344768 rd_operations=18148 wr_operations=68578
>>   flush_operations=15368 wr_total_time_ns=437030089565 rd_total_time_ns=9836288815
>>   flush_total_time_ns=40597981121 rd_merged=690 wr_merged=14615
>>
>> Some first numbers of improved read performance while booting:
>>
>> The Ubuntu 14.04.1 vServer from above:
>> virtio0:
>>   rd_bytes=97545216 wr_bytes=119808 rd_operations=5071 wr_operations=26
>>   flush_operations=2 wr_total_time_ns=8847669 rd_total_time_ns=13952575478
>>   flush_total_time_ns=3075496 rd_merged=742 wr_merged=0
>>
>> Windows 2012R2 (booted from iSCSI):
>> virtio0: rd_bytes=176559104 wr_bytes=61859840 rd_operations=7200 wr_operations=360
>>   flush_operations=68 wr_total_time_ns=34344992718 rd_total_time_ns=134386844669
>>   flush_total_time_ns=18115517 rd_merged=641 wr_merged=216
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
> Looks pretty good. The only thing I'm still unsure about is possible
> integer overflows in the merging logic. Maybe you can have another look
> there (ideally not only the places I commented on below, but the whole
> function).
>
>> @@ -414,14 +402,81 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>>           iov_from_buf(in_iov, in_num, 0, serial, size);
>>           virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
>>           virtio_blk_free_request(req);
>> -    } else if (type & VIRTIO_BLK_T_OUT) {
>> -        qemu_iovec_init_external(&req->qiov, iov, out_num);
>> -        virtio_blk_handle_write(req, mrb);
>> -    } else if (type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {
>> -        /* VIRTIO_BLK_T_IN is 0, so we can't just & it. */
>> -        qemu_iovec_init_external(&req->qiov, in_iov, in_num);
>> -        virtio_blk_handle_read(req);
>> -    } else {
>> +        break;
>> +    }
>> +    case VIRTIO_BLK_T_IN:
>> +    case VIRTIO_BLK_T_OUT:
>> +    {
>> +        bool is_write = type & VIRTIO_BLK_T_OUT;
>> +        int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
>> +                                          &req->out.sector);
>> +        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
>> +        int nb_sectors = 0;
>> +        bool merge = true;
>> +
>> +        if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
>> +            virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
>> +            virtio_blk_free_request(req);
>> +            return;
>> +        }
>> +
>> +        if (is_write) {
>> +            qemu_iovec_init_external(&req->qiov, iov, out_num);
>> +            trace_virtio_blk_handle_write(req, sector_num,
>> +                                          req->qiov.size / BDRV_SECTOR_SIZE);
>> +        } else {
>> +            qemu_iovec_init_external(&req->qiov, in_iov, in_num);
>> +            trace_virtio_blk_handle_read(req, sector_num,
>> +                                         req->qiov.size / BDRV_SECTOR_SIZE);
>> +        }
>> +
>> +        nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
> qiov.size is controlled by the guest, and nb_sectors is only an int. Are
> you sure that this can't overflow?

In theory, yes. For this to happen, in_iov or iov would need to contain
2TB of data on 32-bit systems. But theoretically there could
already be an overflow in qemu_iovec_init_external, where
multiple size_t values are summed up into a size_t.

There has been no overflow checking in the merge routine in
the past, but if you feel better, we could add sth like this:

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index cc0076a..e9236da 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -410,8 +410,8 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
          bool is_write = type & VIRTIO_BLK_T_OUT;
          int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
                                            &req->out.sector);
-        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
-        int nb_sectors = 0;
+        int64_t max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
+        int64_t nb_sectors = 0;
          bool merge = true;

          if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
@@ -431,6 +431,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
          }

          nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
+        max_transfer_length = MIN_NON_ZERO(max_transfer_length, INT_MAX);

          block_acct_start(blk_get_stats(req->dev->blk),
                           &req->acct, req->qiov.size,
@@ -443,8 +444,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
          }

          /* merge would exceed maximum transfer length of backend device */
-        if (max_transfer_length &&
-            mrb->nb_sectors + nb_sectors > max_transfer_length) {
+        if (nb_sectors + mrb->nb_sectors > max_transfer_length) {
              merge = false;
          }



Peter

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] virtio-blk: introduce multiread
  2014-12-15 15:43     ` Peter Lieven
@ 2014-12-15 15:52       ` Peter Lieven
  2014-12-15 16:00         ` Kevin Wolf
  2014-12-15 15:57       ` Kevin Wolf
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Lieven @ 2014-12-15 15:52 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: famz, benoit, ming.lei, armbru, qemu-devel, stefanha, pbonzini,
	mreitz

On 15.12.2014 16:43, Peter Lieven wrote:
> On 15.12.2014 16:01, Kevin Wolf wrote:
>> Am 09.12.2014 um 17:26 hat Peter Lieven geschrieben:
>>> this patch finally introduces multiread support to virtio-blk. While
>>> multiwrite support was there for a long time, read support was missing.
>>>
>>> To achieve this the patch does several things which might need further
>>> explanation:
>>>
>>>   - the whole merge and multireq logic is moved from block.c into
>>>     virtio-blk. This move is a preparation for directly creating a
>>>     coroutine out of virtio-blk.
>>>
>>>   - requests are only merged if they are strictly sequential, and no
>>>     longer sorted. This simplification decreases overhead and reduces
>>>     latency. It will also merge some requests which were unmergeable before.
>>>
>>>     The old algorithm took up to 32 requests, sorted them and tried to merge
>>>     them. The outcome was anything between 1 and 32 requests. In case of
>>>     32 requests there were 31 requests unnecessarily delayed.
>>>
>>>     On the other hand let's imagine e.g. 16 unmergeable requests followed
>>>     by 32 mergeable requests. The latter 32 requests would have been split
>>>     into two 16-request batches.
>>>
>>>     Last the simplified logic allows for a fast path if we have only a
>>>     single request in the multirequest. In this case the request is sent as
>>>     ordinary request without multireq callbacks.
>>>
>>> As a first benchmark I installed Ubuntu 14.04.1 on a local SSD. The number of
>>> merged requests is in the same order while the write latency is obviously
>>> decreased by several percent.
>>>
>>> cmdline:
>>> qemu-system-x86_64 -m 1024 -smp 2 -enable-kvm -cdrom ubuntu-14.04.1-server-amd64.iso \
>>>   -drive if=virtio,file=/dev/ssd/ubuntu1404,aio=native,cache=none -monitor stdio
>>>
>>> Before:
>>> virtio0:
>>>   rd_bytes=151056896 wr_bytes=2683947008 rd_operations=18614 wr_operations=67979
>>>   flush_operations=15335 wr_total_time_ns=540428034217 rd_total_time_ns=11110520068
>>>   flush_total_time_ns=40673685006 rd_merged=0 wr_merged=15531
>>>
>>> After:
>>> virtio0:
>>>   rd_bytes=149487104 wr_bytes=2701344768 rd_operations=18148 wr_operations=68578
>>>   flush_operations=15368 wr_total_time_ns=437030089565 rd_total_time_ns=9836288815
>>>   flush_total_time_ns=40597981121 rd_merged=690 wr_merged=14615
>>>
>>> Some first numbers of improved read performance while booting:
>>>
>>> The Ubuntu 14.04.1 vServer from above:
>>> virtio0:
>>>   rd_bytes=97545216 wr_bytes=119808 rd_operations=5071 wr_operations=26
>>>   flush_operations=2 wr_total_time_ns=8847669 rd_total_time_ns=13952575478
>>>   flush_total_time_ns=3075496 rd_merged=742 wr_merged=0
>>>
>>> Windows 2012R2 (booted from iSCSI):
>>> virtio0: rd_bytes=176559104 wr_bytes=61859840 rd_operations=7200 wr_operations=360
>>>   flush_operations=68 wr_total_time_ns=34344992718 rd_total_time_ns=134386844669
>>>   flush_total_time_ns=18115517 rd_merged=641 wr_merged=216
>>>
>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> Looks pretty good. The only thing I'm still unsure about is possible
>> integer overflows in the merging logic. Maybe you can have another look
>> there (ideally not only the places I commented on below, but the whole
>> function).
>>
>>> @@ -414,14 +402,81 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>>>           iov_from_buf(in_iov, in_num, 0, serial, size);
>>>           virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
>>>           virtio_blk_free_request(req);
>>> -    } else if (type & VIRTIO_BLK_T_OUT) {
>>> -        qemu_iovec_init_external(&req->qiov, iov, out_num);
>>> -        virtio_blk_handle_write(req, mrb);
>>> -    } else if (type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {
>>> -        /* VIRTIO_BLK_T_IN is 0, so we can't just & it. */
>>> -        qemu_iovec_init_external(&req->qiov, in_iov, in_num);
>>> -        virtio_blk_handle_read(req);
>>> -    } else {
>>> +        break;
>>> +    }
>>> +    case VIRTIO_BLK_T_IN:
>>> +    case VIRTIO_BLK_T_OUT:
>>> +    {
>>> +        bool is_write = type & VIRTIO_BLK_T_OUT;
>>> +        int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
>>> +                                          &req->out.sector);
>>> +        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
>>> +        int nb_sectors = 0;
>>> +        bool merge = true;
>>> +
>>> +        if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
>>> +            virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
>>> +            virtio_blk_free_request(req);
>>> +            return;
>>> +        }
>>> +
>>> +        if (is_write) {
>>> +            qemu_iovec_init_external(&req->qiov, iov, out_num);
>>> +            trace_virtio_blk_handle_write(req, sector_num,
>>> +                                          req->qiov.size / BDRV_SECTOR_SIZE);
>>> +        } else {
>>> +            qemu_iovec_init_external(&req->qiov, in_iov, in_num);
>>> +            trace_virtio_blk_handle_read(req, sector_num,
>>> +                                         req->qiov.size / BDRV_SECTOR_SIZE);
>>> +        }
>>> +
>>> +        nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
>> qiov.size is controlled by the guest, and nb_sectors is only an int. Are
>> you sure that this can't overflow?
>
> In theory, yes. For this to happen, in_iov or iov would need to contain
> 2TB of data on 32-bit systems. But theoretically there could
> already be an overflow in qemu_iovec_init_external, where
> multiple size_t values are summed up into a size_t.
>
> There has been no overflow checking in the merge routine in
> the past, but if you feel better, we could add sth like this:
>
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index cc0076a..e9236da 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -410,8 +410,8 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>          bool is_write = type & VIRTIO_BLK_T_OUT;
>          int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
>                                           &req->out.sector);
> -        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
> -        int nb_sectors = 0;
> +        int64_t max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
> +        int64_t nb_sectors = 0;
>          bool merge = true;
>
>          if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
> @@ -431,6 +431,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>          }
>
>          nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
> +        max_transfer_length = MIN_NON_ZERO(max_transfer_length, INT_MAX);
>
>          block_acct_start(blk_get_stats(req->dev->blk),
>                           &req->acct, req->qiov.size,
> @@ -443,8 +444,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>          }
>
>          /* merge would exceed maximum transfer length of backend device */
> -        if (max_transfer_length &&
> -            mrb->nb_sectors + nb_sectors > max_transfer_length) {
> +        if (nb_sectors + mrb->nb_sectors > max_transfer_length) {
>              merge = false;
>          }
>

Maybe also add this here:

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index cc0076a..fa647b6 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -333,6 +333,9 @@ static bool virtio_blk_sect_range_ok(VirtIOBlock *dev,
      uint64_t nb_sectors = size >> BDRV_SECTOR_BITS;
      uint64_t total_sectors;

+    if (nb_sectors > INT_MAX) {
+        return false;
+    }
      if (sector & dev->sector_mask) {
          return false;
      }


That's something that has not been checked for ages either.

Peter

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] virtio-blk: introduce multiread
  2014-12-15 15:43     ` Peter Lieven
  2014-12-15 15:52       ` Peter Lieven
@ 2014-12-15 15:57       ` Kevin Wolf
  1 sibling, 0 replies; 14+ messages in thread
From: Kevin Wolf @ 2014-12-15 15:57 UTC (permalink / raw)
  To: Peter Lieven
  Cc: famz, benoit, ming.lei, armbru, qemu-devel, stefanha, pbonzini,
	mreitz

Am 15.12.2014 um 16:43 hat Peter Lieven geschrieben:
> On 15.12.2014 16:01, Kevin Wolf wrote:
> >Am 09.12.2014 um 17:26 hat Peter Lieven geschrieben:
> >>this patch finally introduces multiread support to virtio-blk. While
> >>multiwrite support was there for a long time, read support was missing.
> >>
> >>To achieve this the patch does several things which might need further
> >>explanation:
> >>
> >>  - the whole merge and multireq logic is moved from block.c into
> >>    virtio-blk. This is move is a preparation for directly creating a
> >>    coroutine out of virtio-blk.
> >>
> >>  - requests are only merged if they are strictly sequential, and no
> >>    longer sorted. This simplification decreases overhead and reduces
> >>    latency. It will also merge some requests which were unmergeable before.
> >>
> >>    The old algorithm took up to 32 requests, sorted them and tried to merge
> >>    them. The outcome was anything between 1 and 32 requests. In case of
> >>    32 requests there were 31 requests unnecessarily delayed.
> >>
> >>    On the other hand let's imagine e.g. 16 unmergeable requests followed
> >>    by 32 mergeable requests. The latter 32 requests would have been split
> >>    into two multiwrites of 16 requests each.
> >>
> >>    Last the simplified logic allows for a fast path if we have only a
> >>    single request in the multirequest. In this case the request is sent as
> >>    ordinary request without multireq callbacks.
> >>
> >>As a first benchmark I installed Ubuntu 14.04.1 on a local SSD. The number of
> >>merged requests is in the same order while the write latency is obviously
> >>decreased by several percent.
> >>
> >>cmdline:
> >>qemu-system-x86_64 -m 1024 -smp 2 -enable-kvm -cdrom ubuntu-14.04.1-server-amd64.iso \
> >>  -drive if=virtio,file=/dev/ssd/ubuntu1404,aio=native,cache=none -monitor stdio
> >>
> >>Before:
> >>virtio0:
> >>  rd_bytes=151056896 wr_bytes=2683947008 rd_operations=18614 wr_operations=67979
> >>  flush_operations=15335 wr_total_time_ns=540428034217 rd_total_time_ns=11110520068
> >>  flush_total_time_ns=40673685006 rd_merged=0 wr_merged=15531
> >>
> >>After:
> >>virtio0:
> >>  rd_bytes=149487104 wr_bytes=2701344768 rd_operations=18148 wr_operations=68578
> >>  flush_operations=15368 wr_total_time_ns=437030089565 rd_total_time_ns=9836288815
> >>  flush_total_time_ns=40597981121 rd_merged=690 wr_merged=14615
> >>
> >>Some first numbers of improved read performance while booting:
> >>
> >>The Ubuntu 14.04.1 vServer from above:
> >>virtio0:
> >>  rd_bytes=97545216 wr_bytes=119808 rd_operations=5071 wr_operations=26
> >>  flush_operations=2 wr_total_time_ns=8847669 rd_total_time_ns=13952575478
> >>  flush_total_time_ns=3075496 rd_merged=742 wr_merged=0
> >>
> >>Windows 2012R2 (booted from iSCSI):
> >>virtio0: rd_bytes=176559104 wr_bytes=61859840 rd_operations=7200 wr_operations=360
> >>  flush_operations=68 wr_total_time_ns=34344992718 rd_total_time_ns=134386844669
> >>  flush_total_time_ns=18115517 rd_merged=641 wr_merged=216
> >>
> >>Signed-off-by: Peter Lieven <pl@kamp.de>
> >Looks pretty good. The only thing I'm still unsure about are possible
> >integer overflows in the merging logic. Maybe you can have another look
> >there (ideally not only the places I commented on below, but the whole
> >function).
> >
> >>@@ -414,14 +402,81 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
> >>          iov_from_buf(in_iov, in_num, 0, serial, size);
> >>          virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
> >>          virtio_blk_free_request(req);
> >>-    } else if (type & VIRTIO_BLK_T_OUT) {
> >>-        qemu_iovec_init_external(&req->qiov, iov, out_num);
> >>-        virtio_blk_handle_write(req, mrb);
> >>-    } else if (type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {
> >>-        /* VIRTIO_BLK_T_IN is 0, so we can't just & it. */
> >>-        qemu_iovec_init_external(&req->qiov, in_iov, in_num);
> >>-        virtio_blk_handle_read(req);
> >>-    } else {
> >>+        break;
> >>+    }
> >>+    case VIRTIO_BLK_T_IN:
> >>+    case VIRTIO_BLK_T_OUT:
> >>+    {
> >>+        bool is_write = type & VIRTIO_BLK_T_OUT;
> >>+        int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
> >>+                                          &req->out.sector);
> >>+        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
> >>+        int nb_sectors = 0;
> >>+        bool merge = true;
> >>+
> >>+        if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
> >>+            virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
> >>+            virtio_blk_free_request(req);
> >>+            return;
> >>+        }
> >>+
> >>+        if (is_write) {
> >>+            qemu_iovec_init_external(&req->qiov, iov, out_num);
> >>+            trace_virtio_blk_handle_write(req, sector_num,
> >>+                                          req->qiov.size / BDRV_SECTOR_SIZE);
> >>+        } else {
> >>+            qemu_iovec_init_external(&req->qiov, in_iov, in_num);
> >>+            trace_virtio_blk_handle_read(req, sector_num,
> >>+                                         req->qiov.size / BDRV_SECTOR_SIZE);
> >>+        }
> >>+
> >>+        nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
> >qiov.size is controlled by the guest, and nb_sectors is only an int. Are
> >you sure that this can't overflow?
> 
> In theory, yes. For this to happen, in_iov or iov would need to contain
> 2TB of data on 32-bit systems. But theoretically there could
> also already be an overflow in qemu_iovec_init_external, where
> multiple size_t values are summed up in a size_t.

Yes, it won't happen accidentally. A malicious guest could easily do it,
however. There is nothing that checks that the iov doesn't contain a
memory area multiple times.

I haven't checked whether anything bad would happen in practice if
nb_sectors overflows, but better avoid the possibility in the first
place.

> There has been no overflow checking in the merge routine in
> the past, but if you feel better, we could add something like this:
> 
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index cc0076a..e9236da 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -410,8 +410,8 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>          bool is_write = type & VIRTIO_BLK_T_OUT;
>          int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
>                                            &req->out.sector);
> -        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
> -        int nb_sectors = 0;
> +        int64_t max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
> +        int64_t nb_sectors = 0;
>          bool merge = true;
> 
>          if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
> @@ -431,6 +431,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>          }
> 
>          nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
> +        max_transfer_length = MIN_NON_ZERO(max_transfer_length, INT_MAX);
> 
>          block_acct_start(blk_get_stats(req->dev->blk),
>                           &req->acct, req->qiov.size,
> @@ -443,8 +444,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>          }
> 
>          /* merge would exceed maximum transfer length of backend device */
> -        if (max_transfer_length &&
> -            mrb->nb_sectors + nb_sectors > max_transfer_length) {
> +        if (nb_sectors + mrb->nb_sectors > max_transfer_length) {
>              merge = false;
>          }

Yes, this should work.

Kevin


* Re: [Qemu-devel] [PATCH 4/4] virtio-blk: introduce multiread
  2014-12-15 15:52       ` Peter Lieven
@ 2014-12-15 16:00         ` Kevin Wolf
  2014-12-15 16:02           ` Peter Lieven
  0 siblings, 1 reply; 14+ messages in thread
From: Kevin Wolf @ 2014-12-15 16:00 UTC (permalink / raw)
  To: Peter Lieven
  Cc: famz, benoit, ming.lei, armbru, qemu-devel, stefanha, pbonzini,
	mreitz

Am 15.12.2014 um 16:52 hat Peter Lieven geschrieben:
> On 15.12.2014 16:43, Peter Lieven wrote:
> >On 15.12.2014 16:01, Kevin Wolf wrote:
> >>Am 09.12.2014 um 17:26 hat Peter Lieven geschrieben:
> >>>this patch finally introduces multiread support to virtio-blk. While
> >>>multiwrite support was there for a long time, read support was missing.
> >>>
> >>>To achieve this the patch does several things which might need further
> >>>explanation:
> >>>
> >>>  - the whole merge and multireq logic is moved from block.c into
> >>>    virtio-blk. This is move is a preparation for directly creating a
> >>>    coroutine out of virtio-blk.
> >>>
> >>>  - requests are only merged if they are strictly sequential, and no
> >>>    longer sorted. This simplification decreases overhead and reduces
> >>>    latency. It will also merge some requests which were unmergeable before.
> >>>
> >>>    The old algorithm took up to 32 requests, sorted them and tried to merge
> >>>    them. The outcome was anything between 1 and 32 requests. In case of
> >>>    32 requests there were 31 requests unnecessarily delayed.
> >>>
> >>>    On the other hand let's imagine e.g. 16 unmergeable requests followed
> >>>    by 32 mergeable requests. The latter 32 requests would have been split
> >>>    into two multiwrites of 16 requests each.
> >>>
> >>>    Last the simplified logic allows for a fast path if we have only a
> >>>    single request in the multirequest. In this case the request is sent as
> >>>    ordinary request without multireq callbacks.
> >>>
> >>>As a first benchmark I installed Ubuntu 14.04.1 on a local SSD. The number of
> >>>merged requests is in the same order while the write latency is obviously
> >>>decreased by several percent.
> >>>
> >>>cmdline:
> >>>qemu-system-x86_64 -m 1024 -smp 2 -enable-kvm -cdrom ubuntu-14.04.1-server-amd64.iso \
> >>>  -drive if=virtio,file=/dev/ssd/ubuntu1404,aio=native,cache=none -monitor stdio
> >>>
> >>>Before:
> >>>virtio0:
> >>>  rd_bytes=151056896 wr_bytes=2683947008 rd_operations=18614 wr_operations=67979
> >>>  flush_operations=15335 wr_total_time_ns=540428034217 rd_total_time_ns=11110520068
> >>>  flush_total_time_ns=40673685006 rd_merged=0 wr_merged=15531
> >>>
> >>>After:
> >>>virtio0:
> >>>  rd_bytes=149487104 wr_bytes=2701344768 rd_operations=18148 wr_operations=68578
> >>>  flush_operations=15368 wr_total_time_ns=437030089565 rd_total_time_ns=9836288815
> >>>  flush_total_time_ns=40597981121 rd_merged=690 wr_merged=14615
> >>>
> >>>Some first numbers of improved read performance while booting:
> >>>
> >>>The Ubuntu 14.04.1 vServer from above:
> >>>virtio0:
> >>>  rd_bytes=97545216 wr_bytes=119808 rd_operations=5071 wr_operations=26
> >>>  flush_operations=2 wr_total_time_ns=8847669 rd_total_time_ns=13952575478
> >>>  flush_total_time_ns=3075496 rd_merged=742 wr_merged=0
> >>>
> >>>Windows 2012R2 (booted from iSCSI):
> >>>virtio0: rd_bytes=176559104 wr_bytes=61859840 rd_operations=7200 wr_operations=360
> >>>  flush_operations=68 wr_total_time_ns=34344992718 rd_total_time_ns=134386844669
> >>>  flush_total_time_ns=18115517 rd_merged=641 wr_merged=216
> >>>
> >>>Signed-off-by: Peter Lieven <pl@kamp.de>
> >>Looks pretty good. The only thing I'm still unsure about are possible
> >>integer overflows in the merging logic. Maybe you can have another look
> >>there (ideally not only the places I commented on below, but the whole
> >>function).
> >>
> >>>@@ -414,14 +402,81 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
> >>>          iov_from_buf(in_iov, in_num, 0, serial, size);
> >>>          virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
> >>>          virtio_blk_free_request(req);
> >>>-    } else if (type & VIRTIO_BLK_T_OUT) {
> >>>-        qemu_iovec_init_external(&req->qiov, iov, out_num);
> >>>-        virtio_blk_handle_write(req, mrb);
> >>>-    } else if (type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {
> >>>-        /* VIRTIO_BLK_T_IN is 0, so we can't just & it. */
> >>>-        qemu_iovec_init_external(&req->qiov, in_iov, in_num);
> >>>-        virtio_blk_handle_read(req);
> >>>-    } else {
> >>>+        break;
> >>>+    }
> >>>+    case VIRTIO_BLK_T_IN:
> >>>+    case VIRTIO_BLK_T_OUT:
> >>>+    {
> >>>+        bool is_write = type & VIRTIO_BLK_T_OUT;
> >>>+        int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
> >>>+                                          &req->out.sector);
> >>>+        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
> >>>+        int nb_sectors = 0;
> >>>+        bool merge = true;
> >>>+
> >>>+        if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
> >>>+            virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
> >>>+            virtio_blk_free_request(req);
> >>>+            return;
> >>>+        }
> >>>+
> >>>+        if (is_write) {
> >>>+            qemu_iovec_init_external(&req->qiov, iov, out_num);
> >>>+            trace_virtio_blk_handle_write(req, sector_num,
> >>>+                                          req->qiov.size / BDRV_SECTOR_SIZE);
> >>>+        } else {
> >>>+            qemu_iovec_init_external(&req->qiov, in_iov, in_num);
> >>>+            trace_virtio_blk_handle_read(req, sector_num,
> >>>+                                         req->qiov.size / BDRV_SECTOR_SIZE);
> >>>+        }
> >>>+
> >>>+        nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
> >>qiov.size is controlled by the guest, and nb_sectors is only an int. Are
> >>you sure that this can't overflow?
> >
> >In theory, yes. For this to happen in_iov or iov needs to contain
> >2TB of data on 32-bit systems. But theoretically there could
> >also be already an overflow in qemu_iovec_init_external where
> >multiple size_t are summed up in a size_t.
> >
> >There has been no overflow checking in the merge routine in
> >the past, but if you feel better, we could add sth like this:
> >
> >diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> >index cc0076a..e9236da 100644
> >--- a/hw/block/virtio-blk.c
> >+++ b/hw/block/virtio-blk.c
> >@@ -410,8 +410,8 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
> >         bool is_write = type & VIRTIO_BLK_T_OUT;
> >         int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
> >                                           &req->out.sector);
> >-        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
> >-        int nb_sectors = 0;
> >+        int64_t max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
> >+        int64_t nb_sectors = 0;
> >         bool merge = true;
> >
> >         if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
> >@@ -431,6 +431,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
> >         }
> >
> >         nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
> >+        max_transfer_length = MIN_NON_ZERO(max_transfer_length, INT_MAX);
> >
> >         block_acct_start(blk_get_stats(req->dev->blk),
> >                          &req->acct, req->qiov.size,
> >@@ -443,8 +444,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
> >         }
> >
> >         /* merge would exceed maximum transfer length of backend device */
> >-        if (max_transfer_length &&
> >-            mrb->nb_sectors + nb_sectors > max_transfer_length) {
> >+        if (nb_sectors + mrb->nb_sectors > max_transfer_length) {
> >             merge = false;
> >         }
> >
> 
> Maybe also add this here:
> 
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index cc0076a..fa647b6 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -333,6 +333,9 @@ static bool virtio_blk_sect_range_ok(VirtIOBlock *dev,
>      uint64_t nb_sectors = size >> BDRV_SECTOR_BITS;
>      uint64_t total_sectors;
> 
> +    if (nb_sectors > INT_MAX) {
> +        return false;
> +    }
>      if (sector & dev->sector_mask) {
>          return false;
>      }
> 
> 
> That's something that has not been checked for ages as well.

Adding checks can never hurt, so go for it. ;-)

Kevin


* Re: [Qemu-devel] [PATCH 4/4] virtio-blk: introduce multiread
  2014-12-15 16:00         ` Kevin Wolf
@ 2014-12-15 16:02           ` Peter Lieven
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Lieven @ 2014-12-15 16:02 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: famz@redhat.com, benoit@irqsave.net, ming.lei@canonical.com,
	armbru@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com,
	pbonzini@redhat.com, mreitz@redhat.com



> Am 15.12.2014 um 17:00 schrieb Kevin Wolf <kwolf@redhat.com>:
> 
> Am 15.12.2014 um 16:52 hat Peter Lieven geschrieben:
>> On 15.12.2014 16:43, Peter Lieven wrote:
>>> On 15.12.2014 16:01, Kevin Wolf wrote:
>>>> Am 09.12.2014 um 17:26 hat Peter Lieven geschrieben:
>>>>> this patch finally introduces multiread support to virtio-blk. While
>>>>> multiwrite support was there for a long time, read support was missing.
>>>>> 
>>>>> To achieve this the patch does several things which might need further
>>>>> explanation:
>>>>> 
>>>>> - the whole merge and multireq logic is moved from block.c into
>>>>>   virtio-blk. This move is a preparation for directly creating a
>>>>>   coroutine out of virtio-blk.
>>>>> 
>>>>> - requests are only merged if they are strictly sequential, and no
>>>>>   longer sorted. This simplification decreases overhead and reduces
>>>>>   latency. It will also merge some requests which were unmergeable before.
>>>>> 
>>>>>   The old algorithm took up to 32 requests, sorted them and tried to merge
>>>>>   them. The outcome was anything between 1 and 32 requests. In case of
>>>>>   32 requests there were 31 requests unnecessarily delayed.
>>>>> 
>>>>>   On the other hand let's imagine e.g. 16 unmergeable requests followed
>>>>>   by 32 mergeable requests. The latter 32 requests would have been split
>>>>>   into two multiwrites of 16 requests each.
>>>>> 
>>>>>   Last the simplified logic allows for a fast path if we have only a
>>>>>   single request in the multirequest. In this case the request is sent as
>>>>>   ordinary request without multireq callbacks.
>>>>> 
>>>>> As a first benchmark I installed Ubuntu 14.04.1 on a local SSD. The number of
>>>>> merged requests is in the same order while the write latency is obviously
>>>>> decreased by several percent.
>>>>> 
>>>>> cmdline:
>>>>> qemu-system-x86_64 -m 1024 -smp 2 -enable-kvm -cdrom ubuntu-14.04.1-server-amd64.iso \
>>>>> -drive if=virtio,file=/dev/ssd/ubuntu1404,aio=native,cache=none -monitor stdio
>>>>> 
>>>>> Before:
>>>>> virtio0:
>>>>> rd_bytes=151056896 wr_bytes=2683947008 rd_operations=18614 wr_operations=67979
>>>>> flush_operations=15335 wr_total_time_ns=540428034217 rd_total_time_ns=11110520068
>>>>> flush_total_time_ns=40673685006 rd_merged=0 wr_merged=15531
>>>>> 
>>>>> After:
>>>>> virtio0:
>>>>> rd_bytes=149487104 wr_bytes=2701344768 rd_operations=18148 wr_operations=68578
>>>>> flush_operations=15368 wr_total_time_ns=437030089565 rd_total_time_ns=9836288815
>>>>> flush_total_time_ns=40597981121 rd_merged=690 wr_merged=14615
>>>>> 
>>>>> Some first numbers of improved read performance while booting:
>>>>> 
>>>>> The Ubuntu 14.04.1 vServer from above:
>>>>> virtio0:
>>>>> rd_bytes=97545216 wr_bytes=119808 rd_operations=5071 wr_operations=26
>>>>> flush_operations=2 wr_total_time_ns=8847669 rd_total_time_ns=13952575478
>>>>> flush_total_time_ns=3075496 rd_merged=742 wr_merged=0
>>>>> 
>>>>> Windows 2012R2 (booted from iSCSI):
>>>>> virtio0: rd_bytes=176559104 wr_bytes=61859840 rd_operations=7200 wr_operations=360
>>>>> flush_operations=68 wr_total_time_ns=34344992718 rd_total_time_ns=134386844669
>>>>> flush_total_time_ns=18115517 rd_merged=641 wr_merged=216
>>>>> 
>>>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>>> Looks pretty good. The only thing I'm still unsure about are possible
>>>> integer overflows in the merging logic. Maybe you can have another look
>>>> there (ideally not only the places I commented on below, but the whole
>>>> function).
>>>> 
>>>>> @@ -414,14 +402,81 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>>>>>         iov_from_buf(in_iov, in_num, 0, serial, size);
>>>>>         virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
>>>>>         virtio_blk_free_request(req);
>>>>> -    } else if (type & VIRTIO_BLK_T_OUT) {
>>>>> -        qemu_iovec_init_external(&req->qiov, iov, out_num);
>>>>> -        virtio_blk_handle_write(req, mrb);
>>>>> -    } else if (type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {
>>>>> -        /* VIRTIO_BLK_T_IN is 0, so we can't just & it. */
>>>>> -        qemu_iovec_init_external(&req->qiov, in_iov, in_num);
>>>>> -        virtio_blk_handle_read(req);
>>>>> -    } else {
>>>>> +        break;
>>>>> +    }
>>>>> +    case VIRTIO_BLK_T_IN:
>>>>> +    case VIRTIO_BLK_T_OUT:
>>>>> +    {
>>>>> +        bool is_write = type & VIRTIO_BLK_T_OUT;
>>>>> +        int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
>>>>> +                                          &req->out.sector);
>>>>> +        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
>>>>> +        int nb_sectors = 0;
>>>>> +        bool merge = true;
>>>>> +
>>>>> +        if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
>>>>> +            virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
>>>>> +            virtio_blk_free_request(req);
>>>>> +            return;
>>>>> +        }
>>>>> +
>>>>> +        if (is_write) {
>>>>> +            qemu_iovec_init_external(&req->qiov, iov, out_num);
>>>>> +            trace_virtio_blk_handle_write(req, sector_num,
>>>>> +                                          req->qiov.size / BDRV_SECTOR_SIZE);
>>>>> +        } else {
>>>>> +            qemu_iovec_init_external(&req->qiov, in_iov, in_num);
>>>>> +            trace_virtio_blk_handle_read(req, sector_num,
>>>>> +                                         req->qiov.size / BDRV_SECTOR_SIZE);
>>>>> +        }
>>>>> +
>>>>> +        nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
>>>> qiov.size is controlled by the guest, and nb_sectors is only an int. Are
>>>> you sure that this can't overflow?
>>> 
>>> In theory, yes. For this to happen in_iov or iov needs to contain
>>> 2TB of data on 32-bit systems. But theoretically there could
>>> also be already an overflow in qemu_iovec_init_external where
>>> multiple size_t are summed up in a size_t.
>>> 
>>> There has been no overflow checking in the merge routine in
>>> the past, but if you feel better, we could add sth like this:
>>> 
>>> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
>>> index cc0076a..e9236da 100644
>>> --- a/hw/block/virtio-blk.c
>>> +++ b/hw/block/virtio-blk.c
>>> @@ -410,8 +410,8 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>>>        bool is_write = type & VIRTIO_BLK_T_OUT;
>>>        int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
>>>                                           &req->out.sector);
>>> -        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
>>> -        int nb_sectors = 0;
>>> +        int64_t max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
>>> +        int64_t nb_sectors = 0;
>>>        bool merge = true;
>>> 
>>>        if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
>>> @@ -431,6 +431,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>>>        }
>>> 
>>>        nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
>>> +        max_transfer_length = MIN_NON_ZERO(max_transfer_length, INT_MAX);
>>> 
>>>        block_acct_start(blk_get_stats(req->dev->blk),
>>>                         &req->acct, req->qiov.size,
>>> @@ -443,8 +444,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>>>        }
>>> 
>>>        /* merge would exceed maximum transfer length of backend device */
>>> -        if (max_transfer_length &&
>>> -            mrb->nb_sectors + nb_sectors > max_transfer_length) {
>>> +        if (nb_sectors + mrb->nb_sectors > max_transfer_length) {
>>>            merge = false;
>>>        }
>> 
>> Maybe also add this here:
>> 
>> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
>> index cc0076a..fa647b6 100644
>> --- a/hw/block/virtio-blk.c
>> +++ b/hw/block/virtio-blk.c
>> @@ -333,6 +333,9 @@ static bool virtio_blk_sect_range_ok(VirtIOBlock *dev,
>>     uint64_t nb_sectors = size >> BDRV_SECTOR_BITS;
>>     uint64_t total_sectors;
>> 
>> +    if (nb_sectors > INT_MAX) {
>> +        return false;
>> +    }
>>     if (sector & dev->sector_mask) {
>>         return false;
>>     }
>> 
>> 
>> That's something that has not been checked for ages as well.
> 
> Adding checks can never hurt, so go for it. ;-)

I will add both checks and Fam's comment and send a v2 tomorrow.

Peter

> 
> Kevin


end of thread, other threads:[~2014-12-15 16:03 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-12-09 16:26 [Qemu-devel] [PATCH 0/4] virtio-blk: add multiread support Peter Lieven
2014-12-09 16:26 ` [Qemu-devel] [PATCH 1/4] block: add accounting for merged requests Peter Lieven
2014-12-09 16:26 ` [Qemu-devel] [PATCH 2/4] hw/virtio-blk: add a constant for max number of " Peter Lieven
2014-12-10  4:54   ` Fam Zheng
2014-12-09 16:26 ` [Qemu-devel] [PATCH 3/4] block-backend: expose bs->bl.max_transfer_length Peter Lieven
2014-12-09 16:26 ` [Qemu-devel] [PATCH 4/4] virtio-blk: introduce multiread Peter Lieven
2014-12-10  7:48   ` Fam Zheng
2014-12-11 13:07     ` Peter Lieven
2014-12-15 15:01   ` Kevin Wolf
2014-12-15 15:43     ` Peter Lieven
2014-12-15 15:52       ` Peter Lieven
2014-12-15 16:00         ` Kevin Wolf
2014-12-15 16:02           ` Peter Lieven
2014-12-15 15:57       ` Kevin Wolf
