qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH v9 0/4] The intro of QEMU block I/O throttling
@ 2011-10-28 10:02 Zhi Yong Wu
  2011-10-28 10:02 ` [Qemu-devel] [PATCH v9 1/4] block: add the block queue support Zhi Yong Wu
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Zhi Yong Wu @ 2011-10-28 10:02 UTC (permalink / raw)
  To: kwolf; +Cc: wuzhy, qemu-devel, stefanha

The main goal of this patch series is to effectively cap the disk I/O rate or request count of a single VM. It is still a draft and unavoidably has some drawbacks; if you catch any, please let me know.

The series introduces a block I/O throttling algorithm, plus one timer and one block queue for each drive with I/O limits enabled.

When a block request comes in, the throttling algorithm checks whether its I/O rate or request count exceeds the limits; if so, the request is enqueued on the block queue, and the timer later dispatches the queued requests.

The following features are available:
(1) global bps limit
   -drive bps=xxx            in bytes/s
(2) read bps limit only
   -drive bps_rd=xxx         in bytes/s
(3) write bps limit only
   -drive bps_wr=xxx         in bytes/s
(4) global iops limit
   -drive iops=xxx           in ios/s
(5) read iops limit only
   -drive iops_rd=xxx        in ios/s
(6) write iops limit only
   -drive iops_wr=xxx        in ios/s
(7) combinations of the above limits
   -drive bps=xxx,iops=xxx

Known limitations:
(1) #1 cannot coexist with #2 or #3
(2) #4 cannot coexist with #5 or #6

Changes since v8:
    1.) Made many changes based on Kevin's comments.
    2.) slice_time is now dynamically adjusted.
    3.) Rebased onto the latest qemu upstream.

 v8: Fixed the build of each individual patch, based on Stefan's comments.

 v7: Mainly simplified the block queue.
     Adjusted the code based on Stefan's comments.

 v6: Mainly fixed the aio callback issue in the block queue.
     Adjusted the code based on Ram Pai's comments.

 v5: Added QMP/HMP support (block_set_io_throttle).
     Adjusted the code based on Stefan's comments.

 v4: Fixed a memory leak based on Ryan's feedback.

 v3: Added code for extending the slice time, and modified how the wait time for the timer is computed.

 v2: Second revision of the QEMU disk I/O limits code.
     Modified mainly based on Stefan's comments.

 v1: Initial submission of the QEMU disk I/O limits code; only a draft.


Zhi Yong Wu (4):
  block: add the block queue support
  block: add the command line support
  block: add the block throttling algorithm
  qmp/hmp: add block_set_io_throttle

 Makefile.objs     |    2 +-
 block.c           |  543 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 block.h           |   24 +++
 block/blk-queue.c |  201 ++++++++++++++++++++
 block/blk-queue.h |   63 ++++++
 block_int.h       |   45 +++++
 blockdev.c        |   83 ++++++++
 blockdev.h        |    2 +
 hmp-commands.hx   |   15 ++
 qemu-config.c     |   24 +++
 qemu-options.hx   |    1 +
 qerror.c          |    4 +
 qerror.h          |    3 +
 qmp-commands.hx   |   53 +++++-
 14 files changed, 1038 insertions(+), 25 deletions(-)
 create mode 100644 block/blk-queue.c
 create mode 100644 block/blk-queue.h

-- 
1.7.6


* [Qemu-devel] [PATCH v9 1/4] block: add the block queue support
  2011-10-28 10:02 [Qemu-devel] [PATCH v9 0/4] The intro of QEMU block I/O throttling Zhi Yong Wu
@ 2011-10-28 10:02 ` Zhi Yong Wu
  2011-10-31 13:35   ` Stefan Hajnoczi
  2011-10-28 10:02 ` [Qemu-devel] [PATCH v9 2/4] block: add the command line support Zhi Yong Wu
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Zhi Yong Wu @ 2011-10-28 10:02 UTC (permalink / raw)
  To: kwolf; +Cc: wuzhy, qemu-devel, stefanha

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
---
 Makefile.objs     |    2 +-
 block.c           |   71 +++++++++++++++++--
 block.h           |   20 +++++
 block/blk-queue.c |  201 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 block/blk-queue.h |   63 +++++++++++++++++
 block_int.h       |   23 ++++++
 6 files changed, 371 insertions(+), 9 deletions(-)
 create mode 100644 block/blk-queue.c
 create mode 100644 block/blk-queue.h

diff --git a/Makefile.objs b/Makefile.objs
index 01587c8..98891b3 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -33,7 +33,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vv
 block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
 block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-nested-y += qed-check.o
-block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
+block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o blk-queue.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
 block-nested-$(CONFIG_POSIX) += raw-posix.o
 block-nested-$(CONFIG_CURL) += curl.o
diff --git a/block.c b/block.c
index 70aab63..40ab277 100644
--- a/block.c
+++ b/block.c
@@ -1026,6 +1026,7 @@ typedef struct RwCo {
     QEMUIOVector *qiov;
     bool is_write;
     int ret;
+    bool limit_skip;
 } RwCo;
 
 static void coroutine_fn bdrv_rw_co_entry(void *opaque)
@@ -1060,6 +1061,7 @@ static int bdrv_rw_co(BlockDriverState *bs, int64_t sector_num, uint8_t *buf,
         .qiov = &qiov,
         .is_write = is_write,
         .ret = NOT_DONE,
+        .limit_skip = false,
     };
 
     qemu_iovec_init_external(&qiov, &iov, 1);
@@ -1242,6 +1244,64 @@ int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset,
     return 0;
 }
 
+typedef struct BlockDriverAIOCBCoroutine {
+    BlockDriverAIOCB common;
+    BlockRequest req;
+    bool is_write;
+    QEMUBH *bh;
+    bool cb_skip;
+    bool limit_skip;
+    void *blkq_acb;
+} BlockDriverAIOCBCoroutine;
+
+/* block I/O throttling */
+typedef struct CoroutineCB {
+    BlockDriverState *bs;
+    int64_t sector_num;
+    int nb_sectors;
+    QEMUIOVector *qiov;
+} CoroutineCB;
+
+static void bdrv_io_limits_skip_set(void *opaque,
+                                    BlockAPIType co_type,
+                                    bool cb_skip,
+                                    bool limit_skip) {
+    RwCo *rwco;
+    BlockDriverAIOCBCoroutine *aioco;
+
+    if (co_type == BDRV_API_SYNC) {
+        rwco = opaque;
+        rwco->limit_skip = limit_skip;
+    } else if (co_type == BDRV_API_ASYNC) {
+        aioco = opaque;
+        aioco->cb_skip = cb_skip;
+        aioco->limit_skip = limit_skip;
+    } else {
+        abort();
+    }
+}
+
+void bdrv_io_limits_issue_request(void *opaque,
+                                  BlockAPIType co_type) {
+    BlockQueueAIOCB *blkq_acb = opaque;
+
+    if (blkq_acb->co_type == BDRV_API_CO) {
+        qemu_coroutine_enter(blkq_acb->co, blkq_acb->cocb);
+    } else {
+        CoroutineEntry *entry = NULL;
+        if (co_type == BDRV_API_SYNC) {
+            entry = bdrv_rw_co_entry;
+        } else if (co_type == BDRV_API_ASYNC) {
+            entry = bdrv_co_do_rw;
+        }
+
+        bdrv_io_limits_skip_set(blkq_acb->cocb,
+                                co_type, false, true);
+        Coroutine *co = qemu_coroutine_create(entry);
+        qemu_coroutine_enter(co, blkq_acb->cocb);
+    }
+}
+
 /*
  * Handle a read request in coroutine context
  */
@@ -2650,14 +2710,6 @@ static BlockDriverAIOCB *bdrv_aio_writev_em(BlockDriverState *bs,
     return bdrv_aio_rw_vector(bs, sector_num, qiov, nb_sectors, cb, opaque, 1);
 }
 
-
-typedef struct BlockDriverAIOCBCoroutine {
-    BlockDriverAIOCB common;
-    BlockRequest req;
-    bool is_write;
-    QEMUBH* bh;
-} BlockDriverAIOCBCoroutine;
-
 static void bdrv_aio_co_cancel_em(BlockDriverAIOCB *blockacb)
 {
     qemu_aio_flush();
@@ -2711,6 +2763,9 @@ static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
     acb->req.nb_sectors = nb_sectors;
     acb->req.qiov = qiov;
     acb->is_write = is_write;
+    acb->blkq_acb = NULL;
+    acb->cb_skip  = false;
+    acb->limit_skip = false;
 
     co = qemu_coroutine_create(bdrv_co_do_rw);
     qemu_coroutine_enter(co, acb);
diff --git a/block.h b/block.h
index 5a042c9..b7fbc40 100644
--- a/block.h
+++ b/block.h
@@ -82,6 +82,17 @@ typedef enum {
     BDRV_IOS_MAX
 } BlockIOStatus;
 
+/* disk I/O throttling */
+typedef enum {
+    BDRV_BLKQ_ENQ_FIRST, BDRV_BLKQ_ENQ_AGAIN, BDRV_BLKQ_DEQ_PASS,
+    BDRV_BLKQ_PASS, BDRV_IO_MAX
+} BlockQueueRetType;
+
+typedef enum {
+    BDRV_API_SYNC, BDRV_API_ASYNC, BDRV_API_CO,
+    BDRV_API_MAX
+} BlockAPIType;
+
 void bdrv_iostatus_enable(BlockDriverState *bs);
 void bdrv_iostatus_reset(BlockDriverState *bs);
 void bdrv_iostatus_disable(BlockDriverState *bs);
@@ -94,6 +105,8 @@ void bdrv_info(Monitor *mon, QObject **ret_data);
 void bdrv_stats_print(Monitor *mon, const QObject *data);
 void bdrv_info_stats(Monitor *mon, QObject **ret_data);
 
+void bdrv_io_limits_issue_request(void *opaque, BlockAPIType co_type);
+
 void bdrv_init(void);
 void bdrv_init_with_whitelist(void);
 BlockDriver *bdrv_find_protocol(const char *filename);
@@ -183,6 +196,13 @@ typedef struct BlockRequest {
     int error;
 } BlockRequest;
 
+typedef struct BlockQueueAIOCB BlockQueueAIOCB;
+
+typedef int coroutine_fn (BlockRequestHandler)(BlockDriverState *bs,
+                          int64_t sector_num, int nb_sectors,
+                          QEMUIOVector *qiov,
+                          void *opaque, BlockAPIType co_type);
+
 int bdrv_aio_multiwrite(BlockDriverState *bs, BlockRequest *reqs,
     int num_reqs);
 
diff --git a/block/blk-queue.c b/block/blk-queue.c
new file mode 100644
index 0000000..04bcdac
--- /dev/null
+++ b/block/blk-queue.c
@@ -0,0 +1,201 @@
+/*
+ * QEMU System Emulator queue definition for block layer
+ *
+ * Copyright (c) IBM, Corp. 2011
+ *
+ * Authors:
+ *  Zhi Yong Wu  <wuzhy@linux.vnet.ibm.com>
+ *  Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+/* The APIs for block request queue on qemu block layer.
+ */
+
+#include "block.h"
+#include "block_int.h"
+#include "blk-queue.h"
+#include "qemu-common.h"
+
+static void qemu_block_queue_dequeue(BlockQueue *queue,
+                                     BlockQueueAIOCB *request)
+{
+    BlockQueueAIOCB *req, *next;
+
+    assert(queue);
+    QTAILQ_FOREACH_SAFE(req, &queue->requests, entry, next) {
+        if (req == request) {
+            QTAILQ_REMOVE(&queue->requests, req, entry);
+            break;
+        }
+    }
+}
+
+void qemu_block_queue_cancel(BlockDriverAIOCB *acb)
+{
+    BlockQueueAIOCB *request = container_of(acb, BlockQueueAIOCB, common);
+
+    assert(request->common.bs->block_queue);
+    qemu_block_queue_dequeue(request->common.bs->block_queue,
+                             request);
+
+    qemu_aio_release(request);
+}
+
+static AIOPool block_queue_pool = {
+    .aiocb_size         = sizeof(struct BlockQueueAIOCB),
+    .cancel             = qemu_block_queue_cancel,
+};
+
+static void qemu_block_queue_bh(void *opaque)
+{
+    BlockQueueAIOCB *acb = opaque;
+
+    acb->co_type = BDRV_API_MAX;
+    acb->co      = NULL;
+    acb->cocb    = NULL;
+    qemu_bh_delete(acb->bh);
+    qemu_aio_release(acb);
+}
+
+void qemu_block_queue_cb(void *opaque, int ret)
+{
+    BlockQueueAIOCB *acb = opaque;
+    acb->bh = qemu_bh_new(qemu_block_queue_bh, acb);
+    qemu_bh_schedule(acb->bh);
+}
+
+BlockQueue *qemu_new_block_queue(void)
+{
+    BlockQueue *queue;
+
+    queue = g_malloc0(sizeof(BlockQueue));
+
+    QTAILQ_INIT(&queue->requests);
+
+    queue->limit_exceeded = false;
+    queue->flushing       = false;
+
+    return queue;
+}
+
+void qemu_del_block_queue(BlockQueue *queue)
+{
+    BlockQueueAIOCB *request, *next;
+
+    QTAILQ_FOREACH_SAFE(request, &queue->requests, entry, next) {
+        QTAILQ_REMOVE(&queue->requests, request, entry);
+        qemu_aio_release(request);
+    }
+
+    g_free(queue);
+}
+
+BlockQueueAIOCB *qemu_block_queue_enqueue(BlockQueue *queue,
+                        BlockDriverState *bs,
+                        BlockRequestHandler *handler,
+                        int64_t sector_num,
+                        QEMUIOVector *qiov,
+                        int nb_sectors,
+                        BlockDriverCompletionFunc *cb,
+                        void *opaque)
+{
+    BlockDriverAIOCB *acb;
+    BlockQueueAIOCB *request;
+
+    if (queue->flushing) {
+        queue->limit_exceeded = true;
+        return NULL;
+    } else {
+        acb = qemu_aio_get(&block_queue_pool, bs,
+                           cb, opaque);
+        request = container_of(acb, BlockQueueAIOCB, common);
+        request->handler    = handler;
+        request->sector_num = sector_num;
+        request->qiov       = qiov;
+        request->nb_sectors = nb_sectors;
+        request->bh         = NULL;
+        request->co_type    = BDRV_API_MAX;
+        request->co         = NULL;
+        request->cocb       = NULL;
+        QTAILQ_INSERT_TAIL(&queue->requests, request, entry);
+    }
+
+    return request;
+}
+
+static int qemu_block_queue_handler(BlockQueueAIOCB *request)
+{
+    int ret;
+
+    BlockDriverState *bs = request->common.bs;
+    assert(bs);
+
+    if (bs->io_limits_enabled) {
+        ret = request->handler(request->common.bs, request->sector_num,
+                               request->nb_sectors, request->qiov,
+                               request, request->co_type);
+    } else {
+        if (request->co_type == BDRV_API_CO) {
+            qemu_coroutine_enter(request->co, request->cocb);
+        } else {
+            printf("%s, req: %p\n", __func__, (void *)request);
+            bdrv_io_limits_issue_request(request, request->co_type);
+        }
+
+        ret = BDRV_BLKQ_DEQ_PASS;
+    }
+
+    return ret;
+}
+
+void qemu_block_queue_submit(BlockQueue *queue, BlockDriverCompletionFunc *cb)
+{
+    BlockQueueAIOCB *request;
+    queue->flushing = true;
+
+    /*QTAILQ_FOREACH_SAFE(request, &queue->requests, entry, next) {*/
+    while (!QTAILQ_EMPTY(&queue->requests)) {
+        int ret = 0;
+
+        request = QTAILQ_FIRST(&queue->requests);
+        QTAILQ_REMOVE(&queue->requests, request, entry);
+        queue->limit_exceeded = false;
+        ret = qemu_block_queue_handler(request);
+        if (ret == -EIO) {
+            cb(request, -EIO);
+            break;
+        } else if (ret == BDRV_BLKQ_ENQ_AGAIN) {
+            QTAILQ_INSERT_HEAD(&queue->requests, request, entry);
+            break;
+        } else if (ret == BDRV_BLKQ_DEQ_PASS) {
+            cb(request, 0);
+        }
+    }
+
+    printf("%s, leave\n", __func__);
+    queue->limit_exceeded = false;
+    queue->flushing       = false;
+}
+
+bool qemu_block_queue_has_pending(BlockQueue *queue)
+{
+    return !QTAILQ_EMPTY(&queue->requests);
+}
diff --git a/block/blk-queue.h b/block/blk-queue.h
new file mode 100644
index 0000000..4dd3dff
--- /dev/null
+++ b/block/blk-queue.h
@@ -0,0 +1,63 @@
+/*
+ * QEMU System Emulator queue declaration for block layer
+ *
+ * Copyright (c) IBM, Corp. 2011
+ *
+ * Authors:
+ *  Zhi Yong Wu  <wuzhy@linux.vnet.ibm.com>
+ *  Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef QEMU_BLOCK_QUEUE_H
+#define QEMU_BLOCK_QUEUE_H
+
+#include "block.h"
+#include "block_int.h"
+#include "qemu-queue.h"
+
+typedef struct BlockQueue {
+    QTAILQ_HEAD(requests, BlockQueueAIOCB) requests;
+    bool limit_exceeded;
+    bool flushing;
+} BlockQueue;
+
+BlockQueue *qemu_new_block_queue(void);
+
+void qemu_block_queue_cancel(BlockDriverAIOCB *acb);
+
+void qemu_del_block_queue(BlockQueue *queue);
+
+BlockQueueAIOCB *qemu_block_queue_enqueue(BlockQueue *queue,
+                        BlockDriverState *bs,
+                        BlockRequestHandler *handler,
+                        int64_t sector_num,
+                        QEMUIOVector *qiov,
+                        int nb_sectors,
+                        BlockDriverCompletionFunc *cb,
+                        void *opaque);
+
+void qemu_block_queue_submit(BlockQueue *queue, BlockDriverCompletionFunc *cb);
+
+void qemu_block_queue_cb(void *opaque, int ret);
+
+bool qemu_block_queue_has_pending(BlockQueue *queue);
+
+#endif /* QEMU_BLOCK_QUEUE_H */
diff --git a/block_int.h b/block_int.h
index dac00f5..86a355d 100644
--- a/block_int.h
+++ b/block_int.h
@@ -29,6 +29,7 @@
 #include "qemu-queue.h"
 #include "qemu-coroutine.h"
 #include "qemu-timer.h"
+#include "block/blk-queue.h"
 
 #define BLOCK_FLAG_ENCRYPT	1
 #define BLOCK_FLAG_COMPAT6	4
@@ -49,6 +50,11 @@ typedef struct AIOPool {
     BlockDriverAIOCB *free_aiocb;
 } AIOPool;
 
+typedef struct BlockIOLimit {
+    uint64_t bps[3];
+    uint64_t iops[3];
+} BlockIOLimit;
+
 struct BlockDriver {
     const char *format_name;
     int instance_size;
@@ -183,6 +189,9 @@ struct BlockDriverState {
 
     void *sync_aiocb;
 
+    BlockQueue   *block_queue;
+    bool         io_limits_enabled;
+
     /* I/O stats (display with "info blockstats"). */
     uint64_t nr_bytes[BDRV_MAX_IOTYPE];
     uint64_t nr_ops[BDRV_MAX_IOTYPE];
@@ -219,6 +228,20 @@ struct BlockDriverAIOCB {
     BlockDriverAIOCB *next;
 };
 
+/* block I/O throttling */
+struct BlockQueueAIOCB {
+    BlockDriverAIOCB common;
+    QTAILQ_ENTRY(BlockQueueAIOCB) entry;
+    BlockRequestHandler *handler;
+    int64_t sector_num;
+    QEMUIOVector *qiov;
+    int nb_sectors;
+    QEMUBH *bh;
+    BlockAPIType co_type;
+    void *co;
+    void *cocb;
+};
+
 void get_tmp_filename(char *filename, int size);
 
 void *qemu_aio_get(AIOPool *pool, BlockDriverState *bs,
-- 
1.7.6


* [Qemu-devel] [PATCH v9 2/4] block: add the command line support
  2011-10-28 10:02 [Qemu-devel] [PATCH v9 0/4] The intro of QEMU block I/O throttling Zhi Yong Wu
  2011-10-28 10:02 ` [Qemu-devel] [PATCH v9 1/4] block: add the block queue support Zhi Yong Wu
@ 2011-10-28 10:02 ` Zhi Yong Wu
  2011-10-28 10:02 ` [Qemu-devel] [PATCH v9 3/4] block: add the block throttling algorithm Zhi Yong Wu
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Zhi Yong Wu @ 2011-10-28 10:02 UTC (permalink / raw)
  To: kwolf; +Cc: wuzhy, qemu-devel, stefanha

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
---
 block.c         |   85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 block.h         |    4 ++
 block_int.h     |   22 ++++++++++++++
 blockdev.c      |   32 ++++++++++++++++++++
 qemu-config.c   |   24 +++++++++++++++
 qemu-options.hx |    1 +
 6 files changed, 168 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 40ab277..8f92950 100644
--- a/block.c
+++ b/block.c
@@ -30,6 +30,9 @@
 #include "qemu-objects.h"
 #include "qemu-coroutine.h"
 
+#include "qemu-timer.h"
+#include "block/blk-queue.h"
+
 #ifdef CONFIG_BSD
 #include <sys/types.h>
 #include <sys/stat.h>
@@ -104,6 +107,60 @@ int is_windows_drive(const char *filename)
 }
 #endif
 
+/* throttling disk I/O limits */
+void bdrv_io_limits_disable(BlockDriverState *bs)
+{
+    bs->io_limits_enabled = false;
+
+    if (bs->block_queue) {
+        qemu_block_queue_submit(bs->block_queue, qemu_block_queue_cb);
+        qemu_del_block_queue(bs->block_queue);
+
+        bs->block_queue = NULL;
+    }
+
+    if (bs->block_timer) {
+        qemu_del_timer(bs->block_timer);
+        qemu_free_timer(bs->block_timer);
+        bs->block_timer = NULL;
+    }
+
+    bs->slice_start = 0;
+    bs->slice_end   = 0;
+    bs->slice_time  = 0;
+    memset(&bs->io_disps, 0, sizeof(bs->io_disps));
+}
+
+static void bdrv_block_timer(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+    BlockQueue *queue    = bs->block_queue;
+
+    qemu_block_queue_submit(queue, qemu_block_queue_cb);
+}
+
+void bdrv_io_limits_enable(BlockDriverState *bs)
+{
+    bs->io_limits_enabled = true;
+    bs->block_queue   = qemu_new_block_queue();
+    bs->block_timer   = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
+    bs->slice_time    = 5 * BLOCK_IO_SLICE_TIME;
+    bs->slice_start   = qemu_get_clock_ns(vm_clock);
+    bs->slice_end     = bs->slice_start + bs->slice_time;
+    memset(&bs->io_disps, 0, sizeof(bs->io_disps));
+}
+
+bool bdrv_io_limits_enabled(BlockDriverState *bs)
+{
+    BlockIOLimit *io_limits = &bs->io_limits;
+    return io_limits->bps[BLOCK_IO_LIMIT_READ]
+         || io_limits->bps[BLOCK_IO_LIMIT_WRITE]
+         || io_limits->bps[BLOCK_IO_LIMIT_TOTAL]
+         || io_limits->iops[BLOCK_IO_LIMIT_READ]
+         || io_limits->iops[BLOCK_IO_LIMIT_WRITE]
+         || io_limits->iops[BLOCK_IO_LIMIT_TOTAL];
+}
+
 /* check if the path starts with "<protocol>:" */
 static int path_has_protocol(const char *path)
 {
@@ -684,6 +741,11 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
         bdrv_dev_change_media_cb(bs, true);
     }
 
+    /* throttling disk I/O limits */
+    if (bs->io_limits_enabled) {
+        bdrv_io_limits_enable(bs);
+    }
+
     return 0;
 
 unlink_and_fail:
@@ -719,6 +781,21 @@ void bdrv_close(BlockDriverState *bs)
 
         bdrv_dev_change_media_cb(bs, false);
     }
+
+    /*throttling disk I/O limits*/
+    bdrv_io_limits_disable(bs);
+
+    /* throttling disk I/O limits */
+    if (bs->block_queue) {
+        qemu_del_block_queue(bs->block_queue);
+        bs->block_queue = NULL;
+    }
+
+    if (bs->block_timer) {
+        qemu_del_timer(bs->block_timer);
+        qemu_free_timer(bs->block_timer);
+        bs->block_timer = NULL;
+    }
 }
 
 void bdrv_close_all(void)
@@ -1576,6 +1653,14 @@ void bdrv_get_geometry_hint(BlockDriverState *bs,
     *psecs = bs->secs;
 }
 
+/* throttling disk io limits */
+void bdrv_set_io_limits(BlockDriverState *bs,
+                            BlockIOLimit *io_limits)
+{
+    bs->io_limits = *io_limits;
+    bs->io_limits_enabled = bdrv_io_limits_enabled(bs);
+}
+
 /* Recognize floppy formats */
 typedef struct FDFormat {
     FDriveType drive;
diff --git a/block.h b/block.h
index b7fbc40..1cd6b8b 100644
--- a/block.h
+++ b/block.h
@@ -105,6 +105,10 @@ void bdrv_info(Monitor *mon, QObject **ret_data);
 void bdrv_stats_print(Monitor *mon, const QObject *data);
 void bdrv_info_stats(Monitor *mon, QObject **ret_data);
 
+/* disk I/O throttling */
+void bdrv_io_limits_enable(BlockDriverState *bs);
+void bdrv_io_limits_disable(BlockDriverState *bs);
+bool bdrv_io_limits_enabled(BlockDriverState *bs);
 void bdrv_io_limits_issue_request(void *opaque, BlockAPIType co_type);
 
 void bdrv_init(void);
diff --git a/block_int.h b/block_int.h
index 86a355d..3a4379c 100644
--- a/block_int.h
+++ b/block_int.h
@@ -34,6 +34,13 @@
 #define BLOCK_FLAG_ENCRYPT	1
 #define BLOCK_FLAG_COMPAT6	4
 
+#define BLOCK_IO_LIMIT_READ     0
+#define BLOCK_IO_LIMIT_WRITE    1
+#define BLOCK_IO_LIMIT_TOTAL    2
+
+#define BLOCK_IO_SLICE_TIME     100000000
+#define NANOSECONDS_PER_SECOND  1000000000.0
+
 #define BLOCK_OPT_SIZE          "size"
 #define BLOCK_OPT_ENCRYPT       "encryption"
 #define BLOCK_OPT_COMPAT6       "compat6"
@@ -55,6 +62,11 @@ typedef struct BlockIOLimit {
     uint64_t iops[3];
 } BlockIOLimit;
 
+typedef struct BlockIODisp {
+    uint64_t bytes[2];
+    uint64_t ios[2];
+} BlockIODisp;
+
 struct BlockDriver {
     const char *format_name;
     int instance_size;
@@ -189,7 +201,14 @@ struct BlockDriverState {
 
     void *sync_aiocb;
 
+    /* the time for latest disk I/O */
+    int64_t slice_time;
+    int64_t slice_start;
+    int64_t slice_end;
+    BlockIOLimit io_limits;
+    BlockIODisp  io_disps;
     BlockQueue   *block_queue;
+    QEMUTimer    *block_timer;
     bool         io_limits_enabled;
 
     /* I/O stats (display with "info blockstats"). */
@@ -248,6 +267,9 @@ void *qemu_aio_get(AIOPool *pool, BlockDriverState *bs,
                    BlockDriverCompletionFunc *cb, void *opaque);
 void qemu_aio_release(void *p);
 
+void bdrv_set_io_limits(BlockDriverState *bs,
+                        BlockIOLimit *io_limits);
+
 #ifdef _WIN32
 int is_windows_drive(const char *filename);
 #endif
diff --git a/blockdev.c b/blockdev.c
index 0827bf7..faf8c56 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -235,6 +235,9 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
     int on_read_error, on_write_error;
     const char *devaddr;
     DriveInfo *dinfo;
+    BlockIOLimit io_limits;
+    bool bps_iol;
+    bool iops_iol;
     int snapshot = 0;
     int ret;
 
@@ -353,6 +356,32 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
         }
     }
 
+    /* disk I/O throttling */
+    io_limits.bps[BLOCK_IO_LIMIT_TOTAL]  =
+                           qemu_opt_get_number(opts, "bps", 0);
+    io_limits.bps[BLOCK_IO_LIMIT_READ]   =
+                           qemu_opt_get_number(opts, "bps_rd", 0);
+    io_limits.bps[BLOCK_IO_LIMIT_WRITE]  =
+                           qemu_opt_get_number(opts, "bps_wr", 0);
+    io_limits.iops[BLOCK_IO_LIMIT_TOTAL] =
+                           qemu_opt_get_number(opts, "iops", 0);
+    io_limits.iops[BLOCK_IO_LIMIT_READ]  =
+                           qemu_opt_get_number(opts, "iops_rd", 0);
+    io_limits.iops[BLOCK_IO_LIMIT_WRITE] =
+                           qemu_opt_get_number(opts, "iops_wr", 0);
+
+    bps_iol  = (io_limits.bps[BLOCK_IO_LIMIT_TOTAL] != 0)
+                 && ((io_limits.bps[BLOCK_IO_LIMIT_READ] != 0)
+                 || (io_limits.bps[BLOCK_IO_LIMIT_WRITE] != 0));
+    iops_iol = (io_limits.iops[BLOCK_IO_LIMIT_TOTAL] != 0)
+                 && ((io_limits.iops[BLOCK_IO_LIMIT_READ] != 0)
+                 || (io_limits.iops[BLOCK_IO_LIMIT_WRITE] != 0));
+    if (bps_iol || iops_iol) {
+        error_report("bps(iops) and bps_rd/bps_wr(iops_rd/iops_wr)"
+                     "cannot be used at the same time");
+        return NULL;
+    }
+
     on_write_error = BLOCK_ERR_STOP_ENOSPC;
     if ((buf = qemu_opt_get(opts, "werror")) != NULL) {
         if (type != IF_IDE && type != IF_SCSI && type != IF_VIRTIO && type != IF_NONE) {
@@ -460,6 +489,9 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
 
     bdrv_set_on_error(dinfo->bdrv, on_read_error, on_write_error);
 
+    /* disk I/O throttling */
+    bdrv_set_io_limits(dinfo->bdrv, &io_limits);
+
     switch(type) {
     case IF_IDE:
     case IF_SCSI:
diff --git a/qemu-config.c b/qemu-config.c
index 90b6b3e..5a201a4 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -85,6 +85,30 @@ static QemuOptsList qemu_drive_opts = {
             .name = "readonly",
             .type = QEMU_OPT_BOOL,
             .help = "open drive file as read-only",
+        },{
+            .name = "iops",
+            .type = QEMU_OPT_NUMBER,
+            .help = "limit total I/O operations per second",
+        },{
+            .name = "iops_rd",
+            .type = QEMU_OPT_NUMBER,
+            .help = "limit read operations per second",
+        },{
+            .name = "iops_wr",
+            .type = QEMU_OPT_NUMBER,
+            .help = "limit write operations per second",
+        },{
+            .name = "bps",
+            .type = QEMU_OPT_NUMBER,
+            .help = "limit total bytes per second",
+        },{
+            .name = "bps_rd",
+            .type = QEMU_OPT_NUMBER,
+            .help = "limit read bytes per second",
+        },{
+            .name = "bps_wr",
+            .type = QEMU_OPT_NUMBER,
+            .help = "limit write bytes per second",
         },
         { /* end of list */ }
     },
diff --git a/qemu-options.hx b/qemu-options.hx
index 5d2a776..9344316 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -136,6 +136,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
     "       [,cache=writethrough|writeback|none|directsync|unsafe][,format=f]\n"
     "       [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
     "       [,readonly=on|off]\n"
+    "       [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r][,iops_wr=w]]\n"
     "                use 'file' as a drive image\n", QEMU_ARCH_ALL)
 STEXI
 @item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
-- 
1.7.6


* [Qemu-devel] [PATCH v9 3/4] block: add the block throttling algorithm
  2011-10-28 10:02 [Qemu-devel] [PATCH v9 0/4] The intro of QEMU block I/O throttling Zhi Yong Wu
  2011-10-28 10:02 ` [Qemu-devel] [PATCH v9 1/4] block: add the block queue support Zhi Yong Wu
  2011-10-28 10:02 ` [Qemu-devel] [PATCH v9 2/4] block: add the command line support Zhi Yong Wu
@ 2011-10-28 10:02 ` Zhi Yong Wu
  2011-10-28 10:02 ` [Qemu-devel] [PATCH v9 4/4] qmp/hmp: add block_set_io_throttle Zhi Yong Wu
  2011-10-30 10:44 ` [Qemu-devel] [PATCH v9 0/4] The intro of QEMU block I/O throttling Richard Davies
  4 siblings, 0 replies; 9+ messages in thread
From: Zhi Yong Wu @ 2011-10-28 10:02 UTC (permalink / raw)
  To: kwolf; +Cc: wuzhy, qemu-devel, stefanha

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
---
 block.c |  360 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 348 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index 8f92950..2d85b64 100644
--- a/block.c
+++ b/block.c
@@ -63,9 +63,11 @@ static int coroutine_fn bdrv_co_writev_em(BlockDriverState *bs,
                                          int64_t sector_num, int nb_sectors,
                                          QEMUIOVector *iov);
 static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
-    int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
+    int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
+    void *opaque, BlockAPIType co_type);
 static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
-    int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
+    int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
+    void *opaque, BlockAPIType co_type);
 static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
                                                int64_t sector_num,
                                                QEMUIOVector *qiov,
@@ -75,6 +77,13 @@ static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
                                                bool is_write);
 static void coroutine_fn bdrv_co_do_rw(void *opaque);
 
+static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
+        bool is_write, double elapsed_time, uint64_t *wait);
+static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
+        double elapsed_time, uint64_t *wait);
+static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
+        bool is_write, int64_t *wait);
+
 static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
     QTAILQ_HEAD_INITIALIZER(bdrv_states);
 
@@ -161,6 +170,26 @@ bool bdrv_io_limits_enabled(BlockDriverState *bs)
          || io_limits->iops[BLOCK_IO_LIMIT_TOTAL];
 }
 
+static BlockQueueAIOCB *bdrv_io_limits_perform(BlockDriverState *bs,
+                       BlockRequestHandler *handler, int64_t sector_num,
+                       QEMUIOVector *qiov, int nb_sectors)
+{
+    BlockQueueAIOCB *ret = NULL;
+    int64_t wait_time = -1;
+
+    if (bdrv_exceed_io_limits(bs, nb_sectors, false, &wait_time)) {
+        ret = qemu_block_queue_enqueue(bs->block_queue, bs, handler,
+                                       sector_num, qiov,
+                                       nb_sectors, NULL, NULL);
+        if (wait_time != -1) {
+            qemu_mod_timer(bs->block_timer,
+                           wait_time + qemu_get_clock_ns(vm_clock));
+        }
+    }
+
+    return ret;
+}
+
 /* check if the path starts with "<protocol>:" */
 static int path_has_protocol(const char *path)
 {
@@ -1112,10 +1141,12 @@ static void coroutine_fn bdrv_rw_co_entry(void *opaque)
 
     if (!rwco->is_write) {
         rwco->ret = bdrv_co_do_readv(rwco->bs, rwco->sector_num,
-                                     rwco->nb_sectors, rwco->qiov);
+                                     rwco->nb_sectors, rwco->qiov,
+                                     rwco, BDRV_API_SYNC);
     } else {
         rwco->ret = bdrv_co_do_writev(rwco->bs, rwco->sector_num,
-                                      rwco->nb_sectors, rwco->qiov);
+                                      rwco->nb_sectors, rwco->qiov,
+                                      rwco, BDRV_API_SYNC);
     }
 }
 
@@ -1331,6 +1362,17 @@ typedef struct BlockDriverAIOCBCoroutine {
     void *blkq_acb;
 } BlockDriverAIOCBCoroutine;
 
+static void bdrv_co_rw_bh(void *opaque)
+{
+    BlockDriverAIOCBCoroutine *acb = opaque;
+
+    acb->common.cb(acb->common.opaque, acb->req.error);
+
+    acb->blkq_acb = NULL;
+    qemu_bh_delete(acb->bh);
+    qemu_aio_release(acb);
+}
+
 /* block I/O throttling */
 typedef struct CoroutineCB {
     BlockDriverState *bs;
@@ -1339,6 +1381,25 @@ typedef struct CoroutineCB {
     QEMUIOVector *qiov;
 } CoroutineCB;
 
+static bool bdrv_io_limits_skip_query(void *opaque,
+                                      BlockAPIType co_type) {
+    bool limit_skip;
+    RwCo *rwco;
+    BlockDriverAIOCBCoroutine *aioco;
+
+    if (co_type == BDRV_API_SYNC) {
+        rwco = opaque;
+        limit_skip = rwco->limit_skip;
+    } else if (co_type == BDRV_API_ASYNC) {
+        aioco = opaque;
+        limit_skip = aioco->limit_skip;
+    } else {
+        abort();
+    }
+
+    return limit_skip;
+}
+
 static void bdrv_io_limits_skip_set(void *opaque,
                                     BlockAPIType co_type,
                                     bool cb_skip,
@@ -1379,13 +1440,52 @@ void bdrv_io_limits_issue_request(void *opaque,
     }
 }
 
+static int bdrv_io_limits_intercept(BlockDriverState *bs,
+    BlockRequestHandler *handler, int64_t sector_num,
+    QEMUIOVector *qiov, int nb_sectors, void *opaque, BlockAPIType co_type)
+{
+    BlockQueueAIOCB *blkq_acb = NULL;
+    BlockQueueRetType ret = BDRV_BLKQ_PASS;
+
+    blkq_acb = bdrv_io_limits_perform(bs, handler,
+                                      sector_num, qiov,
+                                      nb_sectors);
+    if (blkq_acb) {
+        blkq_acb->co_type = co_type;
+        blkq_acb->cocb = opaque;
+        if (co_type == BDRV_API_ASYNC) {
+            BlockDriverAIOCBCoroutine *aioco = opaque;
+            aioco->blkq_acb = blkq_acb;
+            aioco->cb_skip  = true;
+        } else if (co_type == BDRV_API_CO) {
+            blkq_acb->co = qemu_coroutine_self();
+        }
+
+        ret = BDRV_BLKQ_ENQ_FIRST;
+    } else {
+        if (bs->block_queue->flushing) {
+            if (bs->block_queue->limit_exceeded) {
+                ret = BDRV_BLKQ_ENQ_AGAIN;
+            } else {
+                blkq_acb = opaque;
+                bdrv_io_limits_issue_request(blkq_acb, co_type);
+                ret = BDRV_BLKQ_DEQ_PASS;
+            }
+        }
+    }
+
+    return ret;
+}
+
 /*
  * Handle a read request in coroutine context
  */
 static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
-    int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
+    int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
+    void *opaque, BlockAPIType co_type)
 {
     BlockDriver *drv = bs->drv;
+    int ret;
 
     if (!drv) {
         return -ENOMEDIUM;
@@ -1394,6 +1494,28 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
         return -EIO;
     }
 
+    /* throttling disk read I/O */
+    if (bs->io_limits_enabled
+        && !bdrv_io_limits_skip_query(opaque, co_type)) {
+        ret = bdrv_io_limits_intercept(bs, bdrv_co_do_readv,
+                                       sector_num, qiov,
+                                       nb_sectors, opaque, co_type);
+        switch (ret) {
+        case BDRV_BLKQ_ENQ_FIRST:
+            if (co_type == BDRV_API_SYNC) {
+                ret = NOT_DONE;
+            } else if (co_type == BDRV_API_ASYNC) {
+                ret = 0;
+            } else if (co_type == BDRV_API_CO) {
+                qemu_coroutine_yield();
+                break;
+            }
+        case BDRV_BLKQ_ENQ_AGAIN:
+        case BDRV_BLKQ_DEQ_PASS:
+            return ret;
+        }
+    }
+
     return drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
 }
 
@@ -1402,14 +1524,23 @@ int coroutine_fn bdrv_co_readv(BlockDriverState *bs, int64_t sector_num,
 {
     trace_bdrv_co_readv(bs, sector_num, nb_sectors);
 
-    return bdrv_co_do_readv(bs, sector_num, nb_sectors, qiov);
+    CoroutineCB cocb = {
+        .bs = bs,
+        .sector_num = sector_num,
+        .nb_sectors = nb_sectors,
+        .qiov = qiov,
+    };
+
+    return bdrv_co_do_readv(bs, sector_num, nb_sectors,
+                            qiov, &cocb, BDRV_API_CO);
 }
 
 /*
  * Handle a write request in coroutine context
  */
 static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
-    int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
+    int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
+    void *opaque, BlockAPIType co_type)
 {
     BlockDriver *drv = bs->drv;
     int ret;
@@ -1424,6 +1555,28 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
         return -EIO;
     }
 
+    /* throttling disk write I/O */
+    if (bs->io_limits_enabled
+        && !bdrv_io_limits_skip_query(opaque, co_type)) {
+        ret = bdrv_io_limits_intercept(bs, bdrv_co_do_writev,
+                                       sector_num, qiov,
+                                       nb_sectors, opaque, co_type);
+        switch (ret) {
+        case BDRV_BLKQ_ENQ_FIRST:
+            if (co_type == BDRV_API_SYNC) {
+                ret = NOT_DONE;
+            } else if (co_type == BDRV_API_ASYNC) {
+                ret = 0;
+            } else if (co_type == BDRV_API_CO) {
+                qemu_coroutine_yield();
+                break;
+            }
+        case BDRV_BLKQ_ENQ_AGAIN:
+        case BDRV_BLKQ_DEQ_PASS:
+            return ret;
+        }
+    }
+
     ret = drv->bdrv_co_writev(bs, sector_num, nb_sectors, qiov);
 
     if (bs->dirty_bitmap) {
@@ -1442,7 +1595,15 @@ int coroutine_fn bdrv_co_writev(BlockDriverState *bs, int64_t sector_num,
 {
     trace_bdrv_co_writev(bs, sector_num, nb_sectors);
 
-    return bdrv_co_do_writev(bs, sector_num, nb_sectors, qiov);
+    CoroutineCB cocb = {
+        .bs = bs,
+        .sector_num = sector_num,
+        .nb_sectors = nb_sectors,
+        .qiov = qiov,
+    };
+
+    return bdrv_co_do_writev(bs, sector_num, nb_sectors,
+                             qiov, &cocb, BDRV_API_CO);
 }
 
 /**
@@ -2709,6 +2870,170 @@ void bdrv_aio_cancel(BlockDriverAIOCB *acb)
     acb->pool->cancel(acb);
 }
 
+/* block I/O throttling */
+static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
+                 bool is_write, double elapsed_time, uint64_t *wait) {
+    uint64_t bps_limit = 0;
+    double   bytes_limit, bytes_disp, bytes_res;
+    double   slice_time, wait_time;
+
+    if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
+        bps_limit = bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL];
+    } else if (bs->io_limits.bps[is_write]) {
+        bps_limit = bs->io_limits.bps[is_write];
+    } else {
+        if (wait) {
+            *wait = 0;
+        }
+
+        return false;
+    }
+
+    slice_time = bs->slice_end - bs->slice_start;
+    slice_time /= (NANOSECONDS_PER_SECOND);
+    bytes_limit = bps_limit * slice_time;
+    bytes_disp  = bs->nr_bytes[is_write] - bs->io_disps.bytes[is_write];
+    if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
+        bytes_disp += bs->nr_bytes[!is_write] - bs->io_disps.bytes[!is_write];
+    }
+
+    bytes_res   = (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
+
+    if (bytes_disp + bytes_res <= bytes_limit) {
+        if (wait) {
+            *wait = 0;
+        }
+
+        return false;
+    }
+
+    /* Calc approx time to dispatch */
+    wait_time = (bytes_disp + bytes_res) / bps_limit - elapsed_time;
+
+    bs->slice_time = wait_time * BLOCK_IO_SLICE_TIME * 10;
+    bs->slice_end += bs->slice_time - 3 * BLOCK_IO_SLICE_TIME;
+    if (wait) {
+        *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
+    }
+
+    return true;
+}
+
+static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
+                             double elapsed_time, uint64_t *wait) {
+    uint64_t iops_limit = 0;
+    double   ios_limit, ios_disp;
+    double   slice_time, wait_time;
+
+    if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
+        iops_limit = bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL];
+    } else if (bs->io_limits.iops[is_write]) {
+        iops_limit = bs->io_limits.iops[is_write];
+    } else {
+        if (wait) {
+            *wait = 0;
+        }
+
+        return false;
+    }
+
+    slice_time = bs->slice_end - bs->slice_start;
+    slice_time /= (NANOSECONDS_PER_SECOND);
+    ios_limit  = iops_limit * slice_time;
+    ios_disp   = bs->nr_ops[is_write] - bs->io_disps.ios[is_write];
+    if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
+        ios_disp += bs->nr_ops[!is_write] - bs->io_disps.ios[!is_write];
+    }
+
+    if (ios_disp + 1 <= ios_limit) {
+        if (wait) {
+            *wait = 0;
+        }
+
+        return false;
+    }
+
+    /* Calc approx time to dispatch */
+    wait_time = (ios_disp + 1) / iops_limit;
+    if (wait_time > elapsed_time) {
+        wait_time = wait_time - elapsed_time;
+    } else {
+        wait_time = 0;
+    }
+
+    bs->slice_time = wait_time * BLOCK_IO_SLICE_TIME * 10;
+    bs->slice_end += bs->slice_time - 3 * BLOCK_IO_SLICE_TIME;
+    if (wait) {
+        *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
+    }
+
+    return true;
+}
+
+static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
+                           bool is_write, int64_t *wait) {
+    int64_t  now, max_wait;
+    uint64_t bps_wait = 0, iops_wait = 0;
+    double   elapsed_time;
+    int      bps_ret, iops_ret;
+
+    now = qemu_get_clock_ns(vm_clock);
+    if ((bs->slice_start < now)
+        && (bs->slice_end > now)) {
+        bs->slice_end = now + bs->slice_time;
+    } else {
+        bs->slice_time  =  5 * BLOCK_IO_SLICE_TIME;
+        bs->slice_start = now;
+        bs->slice_end   = now + bs->slice_time;
+
+        bs->io_disps.bytes[is_write]  = bs->nr_bytes[is_write];
+        bs->io_disps.bytes[!is_write] = bs->nr_bytes[!is_write];
+
+        bs->io_disps.ios[is_write]    = bs->nr_ops[is_write];
+        bs->io_disps.ios[!is_write]   = bs->nr_ops[!is_write];
+    }
+
+    /* If a limit was exceeded, immediately queue this request */
+    if (!bs->block_queue->flushing
+        && qemu_block_queue_has_pending(bs->block_queue)) {
+        if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]
+            || bs->io_limits.bps[is_write] || bs->io_limits.iops[is_write]
+            || bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
+            if (wait) {
+                *wait = -1;
+            }
+
+            return true;
+        }
+    }
+
+    elapsed_time  = now - bs->slice_start;
+    elapsed_time  /= (NANOSECONDS_PER_SECOND);
+
+    bps_ret  = bdrv_exceed_bps_limits(bs, nb_sectors,
+                                      is_write, elapsed_time, &bps_wait);
+    iops_ret = bdrv_exceed_iops_limits(bs, is_write,
+                                      elapsed_time, &iops_wait);
+    if (bps_ret || iops_ret) {
+        max_wait = bps_wait > iops_wait ? bps_wait : iops_wait;
+        if (wait) {
+            *wait = max_wait;
+        }
+
+        now = qemu_get_clock_ns(vm_clock);
+        if (bs->slice_end < now + max_wait) {
+            bs->slice_end = now + max_wait;
+        }
+
+        return true;
+    }
+
+    if (wait) {
+        *wait = 0;
+    }
+
+    return false;
+}
 
 /**************************************************************/
 /* async block device emulation */
@@ -2798,6 +3123,15 @@ static BlockDriverAIOCB *bdrv_aio_writev_em(BlockDriverState *bs,
 static void bdrv_aio_co_cancel_em(BlockDriverAIOCB *blockacb)
 {
     qemu_aio_flush();
+
+    BlockDriverAIOCBCoroutine *co_acb =
+            container_of(blockacb, BlockDriverAIOCBCoroutine, common);
+    if (co_acb->blkq_acb) {
+        BlockDriverAIOCB *acb_com = (BlockDriverAIOCB *)co_acb->blkq_acb;
+        assert(acb_com->bs);
+        acb_com->pool->cancel(acb_com);
+        co_acb->blkq_acb = NULL;
+    }
 }
 
 static AIOPool bdrv_em_co_aio_pool = {
@@ -2822,14 +3156,16 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque)
 
     if (!acb->is_write) {
         acb->req.error = bdrv_co_do_readv(bs, acb->req.sector,
-            acb->req.nb_sectors, acb->req.qiov);
+            acb->req.nb_sectors, acb->req.qiov, acb, BDRV_API_ASYNC);
     } else {
         acb->req.error = bdrv_co_do_writev(bs, acb->req.sector,
-            acb->req.nb_sectors, acb->req.qiov);
+            acb->req.nb_sectors, acb->req.qiov, acb, BDRV_API_ASYNC);
     }
 
-    acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
-    qemu_bh_schedule(acb->bh);
+    if (!acb->cb_skip) {
+        acb->bh = qemu_bh_new(bdrv_co_rw_bh, acb);
+        qemu_bh_schedule(acb->bh);
+    }
 }
 
 static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
-- 
1.7.6

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH v9 4/4] qmp/hmp: add block_set_io_throttle
  2011-10-28 10:02 [Qemu-devel] [PATCH v9 0/4] The intro of QEMU block I/O throttling Zhi Yong Wu
                   ` (2 preceding siblings ...)
  2011-10-28 10:02 ` [Qemu-devel] [PATCH v9 3/4] block: add the block throttling algorithm Zhi Yong Wu
@ 2011-10-28 10:02 ` Zhi Yong Wu
  2011-10-30 10:44 ` [Qemu-devel] [PATCH v9 0/4] The intro of QEMU block I/O throttling Richard Davies
  4 siblings, 0 replies; 9+ messages in thread
From: Zhi Yong Wu @ 2011-10-28 10:02 UTC (permalink / raw)
  To: kwolf; +Cc: wuzhy, qemu-devel, stefanha

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
---
 block.c         |   27 ++++++++++++++++++++++++---
 blockdev.c      |   51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 blockdev.h      |    2 ++
 hmp-commands.hx |   15 +++++++++++++++
 qerror.c        |    4 ++++
 qerror.h        |    3 +++
 qmp-commands.hx |   53 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 7 files changed, 151 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index 2d85b64..395bbd7 100644
--- a/block.c
+++ b/block.c
@@ -2164,6 +2164,16 @@ static void bdrv_print_dict(QObject *obj, void *opaque)
                             qdict_get_bool(qdict, "ro"),
                             qdict_get_str(qdict, "drv"),
                             qdict_get_bool(qdict, "encrypted"));
+
+        monitor_printf(mon, " bps=%" PRId64 " bps_rd=%" PRId64
+                            " bps_wr=%" PRId64 " iops=%" PRId64
+                            " iops_rd=%" PRId64 " iops_wr=%" PRId64,
+                            qdict_get_int(qdict, "bps"),
+                            qdict_get_int(qdict, "bps_rd"),
+                            qdict_get_int(qdict, "bps_wr"),
+                            qdict_get_int(qdict, "iops"),
+                            qdict_get_int(qdict, "iops_rd"),
+                            qdict_get_int(qdict, "iops_wr"));
     } else {
         monitor_printf(mon, " [not inserted]");
     }
@@ -2214,10 +2224,22 @@ void bdrv_info(Monitor *mon, QObject **ret_data)
             QObject *obj;
 
             obj = qobject_from_jsonf("{ 'file': %s, 'ro': %i, 'drv': %s, "
-                                     "'encrypted': %i }",
+                                     "'encrypted': %i, "
+                                     "'bps': %" PRId64 ","
+                                     "'bps_rd': %" PRId64 ","
+                                     "'bps_wr': %" PRId64 ","
+                                     "'iops': %" PRId64 ","
+                                     "'iops_rd': %" PRId64 ","
+                                     "'iops_wr': %" PRId64 "}",
                                      bs->filename, bs->read_only,
                                      bs->drv->format_name,
-                                     bdrv_is_encrypted(bs));
+                                     bdrv_is_encrypted(bs),
+                                     bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL],
+                                     bs->io_limits.bps[BLOCK_IO_LIMIT_READ],
+                                     bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE],
+                                     bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL],
+                                     bs->io_limits.iops[BLOCK_IO_LIMIT_READ],
+                                     bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE]);
             if (bs->backing_file[0] != '\0') {
                 QDict *qdict = qobject_to_qdict(obj);
                 qdict_put(qdict, "backing_file",
@@ -2405,7 +2427,6 @@ void bdrv_debug_event(BlockDriverState *bs, BlkDebugEvent event)
     }
 
     return drv->bdrv_debug_event(bs, event);
-
 }
 
 /**************************************************************/
diff --git a/blockdev.c b/blockdev.c
index faf8c56..9eed973 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -745,6 +745,57 @@ int do_change_block(Monitor *mon, const char *device,
     return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
 }
 
+/* throttling disk I/O limits */
+int do_block_set_io_throttle(Monitor *mon,
+                       const QDict *qdict, QObject **ret_data)
+{
+    const char *devname = qdict_get_str(qdict, "device");
+    int64_t bps        = qdict_get_try_int(qdict, "bps", -1);
+    int64_t bps_rd     = qdict_get_try_int(qdict, "bps_rd", -1);
+    int64_t bps_wr     = qdict_get_try_int(qdict, "bps_wr", -1);
+    int64_t iops       = qdict_get_try_int(qdict, "iops", -1);
+    int64_t iops_rd    = qdict_get_try_int(qdict, "iops_rd", -1);
+    int64_t iops_wr    = qdict_get_try_int(qdict, "iops_wr", -1);
+    BlockDriverState *bs;
+
+    bs = bdrv_find(devname);
+    if (!bs) {
+        qerror_report(QERR_DEVICE_NOT_FOUND, devname);
+        return -1;
+    }
+
+    if ((bps == -1) || (bps_rd == -1) || (bps_wr == -1)
+        || (iops == -1) || (iops_rd == -1) || (iops_wr == -1)) {
+        qerror_report(QERR_MISSING_PARAMETER,
+                      "bps/bps_rd/bps_wr/iops/iops_rd/iops_wr");
+        return -1;
+    }
+
+    if ((bps != 0 && (bps_rd != 0  || bps_wr != 0))
+        || (iops != 0 && (iops_rd != 0 || iops_wr != 0))) {
+        qerror_report(QERR_INVALID_PARAMETER_COMBINATION);
+        return -1;
+    }
+
+    bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL] = bps;
+    bs->io_limits.bps[BLOCK_IO_LIMIT_READ]  = bps_rd;
+    bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE] = bps_wr;
+    bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL] = iops;
+    bs->io_limits.iops[BLOCK_IO_LIMIT_READ]  = iops_rd;
+    bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE] = iops_wr;
+    bs->slice_time = BLOCK_IO_SLICE_TIME;
+
+    if (!bs->io_limits_enabled && bdrv_io_limits_enabled(bs)) {
+        bdrv_io_limits_enable(bs);
+    } else if (bs->io_limits_enabled && !bdrv_io_limits_enabled(bs)) {
+        bdrv_io_limits_disable(bs);
+    } else {
+        qemu_mod_timer(bs->block_timer, qemu_get_clock_ns(vm_clock));
+    }
+
+    return 0;
+}
+
 int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
 {
     const char *id = qdict_get_str(qdict, "id");
diff --git a/blockdev.h b/blockdev.h
index 3587786..1b48a75 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -63,6 +63,8 @@ int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_change_block(Monitor *mon, const char *device,
                     const char *filename, const char *fmt);
 int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
+int do_block_set_io_throttle(Monitor *mon,
+                             const QDict *qdict, QObject **ret_data);
 int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_block_resize(Monitor *mon, const QDict *qdict, QObject **ret_data);
 
diff --git a/hmp-commands.hx b/hmp-commands.hx
index ab08d58..a2fdf68 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1207,6 +1207,21 @@ ETEXI
     },
 
 STEXI
+@item block_set_io_throttle @var{device} @var{bps} @var{bps_rd} @var{bps_wr} @var{iops} @var{iops_rd} @var{iops_wr}
+@findex block_set_io_throttle
+Change I/O throttle limits for a block drive to @var{bps}, @var{bps_rd}, @var{bps_wr}, @var{iops}, @var{iops_rd} and @var{iops_wr}.
+ETEXI
+
+    {
+        .name       = "block_set_io_throttle",
+        .args_type  = "device:B,bps:i?,bps_rd:i?,bps_wr:i?,iops:i?,iops_rd:i?,iops_wr:i?",
+        .params     = "device [bps] [bps_rd] [bps_wr] [iops] [iops_rd] [iops_wr]",
+        .help       = "change I/O throttle limits for a block drive",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_block_set_io_throttle,
+    },
+
+STEXI
 @item block_passwd @var{device} @var{password}
 @findex block_passwd
 Set the encrypted device @var{device} password to @var{password}
diff --git a/qerror.c b/qerror.c
index 68998d4..6fe53b6 100644
--- a/qerror.c
+++ b/qerror.c
@@ -234,6 +234,10 @@ static const QErrorStringTable qerror_table[] = {
         .error_fmt = QERR_QGA_COMMAND_FAILED,
         .desc      = "Guest agent command failed, error was '%(message)'",
     },
+    {
+        .error_fmt = QERR_INVALID_PARAMETER_COMBINATION,
+        .desc      = "Invalid parameter combination",
+    },
     {}
 };
 
diff --git a/qerror.h b/qerror.h
index d4bfcfd..777a36a 100644
--- a/qerror.h
+++ b/qerror.h
@@ -198,4 +198,7 @@ QError *qobject_to_qerror(const QObject *obj);
 #define QERR_QGA_COMMAND_FAILED \
     "{ 'class': 'QgaCommandFailed', 'data': { 'message': %s } }"
 
+#define QERR_INVALID_PARAMETER_COMBINATION \
+    "{ 'class': 'InvalidParameterCombination', 'data': {} }"
+
 #endif /* QERROR_H */
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 4328e8b..6aa90b7 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -853,6 +853,44 @@ Example:
 EQMP
 
     {
+        .name       = "block_set_io_throttle",
+        .args_type  = "device:B,bps:i?,bps_rd:i?,bps_wr:i?,iops:i?,iops_rd:i?,iops_wr:i?",
+        .params     = "device [bps] [bps_rd] [bps_wr] [iops] [iops_rd] [iops_wr]",
+        .help       = "change I/O throttle limits for a block drive",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_block_set_io_throttle,
+    },
+
+SQMP
+block_set_io_throttle
+---------------------
+
+Change I/O throttle limits for a block drive.
+
+Arguments:
+
+- "device": device name (json-string)
+- "bps":  total throughput limit in bytes per second (json-int, optional)
+- "bps_rd":  read throughput limit in bytes per second (json-int, optional)
+- "bps_wr":  write throughput limit in bytes per second (json-int, optional)
+- "iops":  total I/O operations per second (json-int, optional)
+- "iops_rd":  read I/O operations per second (json-int, optional)
+- "iops_wr":  write I/O operations per second (json-int, optional)
+
+Example:
+
+-> { "execute": "block_set_io_throttle", "arguments": { "device": "virtio0",
+                                               "bps": 1000000,
+                                               "bps_rd": 0,
+                                               "bps_wr": 0,
+                                               "iops": 0,
+                                               "iops_rd": 0,
+                                               "iops_wr": 0 } }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "set_password",
         .args_type  = "protocol:s,password:s,connected:s?",
         .params     = "protocol password action-if-connected",
@@ -1154,6 +1192,13 @@ Each json-object contain the following:
                                 "tftp", "vdi", "vmdk", "vpc", "vvfat"
          - "backing_file": backing file name (json-string, optional)
          - "encrypted": true if encrypted, false otherwise (json-bool)
+         - "bps": limit total bytes per second (json-int)
+         - "bps_rd": limit read bytes per second (json-int)
+         - "bps_wr": limit write bytes per second (json-int)
+         - "iops": limit total I/O operations per second (json-int)
+         - "iops_rd": limit read operations per second (json-int)
+         - "iops_wr": limit write operations per second (json-int)
+
 - "io-status": I/O operation status, only present if the device supports it
                and the VM is configured to stop on errors. It's always reset
                to "ok" when the "cont" command is issued (json_string, optional)
@@ -1173,7 +1218,13 @@ Example:
                "ro":false,
                "drv":"qcow2",
                "encrypted":false,
-               "file":"disks/test.img"
+               "file":"disks/test.img",
+               "bps":1000000,
+               "bps_rd":0,
+               "bps_wr":0,
+               "iops":1000000,
+               "iops_rd":0,
+               "iops_wr":0
             },
             "type":"unknown"
          },
-- 
1.7.6

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v9 0/4] The intro of QEMU block I/O throttling
  2011-10-28 10:02 [Qemu-devel] [PATCH v9 0/4] The intro of QEMU block I/O throttling Zhi Yong Wu
                   ` (3 preceding siblings ...)
  2011-10-28 10:02 ` [Qemu-devel] [PATCH v9 4/4] qmp/hmp: add block_set_io_throttle Zhi Yong Wu
@ 2011-10-30 10:44 ` Richard Davies
  2011-10-31  2:20   ` Zhi Yong Wu
  4 siblings, 1 reply; 9+ messages in thread
From: Richard Davies @ 2011-10-30 10:44 UTC (permalink / raw)
  To: Zhi Yong Wu, kwolf; +Cc: qemu-devel, stefanha

Hi,

I've been following the evolution of this patch with great interest for use
in our qemu-kvm based IaaS public cloud.

I am not a qemu developer, but have watched this patch go through many
rounds of review and we are very much hoping that it makes it into QEMU 1.0.

In a multi-customer multi-VM public cloud environment, disk i/o limiting is
absolutely key, and qemu itself is a much better place to do it than with a
mix of cgroups and other technologies for each different storage backend.
I understand that it is also more efficient:

  http://www.linux-kvm.org/wiki/images/7/72/2011-forum-keep-a-limit-on-it-io-throttling-in-qemu.pdf

As I say, I am not a qemu developer, so cannot practically help, but wanted
to post to emphasize the importance of this technology as a consumer of qemu.

Best regards,

Richard.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v9 0/4] The intro of QEMU block I/O throttling
  2011-10-30 10:44 ` [Qemu-devel] [PATCH v9 0/4] The intro of QEMU block I/O throttling Richard Davies
@ 2011-10-31  2:20   ` Zhi Yong Wu
  0 siblings, 0 replies; 9+ messages in thread
From: Zhi Yong Wu @ 2011-10-31  2:20 UTC (permalink / raw)
  To: Richard Davies; +Cc: kwolf, Zhiyong Wu, Marcelo Tosatti, qemu-devel, stefanha

On Sun, Oct 30, 2011 at 6:44 PM, Richard Davies
<richard.davies@elastichosts.com> wrote:
> Hi,
>
> I've been following the evolution of this patch with great interest for use
> in our qemu-kvm based IaaS public cloud.
>
> I am not a qemu developer, but have watched this patch go through many
> rounds of review and we are very much hoping that it makes it into QEMU 1.0.
Yeah, we also hope that it can be merged into QEMU 1.0.
By the way, the issue below has been resolved.
1.) When bps/iops limits were specified with a very small value such as 511
bytes/s, the VM would hang up; we had been considering how to handle this
scenario.

Any comment is appreciated if you have other concerns.

>
> In a multi-customer multi-VM public cloud environment, disk i/o limiting is
> absolutely key, and qemu itself is a much better place to do it than with a
> mix of cgroups and other technologies for each different storage backend.
> I understand that it is also more efficient:
>
>  http://www.linux-kvm.org/wiki/images/7/72/2011-forum-keep-a-limit-on-it-io-throttling-in-qemu.pdf
>
> As I say, I am not a qemu developer, so cannot practically help, but wanted
> to post to emphasize the importance of this technology as a consumer of qemu.
No matter...

>
> Best regards,
>
> Richard.
>
>



-- 
Regards,

Zhi Yong Wu

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v9 1/4] block: add the block queue support
  2011-10-28 10:02 ` [Qemu-devel] [PATCH v9 1/4] block: add the block queue support Zhi Yong Wu
@ 2011-10-31 13:35   ` Stefan Hajnoczi
  2011-11-01  7:50     ` Zhi Yong Wu
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Hajnoczi @ 2011-10-31 13:35 UTC (permalink / raw)
  To: Zhi Yong Wu; +Cc: kwolf, qemu-devel, stefanha

On Fri, Oct 28, 2011 at 11:02 AM, Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> wrote:
> +static void bdrv_io_limits_skip_set(void *opaque,
> +                                    BlockAPIType co_type,
> +                                    bool cb_skip,
> +                                    bool limit_skip) {
> +    RwCo *rwco;
> +    BlockDriverAIOCBCoroutine *aioco;
> +
> +    if (co_type == BDRV_API_SYNC) {
> +        rwco = opaque;
> +        rwco->limit_skip = limit_skip;
> +    } else if (co_type == BDRV_API_ASYNC) {
> +        aioco = opaque;
> +        aioco->cb_skip = cb_skip;
> +        aioco->limit_skip = limit_skip;
> +    } else {
> +        abort();
> +    }
> +}

The main question I have about this series is why have different cases
for sync, aio, and coroutines?  Perhaps I'm missing something but this
should all be much simpler.

All read/write requests are processed in a coroutine
(bdrv_co_do_readv()/bdrv_co_do_writev()).  That is the place to
perform I/O throttling.  Throttling should not be aware of sync, aio,
vs coroutines.

Since all requests have coroutines you could use CoQueue and the
actual queue waiting code in bdrv_co_do_readv()/bdrv_co_do_writev()
becomes:

if (bdrv_exceeds_io_limit(bs, sector_num, qiov, is_write)) {
    qemu_co_queue_wait(&bs->throttled_reqs);

    /* Wait until this request is allowed to start */
    while (bdrv_exceeds_io_limit(bs, sector_num, qiov, is_write)) {
        /* Re-inserting at the head of the CoQueue is equivalent to
         * the queue->flushing/queue->limit_exceeded behavior in your
         * patch.
         */
        qemu_co_queue_wait_insert_head(&bs->throttled_reqs);
    }
}

I think block/blk-queue.h isn't needed if you use the existing CoQueue
structure.

This is the main point I want to raise, just a few minor comments
below which may not be relevant if you can drop BlockQueue.

> +static int qemu_block_queue_handler(BlockQueueAIOCB *request)
> +{
> +    int ret;
> +
> +    BlockDriverState *bs = request->common.bs;
> +    assert(bs);
> +
> +    if (bs->io_limits_enabled) {

I'm not sure why the BlockQueue needs to reach into BlockDriverState.
Now BlockDriverState and BlockQueue know about and depend on each
other.  It's usually nicer to keep the relationship unidirectional, if
possible.

> +        ret = request->handler(request->common.bs, request->sector_num,
> +                               request->nb_sectors, request->qiov,
> +                               request, request->co_type);
> +    } else {
> +        if (request->co_type == BDRV_API_CO) {
> +            qemu_coroutine_enter(request->co, request->cocb);
> +        } else {
> +            printf("%s, req: %p\n", __func__, (void *)request);

Debug output should be removed.

> +            bdrv_io_limits_issue_request(request, request->co_type);
> +        }
> +
> +        ret = BDRV_BLKQ_DEQ_PASS;
> +    }
> +
> +    return ret;
> +}
> +
> +void qemu_block_queue_submit(BlockQueue *queue, BlockDriverCompletionFunc *cb)
> +{
> +    BlockQueueAIOCB *request;
> +    queue->flushing = true;
> +
> +    /*QTAILQ_FOREACH_SAFE(request, &queue->requests, entry, next) {*/

Commented out code should be removed.

> +    while (!QTAILQ_EMPTY(&queue->requests)) {
> +        int ret = 0;
> +
> +        request = QTAILQ_FIRST(&queue->requests);
> +        QTAILQ_REMOVE(&queue->requests, request, entry);
> +        queue->limit_exceeded = false;
> +        ret = qemu_block_queue_handler(request);
> +        if (ret == -EIO) {
> +            cb(request, -EIO);
> +            break;
> +        } else if (ret == BDRV_BLKQ_ENQ_AGAIN) {
> +            QTAILQ_INSERT_HEAD(&queue->requests, request, entry);
> +            break;
> +        } else if (ret == BDRV_BLKQ_DEQ_PASS) {
> +            cb(request, 0);
> +        }

What if ret is not -EIO, BDRV_BLKQ_ENQ_AGAIN, or BDRV_BLKQ_DEQ_PASS?
I think the -EIO case should be the default that calls cb(request,
ret).

> +    }
> +
> +    printf("%s, leave\n", __func__);

Debug code should be removed.

Stefan


* Re: [Qemu-devel] [PATCH v9 1/4] block: add the block queue support
  2011-10-31 13:35   ` Stefan Hajnoczi
@ 2011-11-01  7:50     ` Zhi Yong Wu
  0 siblings, 0 replies; 9+ messages in thread
From: Zhi Yong Wu @ 2011-11-01  7:50 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, Zhi Yong Wu, qemu-devel, stefanha

On Mon, Oct 31, 2011 at 9:35 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Fri, Oct 28, 2011 at 11:02 AM, Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> wrote:
>> +static void bdrv_io_limits_skip_set(void *opaque,
>> +                                    BlockAPIType co_type,
>> +                                    bool cb_skip,
>> +                                    bool limit_skip) {
>> +    RwCo *rwco;
>> +    BlockDriverAIOCBCoroutine *aioco;
>> +
>> +    if (co_type == BDRV_API_SYNC) {
>> +        rwco = opaque;
>> +        rwco->limit_skip = limit_skip;
>> +    } else if (co_type == BDRV_API_ASYNC) {
>> +        aioco = opaque;
>> +        aioco->cb_skip = cb_skip;
>> +        aioco->limit_skip = limit_skip;
>> +    } else {
>> +        abort();
>> +    }
>> +}
>
I have sent out v10. It discards the queue and request structures we had defined, rebases onto the existing CoQueue, and lets a Coroutine represent one I/O request.
The code logic is now much simpler.




-- 
Regards,

Zhi Yong Wu


Thread overview: 9+ messages
2011-10-28 10:02 [Qemu-devel] [PATCH v9 0/4] The intro of QEMU block I/O throttling Zhi Yong Wu
2011-10-28 10:02 ` [Qemu-devel] [PATCH v9 1/4] block: add the block queue support Zhi Yong Wu
2011-10-31 13:35   ` Stefan Hajnoczi
2011-11-01  7:50     ` Zhi Yong Wu
2011-10-28 10:02 ` [Qemu-devel] [PATCH v9 2/4] block: add the command line support Zhi Yong Wu
2011-10-28 10:02 ` [Qemu-devel] [PATCH v9 3/4] block: add the block throttling algorithm Zhi Yong Wu
2011-10-28 10:02 ` [Qemu-devel] [PATCH v9 4/4] qmp/hmp: add block_set_io_throttle Zhi Yong Wu
2011-10-30 10:44 ` [Qemu-devel] [PATCH v9 0/4] The intro of QEMU block I/O throttling Richard Davies
2011-10-31  2:20   ` Zhi Yong Wu
