* [Qemu-devel] [PATCH v10 0/3] The intro of QEMU block I/O throttling
From: Zhi Yong Wu @ 2011-11-01  7:40 UTC (permalink / raw)
  To: kwolf; +Cc: zwu.kernel, wuzhy, qemu-devel, stefanha

The main goal of this patch series is to effectively cap the disk I/O rate (bytes per second) or request count (I/Os per second) of a single VM. It is still a draft, so it unavoidably has some drawbacks; if you catch any, please let me know.

The series mainly introduces a block I/O throttling algorithm, plus one timer and one block queue for each drive with I/O limits enabled.

When a block request comes in, the throttling algorithm checks whether the drive's I/O rate or request count would exceed its limits; if so, the request is enqueued on the block queue, and the timer later dispatches the queued requests.
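
To make the dispatch decision concrete, here is a minimal standalone C sketch of the slice-based bps check. The names mirror the patch, but the simplified ThrottleState, the constants, and the main() driver are illustrative assumptions, not the QEMU code itself:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NANOSECONDS_PER_SECOND 1000000000.0

typedef struct {
    uint64_t bps_limit;    /* allowed bytes per second, 0 = unlimited */
    int64_t  slice_start;  /* ns timestamp when the current slice began */
    int64_t  slice_end;    /* ns timestamp when the current slice ends */
    uint64_t bytes_disp;   /* bytes already dispatched in this slice */
} ThrottleState;

/* Return true if dispatching 'bytes' now would exceed the limit; if so,
 * *wait_ns gets an approximate delay before the request may be retried. */
static bool exceed_bps(ThrottleState *ts, uint64_t bytes,
                       int64_t now, uint64_t *wait_ns)
{
    double slice_time, elapsed, wait;

    if (!ts->bps_limit) {
        *wait_ns = 0;
        return false;
    }

    slice_time = (ts->slice_end - ts->slice_start) / NANOSECONDS_PER_SECOND;
    if (ts->bytes_disp + bytes <= ts->bps_limit * slice_time) {
        ts->bytes_disp += bytes;
        *wait_ns = 0;
        return false;          /* within budget: dispatch immediately */
    }

    /* Approximate time until enough budget has accumulated. */
    elapsed = (now - ts->slice_start) / NANOSECONDS_PER_SECOND;
    wait = (ts->bytes_disp + bytes) / (double)ts->bps_limit - elapsed;
    *wait_ns = (uint64_t)(wait * NANOSECONDS_PER_SECOND);
    return true;               /* caller queues the request and arms a timer */
}

int main(void)
{
    /* 1 MB/s limit, 100 ms slice; a 128 KiB request arrives 50 ms in. */
    ThrottleState ts = { 1000000, 0, 100000000, 0 };
    uint64_t wait_ns;
    bool queued = exceed_bps(&ts, 128 * 1024, 50000000, &wait_ns);

    printf("queued=%d wait=%llu ns\n", queued,
           (unsigned long long)wait_ns);
    return 0;
}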

The available features are as follows:
(1) global bps limit.
   -drive bps=xxx            in bytes/s
(2) only read bps limit
   -drive bps_rd=xxx         in bytes/s
(3) only write bps limit
   -drive bps_wr=xxx         in bytes/s
(4) global iops limit
   -drive iops=xxx           in ios/s
(5) only read iops limit
   -drive iops_rd=xxx        in ios/s
(6) only write iops limit
   -drive iops_wr=xxx        in ios/s
(7) combinations of the above limits
   -drive bps=xxx,iops=xxx
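
For example, a complete invocation combining a byte and a request limit might look like this (the binary name, image path, interface, and values are illustrative):

   qemu-system-x86_64 -drive file=disk.img,if=virtio,bps=1000000,iops=100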

Known Limitations:
(1) #1 cannot coexist with #2 or #3
(2) #4 cannot coexist with #5 or #6
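
For instance, under limitation (1) the first combination below should be accepted while the second would be rejected (values illustrative):

   -drive file=disk.img,bps_rd=500000,bps_wr=500000   (ok: #2 with #3)
   -drive file=disk.img,bps=1000000,bps_rd=500000     (rejected: #1 with #2)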

Changes since V9:
     Greatly simplify the logic and rebase the request queue onto CoQueue, based on Stefan's comments.

 v9: made a lot of changes based on Kevin's comments.
     slice_time is dynamically adjusted based on wait_time.
     rebase onto the latest QEMU upstream.

 v8: fix the per-patch build based on Stefan's comments.

 v7: Mainly simplify the block queue.
     Adjust code based on Stefan's comments.

 v6: Mainly fix the AIO callback issue for the block queue.
     Adjust code based on Ram Pai's comments.

 v5: add QMP/HMP support.
     Adjust the code based on Stefan's comments.
     qmp/hmp: add block_set_io_throttle

 v4: fix memory leaks based on Ryan's feedback.

 v3: Added code for extending the slice time, and modified the method for computing the timer's wait time.

 v2: V2 of the QEMU disk I/O limits code.
     Modified mainly based on Stefan's comments.

 v1: Initial submission of the QEMU disk I/O limits code.
     Only a draft.


Zhi Yong Wu (3):
  block: add the command line support
  block: add I/O throttling algorithm
  hmp/qmp: add block_set_io_throttle

 block.c               |  283 +++++++++++++++++++++++++++++++++++++++++++++++++
 block.h               |    5 +
 block_int.h           |   30 +++++
 blockdev.c            |   83 ++++++++++++++
 blockdev.h            |    2 +
 hmp-commands.hx       |   15 +++
 hmp.c                 |   10 ++
 qapi-schema.json      |   16 +++-
 qemu-config.c         |   24 ++++
 qemu-coroutine-lock.c |    8 ++
 qemu-coroutine.h      |    6 +
 qemu-options.hx       |    1 +
 qerror.c              |    4 +
 qerror.h              |    3 +
 qmp-commands.hx       |   53 +++++++++-
 15 files changed, 541 insertions(+), 2 deletions(-)

-- 
1.7.6

* [Qemu-devel] [PATCH v10 2/3] block: add I/O throttling algorithm
From: Zhi Yong Wu @ 2011-11-01 11:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, zwu.kernel, Zhi Yong Wu, stefanha

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
---
 block.c               |  230 +++++++++++++++++++++++++++++++++++++++++++++++++
 block.h               |    1 +
 block_int.h           |    1 +
 qemu-coroutine-lock.c |    8 ++
 qemu-coroutine.h      |    6 ++
 5 files changed, 246 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 8f08dc5..08b6ec6 100644
--- a/block.c
+++ b/block.c
@@ -74,6 +74,13 @@ static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
                                                bool is_write);
 static void coroutine_fn bdrv_co_do_rw(void *opaque);
 
+static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
+        bool is_write, double elapsed_time, uint64_t *wait);
+static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
+        double elapsed_time, uint64_t *wait);
+static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
+        bool is_write, int64_t *wait);
+
 static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
     QTAILQ_HEAD_INITIALIZER(bdrv_states);
 
@@ -107,6 +114,28 @@ int is_windows_drive(const char *filename)
 #endif
 
 /* throttling disk I/O limits */
+void bdrv_io_limits_disable(BlockDriverState *bs)
+{
+    bs->io_limits_enabled = false;
+
+    if (!qemu_co_queue_empty(&bs->throttled_reqs)) {
+        while (qemu_co_queue_next(&bs->throttled_reqs));
+    }
+
+    qemu_co_queue_init(&bs->throttled_reqs);
+
+    if (bs->block_timer) {
+        qemu_del_timer(bs->block_timer);
+        qemu_free_timer(bs->block_timer);
+        bs->block_timer = NULL;
+    }
+
+    bs->slice_start = 0;
+    bs->slice_end   = 0;
+    bs->slice_time  = 0;
+    memset(&bs->io_disps, 0, sizeof(bs->io_disps));
+}
+
 static void bdrv_block_timer(void *opaque)
 {
     BlockDriverState *bs = opaque;
@@ -137,6 +166,35 @@ bool bdrv_io_limits_enabled(BlockDriverState *bs)
          || io_limits->iops[BLOCK_IO_LIMIT_TOTAL];
 }
 
+static void bdrv_io_limits_intercept(BlockDriverState *bs,
+                                     bool is_write, int nb_sectors)
+{
+    int64_t wait_time = -1;
+
+    if (!qemu_co_queue_empty(&bs->throttled_reqs)) {
+        qemu_co_queue_wait(&bs->throttled_reqs);
+        goto resume;
+    } else if (bdrv_exceed_io_limits(bs, nb_sectors, is_write, &wait_time)) {
+        if (wait_time != -1) {
+            qemu_mod_timer(bs->block_timer,
+                           wait_time + qemu_get_clock_ns(vm_clock));
+        }
+
+        qemu_co_queue_wait(&bs->throttled_reqs);
+
+resume:
+        while (bdrv_exceed_io_limits(bs, nb_sectors, is_write, &wait_time)) {
+            if (wait_time != -1) {
+                qemu_mod_timer(bs->block_timer,
+                               wait_time + qemu_get_clock_ns(vm_clock));
+            }
+            qemu_co_queue_wait_insert_head(&bs->throttled_reqs);
+        }
+
+        qemu_co_queue_next(&bs->throttled_reqs);
+    }
+}
+
 /* check if the path starts with "<protocol>:" */
 static int path_has_protocol(const char *path)
 {
@@ -719,6 +777,11 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
         bdrv_dev_change_media_cb(bs, true);
     }
 
+    /* throttling disk I/O limits */
+    if (bs->io_limits_enabled) {
+        bdrv_io_limits_enable(bs);
+    }
+
     return 0;
 
 unlink_and_fail:
@@ -754,6 +817,9 @@ void bdrv_close(BlockDriverState *bs)
 
         bdrv_dev_change_media_cb(bs, false);
     }
+
+    /* throttling disk I/O limits */
+    bdrv_io_limits_disable(bs);
 }
 
 void bdrv_close_all(void)
@@ -1292,6 +1358,11 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
         return -EIO;
     }
 
+    /* throttling disk read I/O */
+    if (bs->io_limits_enabled) {
+        bdrv_io_limits_intercept(bs, false, nb_sectors);
+    }
+
     return drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
 }
 
@@ -1322,6 +1393,11 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
         return -EIO;
     }
 
+    /* throttling disk write I/O */
+    if (bs->io_limits_enabled) {
+        bdrv_io_limits_intercept(bs, true, nb_sectors);
+    }
+
     ret = drv->bdrv_co_writev(bs, sector_num, nb_sectors, qiov);
 
     if (bs->dirty_bitmap) {
@@ -2513,6 +2589,160 @@ void bdrv_aio_cancel(BlockDriverAIOCB *acb)
     acb->pool->cancel(acb);
 }
 
+/* block I/O throttling */
+static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
+                 bool is_write, double elapsed_time, uint64_t *wait) {
+    uint64_t bps_limit = 0;
+    double   bytes_limit, bytes_disp, bytes_res;
+    double   slice_time, wait_time;
+
+    if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
+        bps_limit = bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL];
+    } else if (bs->io_limits.bps[is_write]) {
+        bps_limit = bs->io_limits.bps[is_write];
+    } else {
+        if (wait) {
+            *wait = 0;
+        }
+
+        return false;
+    }
+
+    slice_time = bs->slice_end - bs->slice_start;
+    slice_time /= (NANOSECONDS_PER_SECOND);
+    bytes_limit = bps_limit * slice_time;
+    bytes_disp  = bs->nr_bytes[is_write] - bs->io_disps.bytes[is_write];
+    if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
+        bytes_disp += bs->nr_bytes[!is_write] - bs->io_disps.bytes[!is_write];
+    }
+
+    bytes_res   = (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
+
+    if (bytes_disp + bytes_res <= bytes_limit) {
+        if (wait) {
+            *wait = 0;
+        }
+
+        return false;
+    }
+
+    /* Calc approx time to dispatch */
+    wait_time = (bytes_disp + bytes_res) / bps_limit - elapsed_time;
+
+    bs->slice_time = wait_time * BLOCK_IO_SLICE_TIME * 10;
+    bs->slice_end += bs->slice_time - 3 * BLOCK_IO_SLICE_TIME;
+    if (wait) {
+        *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
+    }
+
+    return true;
+}
+
+static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
+                             double elapsed_time, uint64_t *wait) {
+    uint64_t iops_limit = 0;
+    double   ios_limit, ios_disp;
+    double   slice_time, wait_time;
+
+    if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
+        iops_limit = bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL];
+    } else if (bs->io_limits.iops[is_write]) {
+        iops_limit = bs->io_limits.iops[is_write];
+    } else {
+        if (wait) {
+            *wait = 0;
+        }
+
+        return false;
+    }
+
+    slice_time = bs->slice_end - bs->slice_start;
+    slice_time /= (NANOSECONDS_PER_SECOND);
+    ios_limit  = iops_limit * slice_time;
+    ios_disp   = bs->nr_ops[is_write] - bs->io_disps.ios[is_write];
+    if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
+        ios_disp += bs->nr_ops[!is_write] - bs->io_disps.ios[!is_write];
+    }
+
+    if (ios_disp + 1 <= ios_limit) {
+        if (wait) {
+            *wait = 0;
+        }
+
+        return false;
+    }
+
+    /* Calc approx time to dispatch */
+    wait_time = (ios_disp + 1) / iops_limit;
+    if (wait_time > elapsed_time) {
+        wait_time = wait_time - elapsed_time;
+    } else {
+        wait_time = 0;
+    }
+
+    bs->slice_time = wait_time * BLOCK_IO_SLICE_TIME * 10;
+    bs->slice_end += bs->slice_time - 3 * BLOCK_IO_SLICE_TIME;
+    if (wait) {
+        *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
+    }
+
+    return true;
+}
+
+static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
+                           bool is_write, int64_t *wait) {
+    int64_t  now, max_wait;
+    uint64_t bps_wait = 0, iops_wait = 0;
+    double   elapsed_time;
+    int      bps_ret, iops_ret;
+
+    now = qemu_get_clock_ns(vm_clock);
+    if ((bs->slice_start < now)
+        && (bs->slice_end > now)) {
+        bs->slice_end = now + bs->slice_time;
+    } else {
+        bs->slice_time  =  5 * BLOCK_IO_SLICE_TIME;
+        bs->slice_start = now;
+        bs->slice_end   = now + bs->slice_time;
+
+        bs->io_disps.bytes[is_write]  = bs->nr_bytes[is_write];
+        bs->io_disps.bytes[!is_write] = bs->nr_bytes[!is_write];
+
+        bs->io_disps.ios[is_write]    = bs->nr_ops[is_write];
+        bs->io_disps.ios[!is_write]   = bs->nr_ops[!is_write];
+    }
+
+    elapsed_time  = now - bs->slice_start;
+    elapsed_time  /= (NANOSECONDS_PER_SECOND);
+
+    bps_ret  = bdrv_exceed_bps_limits(bs, nb_sectors,
+                                      is_write, elapsed_time, &bps_wait);
+    iops_ret = bdrv_exceed_iops_limits(bs, is_write,
+                                      elapsed_time, &iops_wait);
+    if (bps_ret || iops_ret) {
+        max_wait = bps_wait > iops_wait ? bps_wait : iops_wait;
+        if (!qemu_co_queue_empty(&bs->throttled_reqs)) {
+            max_wait = -1;
+        }
+
+        if (wait) {
+            *wait = max_wait;
+        }
+
+        now = qemu_get_clock_ns(vm_clock);
+        if (bs->slice_end < now + max_wait) {
+            bs->slice_end = now + max_wait;
+        }
+
+        return true;
+    }
+
+    if (wait) {
+        *wait = 0;
+    }
+
+    return false;
+}
 
 /**************************************************************/
 /* async block device emulation */
diff --git a/block.h b/block.h
index bc8315d..9b5b35f 100644
--- a/block.h
+++ b/block.h
@@ -91,6 +91,7 @@ void bdrv_info_stats(Monitor *mon, QObject **ret_data);
 
 /* disk I/O throttling */
 void bdrv_io_limits_enable(BlockDriverState *bs);
+void bdrv_io_limits_disable(BlockDriverState *bs);
 bool bdrv_io_limits_enabled(BlockDriverState *bs);
 
 void bdrv_init(void);
diff --git a/block_int.h b/block_int.h
index b835ef6..fc53f5b 100644
--- a/block_int.h
+++ b/block_int.h
@@ -39,6 +39,7 @@
 #define BLOCK_IO_LIMIT_TOTAL    2
 
 #define BLOCK_IO_SLICE_TIME     100000000
+#define NANOSECONDS_PER_SECOND  1000000000.0
 
 #define BLOCK_OPT_SIZE          "size"
 #define BLOCK_OPT_ENCRYPT       "encryption"
diff --git a/qemu-coroutine-lock.c b/qemu-coroutine-lock.c
index 6b58160..9549c07 100644
--- a/qemu-coroutine-lock.c
+++ b/qemu-coroutine-lock.c
@@ -61,6 +61,14 @@ void coroutine_fn qemu_co_queue_wait(CoQueue *queue)
     assert(qemu_in_coroutine());
 }
 
+void coroutine_fn qemu_co_queue_wait_insert_head(CoQueue *queue)
+{
+    Coroutine *self = qemu_coroutine_self();
+    QTAILQ_INSERT_HEAD(&queue->entries, self, co_queue_next);
+    qemu_coroutine_yield();
+    assert(qemu_in_coroutine());
+}
+
 bool qemu_co_queue_next(CoQueue *queue)
 {
     Coroutine *next;
diff --git a/qemu-coroutine.h b/qemu-coroutine.h
index b8fc4f4..8a2e5d2 100644
--- a/qemu-coroutine.h
+++ b/qemu-coroutine.h
@@ -118,6 +118,12 @@ void qemu_co_queue_init(CoQueue *queue);
 void coroutine_fn qemu_co_queue_wait(CoQueue *queue);
 
 /**
+ * Adds the current coroutine to the head of the CoQueue and transfers
+ * control to the caller of the coroutine.
+ */
+void coroutine_fn qemu_co_queue_wait_insert_head(CoQueue *queue);
+
+/**
  * Restarts the next coroutine in the CoQueue and removes it from the queue.
  *
  * Returns true if a coroutine was restarted, false if the queue is empty.
-- 
1.7.6
