[Qemu-devel] [PATCH V5 0/5] Continuous Leaky Bucket Throttling

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH V5 0/5] Continuous Leaky Bucket Throttling
@ 2013-08-12 16:53 Benoît Canet
  2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 1/5] throttle: Add a new throttling API implementing continuous leaky bucket Benoît Canet
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Benoît Canet @ 2013-08-12 16:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

This patchset implement continous leaky bucket throttling.

It use two requests queue to enable to do silly unbalanced throttling like
block_set_io_throttle 0 0 0 0 6000 1

It use two timer to get the timer callbacks and the throttle.c code simple

It add unit tests.

The throttling core is pretty solid and the surrouding of the patchset needs
polish. (new options ...)

since previous version:
Fix bdrv_drain_all broken logic hence fixing the assertion error at exit.

v3-V4:
    wrap qemu-option.hx declararation [Eric]
    continuus -> continuous [Fam]
    unit test [Paolo]

Benoît Canet (5):
  throttle: Add a new throttling API implementing continuous leaky
    bucket.
  throttle: Add units tests
  block: Enable the new throttling code in the block layer.
  block: Add support for throttling burst max in QMP and the command
    line.
  block: Add iops_sector_count to do the iops accounting for a given io
    size.

 block.c                   |  349 ++++++++++----------------------
 block/qapi.c              |   50 +++--
 blockdev.c                |  207 ++++++++++++++-----
 hmp.c                     |   36 +++-
 include/block/block.h     |    1 -
 include/block/block_int.h |   32 +--
 include/qemu/throttle.h   |  105 ++++++++++
 qapi-schema.json          |   40 +++-
 qemu-options.hx           |    4 +-
 qmp-commands.hx           |   34 +++-
 tests/Makefile            |    2 +
 tests/test-throttle.c     |  494 +++++++++++++++++++++++++++++++++++++++++++++
 util/Makefile.objs        |    1 +
 util/throttle.c           |  391 +++++++++++++++++++++++++++++++++++
 14 files changed, 1405 insertions(+), 341 deletions(-)
 create mode 100644 include/qemu/throttle.h
 create mode 100644 tests/test-throttle.c
 create mode 100644 util/throttle.c

-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH V5 1/5] throttle: Add a new throttling API implementing continuous leaky bucket.
  2013-08-12 16:53 [Qemu-devel] [PATCH V5 0/5] Continuous Leaky Bucket Throttling Benoît Canet
@ 2013-08-12 16:53 ` Benoît Canet
  2013-08-14  8:52   ` Fam Zheng
  2013-08-16 11:45   ` Stefan Hajnoczi
  2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 2/5] throttle: Add units tests Benoît Canet
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Benoît Canet @ 2013-08-12 16:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Implement the continuous leaky bucket algorithm devised on IRC as a separate
module.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 include/qemu/throttle.h |  105 +++++++++++++
 util/Makefile.objs      |    1 +
 util/throttle.c         |  391 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 497 insertions(+)
 create mode 100644 include/qemu/throttle.h
 create mode 100644 util/throttle.c

diff --git a/include/qemu/throttle.h b/include/qemu/throttle.h
new file mode 100644
index 0000000..e03bc3e
--- /dev/null
+++ b/include/qemu/throttle.h
@@ -0,0 +1,105 @@
+/*
+ * QEMU throttling infrastructure
+ *
+ * Copyright (C) Nodalink, SARL. 2013
+ *
+ * Author:
+ *   Benoît Canet <benoit.canet@irqsave.net>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 or
+ * (at your option) version 3 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef THROTTLING_H
+#define THROTTLING_H
+
+#include <stdint.h>
+#include "qemu-common.h"
+#include "qemu/timer.h"
+
+#define NANOSECONDS_PER_SECOND  1000000000.0
+
+#define BUCKETS_COUNT 6
+
+typedef enum {
+    THROTTLE_BPS_TOTAL = 0,
+    THROTTLE_BPS_READ  = 1,
+    THROTTLE_BPS_WRITE = 2,
+    THROTTLE_OPS_TOTAL = 3,
+    THROTTLE_OPS_READ  = 4,
+    THROTTLE_OPS_WRITE = 5,
+} BucketType;
+
+typedef struct LeakyBucket {
+    double  ups;            /* units per second */
+    double  max;            /* leaky bucket max in units */
+    double  bucket;         /* bucket in units */
+} LeakyBucket;
+
+/* The following structure is used to configure a ThrottleState
+ * It contains a bit of state: the bucket field of the LeakyBucket structure.
+ * However it allows to keep the code clean and the bucket field is reset to
+ * zero at the right time.
+ */
+typedef struct ThrottleConfig {
+    LeakyBucket buckets[6]; /* leaky buckets */
+    uint64_t unit_size;     /* size of an unit in bytes */
+    uint64_t op_size;       /* size of an operation in units */
+} ThrottleConfig;
+
+typedef struct ThrottleState {
+    ThrottleConfig cfg;     /* configuration */
+    int64_t previous_leak;  /* timestamp of the last leak done */
+    QEMUTimer * timers[2];  /* timers used to do the throttling */
+    QEMUClock *clock;       /* the clock used */
+} ThrottleState;
+
+/* operations on single leaky buckets */
+void throttle_leak_bucket(LeakyBucket *bkt, int64_t delta);
+
+int64_t throttle_compute_wait(LeakyBucket *bkt);
+
+/* expose timer computation function for unit tests */
+bool throttle_compute_timer(ThrottleState *ts,
+                            bool is_write,
+                            int64_t now,
+                            int64_t *next_timer);
+
+/* init/destroy cycle */
+void throttle_init(ThrottleState *ts,
+                   QEMUClock *clock,
+                   void (read_timer)(void *),
+                   void (write_timer)(void *),
+                   void *timer_opaque);
+
+void throttle_destroy(ThrottleState *ts);
+
+bool throttle_have_timer(ThrottleState *ts);
+
+/* configuration */
+bool throttle_enabled(ThrottleConfig *cfg);
+
+bool throttle_conflicting(ThrottleConfig *cfg);
+
+bool throttle_is_valid(ThrottleConfig *cfg);
+
+void throttle_config(ThrottleState *ts, ThrottleConfig *cfg);
+
+void throttle_get_config(ThrottleState *ts, ThrottleConfig *cfg);
+
+/* usage */
+bool throttle_allowed(ThrottleState *ts, bool is_write);
+
+void throttle_account(ThrottleState *ts, bool is_write, uint64_t size);
+
+#endif
diff --git a/util/Makefile.objs b/util/Makefile.objs
index dc72ab0..2bb13a2 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -11,3 +11,4 @@ util-obj-y += iov.o aes.o qemu-config.o qemu-sockets.o uri.o notify.o
 util-obj-y += qemu-option.o qemu-progress.o
 util-obj-y += hexdump.o
 util-obj-y += crc32c.o
+util-obj-y += throttle.o
diff --git a/util/throttle.c b/util/throttle.c
new file mode 100644
index 0000000..2f25d44
--- /dev/null
+++ b/util/throttle.c
@@ -0,0 +1,391 @@
+/*
+ * QEMU throttling infrastructure
+ *
+ * Copyright (C) Nodalink, SARL. 2013
+ *
+ * Author:
+ *   Benoît Canet <benoit.canet@irqsave.net>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 or
+ * (at your option) version 3 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/throttle.h"
+#include "qemu/timer.h"
+
+/* This function make a bucket leak
+ *
+ * @bkt:   the bucket to make leak
+ * @delta: the time delta
+ */
+void throttle_leak_bucket(LeakyBucket *bkt, int64_t delta)
+{
+    double leak;
+
+    /* compute how much to leak */
+    leak = (bkt->ups * (double) delta) / NANOSECONDS_PER_SECOND;
+
+    /* make the bucket leak */
+    bkt->bucket = MAX(bkt->bucket - leak, 0);
+}
+
+/* Calculate the time delta since last leak and make proportionals leaks
+ *
+ * @now:      the current timestamp in ns
+ */
+static void throttle_do_leak(ThrottleState *ts, int64_t now)
+{
+    /* compute the time elapsed since the last leak */
+    int64_t delta = now - ts->previous_leak;
+    int i;
+
+    ts->previous_leak = now;
+
+    if (delta <= 0) {
+        return;
+    }
+
+    /* make each bucket leak */
+    for (i = 0; i < BUCKETS_COUNT; i++) {
+        throttle_leak_bucket(&ts->cfg.buckets[i], delta);
+    }
+}
+
+/* do the real job of computing the time to wait
+ *
+ * @limit: the throttling limit
+ * @extra: the number of operation to delay
+ * @ret:   the time to wait in ns
+ */
+static int64_t throttle_do_compute_wait(double limit, double extra)
+{
+    double wait = extra * NANOSECONDS_PER_SECOND;
+    wait /= limit;
+    return wait;
+}
+
+/* This function compute the wait time in ns that a leaky bucket should trigger
+ *
+ * @bkt: the leaky bucket we operate on
+ * @ret: the resulting wait time in ns or 0 if the operation can go through
+ */
+int64_t throttle_compute_wait(LeakyBucket *bkt)
+{
+    double extra; /* the number of extra units blocking the io */
+
+    if (!bkt->ups) {
+        return 0;
+    }
+
+    extra = bkt->bucket - bkt->max;
+
+    if (extra <= 0) {
+        return 0;
+    }
+
+    return throttle_do_compute_wait(bkt->ups, extra);
+}
+
+/* This function compute the time that must be waited while this IO
+ *
+ * @is_write:   true if the current IO is a write, false if it's a read
+ * @ret:        time to wait
+ */
+static int64_t throttle_compute_wait_for(ThrottleState *ts,
+                                         bool is_write,
+                                         int64_t now)
+{
+    BucketType to_check[2][4] = { {THROTTLE_BPS_TOTAL,
+                                   THROTTLE_OPS_TOTAL,
+                                   THROTTLE_BPS_READ,
+                                   THROTTLE_OPS_READ},
+                                  {THROTTLE_BPS_TOTAL,
+                                   THROTTLE_OPS_TOTAL,
+                                   THROTTLE_BPS_WRITE,
+                                   THROTTLE_OPS_WRITE}, };
+    int64_t wait, max_wait = 0;
+    int i;
+
+    for (i = 0; i < 4; i++) {
+        BucketType index = to_check[is_write][i];
+        wait = throttle_compute_wait(&ts->cfg.buckets[index]);
+        if (wait > max_wait) {
+            max_wait = wait;
+        }
+    }
+
+    return max_wait;
+}
+
+/* compute the timer for this type of operation
+ *
+ * @is_write:   the type of operation
+ * @now:        the current clock timerstamp
+ * @next_timer: the resulting timer
+ * @ret:        true if a timer must be set
+ */
+bool throttle_compute_timer(ThrottleState *ts,
+                            bool is_write,
+                            int64_t now,
+                            int64_t *next_timer)
+{
+    int64_t wait;
+
+    /* leak proportionally to the time elapsed */
+    throttle_do_leak(ts, now);
+
+    /* compute the wait time if any */
+    wait = throttle_compute_wait_for(ts, is_write, now);
+
+    /* if the code must wait compute when the next timer should fire */
+    if (wait) {
+        *next_timer = now + wait;
+        return true;
+    }
+
+    /* else no need to wait at all */
+    *next_timer = now;
+    return false;
+}
+
+/* To be called first on the ThrottleState */
+void throttle_init(ThrottleState *ts,
+                   QEMUClock *clock,
+                   void (read_timer)(void *),
+                   void (write_timer)(void *),
+                   void *timer_opaque)
+{
+    memset(ts, 0, sizeof(ThrottleState));
+
+    ts->clock = clock;
+    ts->timers[0] = qemu_new_timer_ns(ts->clock, read_timer, timer_opaque);
+    ts->timers[1] = qemu_new_timer_ns(ts->clock, write_timer, timer_opaque);
+}
+
+/* destroy a timer */
+static void throttle_timer_destroy(QEMUTimer **timer)
+{
+    assert(*timer != NULL);
+
+    if (qemu_timer_pending(*timer)) {
+        qemu_del_timer(*timer);
+    }
+
+    qemu_free_timer(*timer);
+    *timer = NULL;
+}
+
+/* To be called last on the ThrottleState */
+void throttle_destroy(ThrottleState *ts)
+{
+    int i;
+
+    for (i = 0; i < 2; i++) {
+        throttle_timer_destroy(&ts->timers[i]);
+    }
+}
+
+/* is any throttling timer configured */
+bool throttle_have_timer(ThrottleState *ts)
+{
+    if (ts->timers[0]) {
+        return true;
+    }
+
+    return false;
+}
+
+/* Does any throttling must be done
+ *
+ * @cfg: the throttling configuration to inspect
+ * @ret: true if throttling must be done else false
+ */
+bool throttle_enabled(ThrottleConfig *cfg)
+{
+    int i;
+
+    for (i = 0; i < BUCKETS_COUNT; i++) {
+        if (cfg->buckets[i].ups > 0) {
+            return true;
+        }
+    }
+
+    return false;
+}
+
+/* return true if any two throttling parameters conflicts
+ *
+ * @cfg: the throttling configuration to inspect
+ * @ret: true if any conflict detected else false
+ */
+bool throttle_conflicting(ThrottleConfig *cfg)
+{
+    bool bps_flag, ops_flag;
+    bool bps_max_flag, ops_max_flag;
+
+    bps_flag = cfg->buckets[THROTTLE_BPS_TOTAL].ups &&
+               (cfg->buckets[THROTTLE_BPS_READ].ups ||
+                cfg->buckets[THROTTLE_BPS_WRITE].ups);
+
+    ops_flag = cfg->buckets[THROTTLE_OPS_TOTAL].ups &&
+               (cfg->buckets[THROTTLE_OPS_READ].ups ||
+                cfg->buckets[THROTTLE_OPS_WRITE].ups);
+
+    bps_max_flag = cfg->buckets[THROTTLE_BPS_TOTAL].max &&
+                  (cfg->buckets[THROTTLE_BPS_READ].max  ||
+                   cfg->buckets[THROTTLE_BPS_WRITE].max);
+
+    ops_max_flag = cfg->buckets[THROTTLE_OPS_TOTAL].max &&
+                   (cfg->buckets[THROTTLE_OPS_READ].max ||
+                   cfg->buckets[THROTTLE_OPS_WRITE].max);
+
+    return bps_flag || ops_flag || bps_max_flag || ops_max_flag;
+}
+
+/* check if a throttling configuration is valid
+ * @cfg: the throttling configuration to inspect
+ * @ret: true if valid else false
+ */
+bool throttle_is_valid(ThrottleConfig *cfg)
+{
+    bool invalid = false;
+    int i;
+
+    for (i = 0; i < BUCKETS_COUNT; i++) {
+        if (cfg->buckets[i].ups < 0) {
+            invalid = true;
+        }
+    }
+
+    for (i = 0; i < BUCKETS_COUNT; i++) {
+        if (cfg->buckets[i].max < 0) {
+            invalid = true;
+        }
+    }
+
+    return !invalid;
+}
+
+/* fix bucket parameters */
+static void throttle_fix_bucket(LeakyBucket *bkt)
+{
+    double min = bkt->ups / 10;
+    /* zero bucket level */
+    bkt->bucket = 0;
+
+    /* take care of not using cpu and also improve throttling precision */
+    if (bkt->ups &&
+        bkt->max < min) {
+        bkt->max = min;
+    }
+}
+
+/* take care of canceling a timer */
+static void throttle_cancel_timer(QEMUTimer *timer)
+{
+    assert(timer != NULL);
+    if (!qemu_timer_pending(timer)) {
+        return;
+    }
+
+    qemu_del_timer(timer);
+}
+
+/* Used to configure the throttle
+ *
+ * @ts: the throttle state we are working on
+ * @cfg: the config to set
+ */
+void throttle_config(ThrottleState *ts, ThrottleConfig *cfg)
+{
+    int i;
+
+    ts->cfg = *cfg;
+
+    for (i = 0; i < BUCKETS_COUNT; i++) {
+        throttle_fix_bucket(&ts->cfg.buckets[i]);
+    }
+
+    ts->previous_leak = qemu_get_clock_ns(ts->clock);
+
+    for (i = 0; i < 2; i++) {
+        throttle_cancel_timer(ts->timers[i]);
+    }
+}
+
+/* used to get config
+ *
+ * @ts: the throttle state we are working on
+ * @cfg: where to write the config
+ */
+void throttle_get_config(ThrottleState *ts, ThrottleConfig *cfg)
+{
+    *cfg = ts->cfg;
+}
+
+
+/* compute if an operation must be allowed and set a timer if not
+ *
+ * NOTE: this function is not unit tested due to it's usage of qemu_mod_timer
+ *
+ * @is_write: the type of operation (read/write)
+ * @ret:      true if the operation is allowed to flow else if must wait
+ */
+bool throttle_allowed(ThrottleState *ts, bool is_write)
+{
+    int64_t now = qemu_get_clock_ns(ts->clock);
+    int64_t next_timer;
+    bool must_wait;
+
+    must_wait = throttle_compute_timer(ts,
+                                       is_write,
+                                       now,
+                                       &next_timer);
+
+    /* if the request is throttled arm timer */
+    if (must_wait) {
+        qemu_mod_timer(ts->timers[is_write], next_timer);
+    }
+
+    return !must_wait;
+}
+
+/* do the accounting for this operation
+ *
+ * @is_write: the type of operation (read/write)
+ * size:      the size of the operation
+ */
+void throttle_account(ThrottleState *ts, bool is_write, uint64_t size)
+{
+    double bytes_size;
+
+    /* if cfg.op_size is not defined we will acccount exactly 1 operation */
+    double units = 1.0;
+    if (ts->cfg.op_size) {
+        units = (double) size / ts->cfg.op_size;
+    }
+
+    bytes_size = size * ts->cfg.unit_size;
+
+    ts->cfg.buckets[THROTTLE_BPS_TOTAL].bucket += bytes_size;
+    ts->cfg.buckets[THROTTLE_OPS_TOTAL].bucket += units;
+
+    if (is_write) {
+        ts->cfg.buckets[THROTTLE_BPS_WRITE].bucket += bytes_size;
+        ts->cfg.buckets[THROTTLE_OPS_WRITE].bucket += units;
+    } else {
+        ts->cfg.buckets[THROTTLE_BPS_READ].bucket += bytes_size;
+        ts->cfg.buckets[THROTTLE_OPS_READ].bucket += units;
+    }
+}
+
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH V5 1/5] throttle: Add a new throttling API implementing continuous leaky bucket.
  2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 1/5] throttle: Add a new throttling API implementing continuous leaky bucket Benoît Canet
@ 2013-08-14  8:52   ` Fam Zheng
  2013-08-16 11:45   ` Stefan Hajnoczi
  1 sibling, 0 replies; 17+ messages in thread
From: Fam Zheng @ 2013-08-14  8:52 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha

On Mon, 08/12 18:53, Benoît Canet wrote:
> Implement the continuous leaky bucket algorithm devised on IRC as a separate
> module.
> 
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>  include/qemu/throttle.h |  105 +++++++++++++
>  util/Makefile.objs      |    1 +
>  util/throttle.c         |  391 +++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 497 insertions(+)
>  create mode 100644 include/qemu/throttle.h
>  create mode 100644 util/throttle.c
> 
> diff --git a/include/qemu/throttle.h b/include/qemu/throttle.h
> new file mode 100644
> index 0000000..e03bc3e
> --- /dev/null
> +++ b/include/qemu/throttle.h
> @@ -0,0 +1,105 @@
> +/*
> + * QEMU throttling infrastructure
> + *
> + * Copyright (C) Nodalink, SARL. 2013
> + *
> + * Author:
> + *   Benoît Canet <benoit.canet@irqsave.net>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation; either version 2 or
> + * (at your option) version 3 of the License.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef THROTTLING_H
> +#define THROTTLING_H
> +
> +#include <stdint.h>
> +#include "qemu-common.h"
> +#include "qemu/timer.h"
> +
> +#define NANOSECONDS_PER_SECOND  1000000000.0
> +
> +#define BUCKETS_COUNT 6
> +
> +typedef enum {
> +    THROTTLE_BPS_TOTAL = 0,
> +    THROTTLE_BPS_READ  = 1,
> +    THROTTLE_BPS_WRITE = 2,
> +    THROTTLE_OPS_TOTAL = 3,
> +    THROTTLE_OPS_READ  = 4,
> +    THROTTLE_OPS_WRITE = 5,
> +} BucketType;
> +
> +typedef struct LeakyBucket {
> +    double  ups;            /* units per second */
> +    double  max;            /* leaky bucket max in units */
> +    double  bucket;         /* bucket in units */
> +} LeakyBucket;
> +
> +/* The following structure is used to configure a ThrottleState
> + * It contains a bit of state: the bucket field of the LeakyBucket structure.
> + * However it allows to keep the code clean and the bucket field is reset to
> + * zero at the right time.
> + */
> +typedef struct ThrottleConfig {
> +    LeakyBucket buckets[6]; /* leaky buckets */
> +    uint64_t unit_size;     /* size of an unit in bytes */
> +    uint64_t op_size;       /* size of an operation in units */
> +} ThrottleConfig;
> +
> +typedef struct ThrottleState {
> +    ThrottleConfig cfg;     /* configuration */
> +    int64_t previous_leak;  /* timestamp of the last leak done */
> +    QEMUTimer * timers[2];  /* timers used to do the throttling */
> +    QEMUClock *clock;       /* the clock used */
> +} ThrottleState;
> +
> +/* operations on single leaky buckets */
> +void throttle_leak_bucket(LeakyBucket *bkt, int64_t delta);
> +
> +int64_t throttle_compute_wait(LeakyBucket *bkt);
> +
> +/* expose timer computation function for unit tests */
> +bool throttle_compute_timer(ThrottleState *ts,
> +                            bool is_write,
> +                            int64_t now,
> +                            int64_t *next_timer);
> +
> +/* init/destroy cycle */
> +void throttle_init(ThrottleState *ts,
> +                   QEMUClock *clock,
> +                   void (read_timer)(void *),
> +                   void (write_timer)(void *),
> +                   void *timer_opaque);
> +
> +void throttle_destroy(ThrottleState *ts);
> +
> +bool throttle_have_timer(ThrottleState *ts);
> +
> +/* configuration */
> +bool throttle_enabled(ThrottleConfig *cfg);
> +
> +bool throttle_conflicting(ThrottleConfig *cfg);
> +
> +bool throttle_is_valid(ThrottleConfig *cfg);
> +
> +void throttle_config(ThrottleState *ts, ThrottleConfig *cfg);
> +
> +void throttle_get_config(ThrottleState *ts, ThrottleConfig *cfg);
> +
> +/* usage */
> +bool throttle_allowed(ThrottleState *ts, bool is_write);
> +
> +void throttle_account(ThrottleState *ts, bool is_write, uint64_t size);
> +
> +#endif
> diff --git a/util/Makefile.objs b/util/Makefile.objs
> index dc72ab0..2bb13a2 100644
> --- a/util/Makefile.objs
> +++ b/util/Makefile.objs
> @@ -11,3 +11,4 @@ util-obj-y += iov.o aes.o qemu-config.o qemu-sockets.o uri.o notify.o
>  util-obj-y += qemu-option.o qemu-progress.o
>  util-obj-y += hexdump.o
>  util-obj-y += crc32c.o
> +util-obj-y += throttle.o
> diff --git a/util/throttle.c b/util/throttle.c
> new file mode 100644
> index 0000000..2f25d44
> --- /dev/null
> +++ b/util/throttle.c
> @@ -0,0 +1,391 @@
> +/*
> + * QEMU throttling infrastructure
> + *
> + * Copyright (C) Nodalink, SARL. 2013
> + *
> + * Author:
> + *   Benoît Canet <benoit.canet@irqsave.net>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation; either version 2 or
> + * (at your option) version 3 of the License.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/throttle.h"
> +#include "qemu/timer.h"
> +
> +/* This function make a bucket leak
> + *
> + * @bkt:   the bucket to make leak
> + * @delta: the time delta
> + */
> +void throttle_leak_bucket(LeakyBucket *bkt, int64_t delta)
> +{
> +    double leak;
> +
> +    /* compute how much to leak */
> +    leak = (bkt->ups * (double) delta) / NANOSECONDS_PER_SECOND;
> +
> +    /* make the bucket leak */
> +    bkt->bucket = MAX(bkt->bucket - leak, 0);
> +}
> +
> +/* Calculate the time delta since last leak and make proportionals leaks
> + *
> + * @now:      the current timestamp in ns
> + */
> +static void throttle_do_leak(ThrottleState *ts, int64_t now)
> +{
> +    /* compute the time elapsed since the last leak */
> +    int64_t delta = now - ts->previous_leak;
> +    int i;
> +
> +    ts->previous_leak = now;
> +
> +    if (delta <= 0) {
> +        return;
> +    }
> +
> +    /* make each bucket leak */
> +    for (i = 0; i < BUCKETS_COUNT; i++) {
> +        throttle_leak_bucket(&ts->cfg.buckets[i], delta);
> +    }
> +}
> +
> +/* do the real job of computing the time to wait
> + *
> + * @limit: the throttling limit
> + * @extra: the number of operation to delay
> + * @ret:   the time to wait in ns
> + */
> +static int64_t throttle_do_compute_wait(double limit, double extra)
> +{
> +    double wait = extra * NANOSECONDS_PER_SECOND;
> +    wait /= limit;
> +    return wait;
> +}
> +
> +/* This function compute the wait time in ns that a leaky bucket should trigger
> + *
> + * @bkt: the leaky bucket we operate on
> + * @ret: the resulting wait time in ns or 0 if the operation can go through
> + */
> +int64_t throttle_compute_wait(LeakyBucket *bkt)
> +{
> +    double extra; /* the number of extra units blocking the io */
> +
> +    if (!bkt->ups) {
> +        return 0;
> +    }
> +
> +    extra = bkt->bucket - bkt->max;
> +
> +    if (extra <= 0) {
> +        return 0;
> +    }
> +
> +    return throttle_do_compute_wait(bkt->ups, extra);
> +}
> +
> +/* This function compute the time that must be waited while this IO
> + *
> + * @is_write:   true if the current IO is a write, false if it's a read
> + * @ret:        time to wait
> + */
> +static int64_t throttle_compute_wait_for(ThrottleState *ts,
> +                                         bool is_write,
> +                                         int64_t now)

Parameter "now" not used.

> +{
> +    BucketType to_check[2][4] = { {THROTTLE_BPS_TOTAL,
> +                                   THROTTLE_OPS_TOTAL,
> +                                   THROTTLE_BPS_READ,
> +                                   THROTTLE_OPS_READ},
> +                                  {THROTTLE_BPS_TOTAL,
> +                                   THROTTLE_OPS_TOTAL,
> +                                   THROTTLE_BPS_WRITE,
> +                                   THROTTLE_OPS_WRITE}, };
> +    int64_t wait, max_wait = 0;
> +    int i;
> +
> +    for (i = 0; i < 4; i++) {
> +        BucketType index = to_check[is_write][i];
> +        wait = throttle_compute_wait(&ts->cfg.buckets[index]);
> +        if (wait > max_wait) {
> +            max_wait = wait;
> +        }
> +    }
> +
> +    return max_wait;
> +}
> +
> +/* compute the timer for this type of operation
> + *
> + * @is_write:   the type of operation
> + * @now:        the current clock timerstamp

s/timerstamp/timestamp/

> + * @next_timer: the resulting timer

A bit confusing with the name and description. This is actually a computed
timestamp for next triggering of the timer, right?

> + * @ret:        true if a timer must be set
> + */
> +bool throttle_compute_timer(ThrottleState *ts,
> +                            bool is_write,
> +                            int64_t now,
> +                            int64_t *next_timer)
> +{
> +    int64_t wait;
> +
> +    /* leak proportionally to the time elapsed */
> +    throttle_do_leak(ts, now);
> +
> +    /* compute the wait time if any */
> +    wait = throttle_compute_wait_for(ts, is_write, now);
> +
> +    /* if the code must wait compute when the next timer should fire */
> +    if (wait) {
> +        *next_timer = now + wait;
> +        return true;
> +    }
> +
> +    /* else no need to wait at all */
> +    *next_timer = now;
> +    return false;
> +}
> +
> +/* To be called first on the ThrottleState */
> +void throttle_init(ThrottleState *ts,
> +                   QEMUClock *clock,
> +                   void (read_timer)(void *),

There's a type name for this:

                      QEMUTimerCB * read_timer_cb,

> +                   void (write_timer)(void *),

                      QEMUTimerCB * write_timer_cb,

> +                   void *timer_opaque)
> +{
> +    memset(ts, 0, sizeof(ThrottleState));
> +
> +    ts->clock = clock;
> +    ts->timers[0] = qemu_new_timer_ns(ts->clock, read_timer, timer_opaque);
> +    ts->timers[1] = qemu_new_timer_ns(ts->clock, write_timer, timer_opaque);
> +}
> +
> +/* destroy a timer */
> +static void throttle_timer_destroy(QEMUTimer **timer)
> +{
> +    assert(*timer != NULL);
> +
> +    if (qemu_timer_pending(*timer)) {
> +        qemu_del_timer(*timer);
> +    }
> +
> +    qemu_free_timer(*timer);
> +    *timer = NULL;
> +}
> +
> +/* To be called last on the ThrottleState */
> +void throttle_destroy(ThrottleState *ts)
> +{
> +    int i;
> +
> +    for (i = 0; i < 2; i++) {
> +        throttle_timer_destroy(&ts->timers[i]);
> +    }
> +}
> +
> +/* is any throttling timer configured */
> +bool throttle_have_timer(ThrottleState *ts)
> +{
> +    if (ts->timers[0]) {
> +        return true;
> +    }
> +
> +    return false;
> +}
> +
> +/* Does any throttling must be done
> + *
> + * @cfg: the throttling configuration to inspect
> + * @ret: true if throttling must be done else false
> + */
> +bool throttle_enabled(ThrottleConfig *cfg)
> +{
> +    int i;
> +
> +    for (i = 0; i < BUCKETS_COUNT; i++) {
> +        if (cfg->buckets[i].ups > 0) {
> +            return true;
> +        }
> +    }
> +
> +    return false;
> +}
> +
> +/* return true if any two throttling parameters conflicts
> + *
> + * @cfg: the throttling configuration to inspect
> + * @ret: true if any conflict detected else false
> + */
> +bool throttle_conflicting(ThrottleConfig *cfg)
> +{
> +    bool bps_flag, ops_flag;
> +    bool bps_max_flag, ops_max_flag;
> +
> +    bps_flag = cfg->buckets[THROTTLE_BPS_TOTAL].ups &&
> +               (cfg->buckets[THROTTLE_BPS_READ].ups ||
> +                cfg->buckets[THROTTLE_BPS_WRITE].ups);
> +
> +    ops_flag = cfg->buckets[THROTTLE_OPS_TOTAL].ups &&
> +               (cfg->buckets[THROTTLE_OPS_READ].ups ||
> +                cfg->buckets[THROTTLE_OPS_WRITE].ups);
> +
> +    bps_max_flag = cfg->buckets[THROTTLE_BPS_TOTAL].max &&
> +                  (cfg->buckets[THROTTLE_BPS_READ].max  ||
> +                   cfg->buckets[THROTTLE_BPS_WRITE].max);
> +
> +    ops_max_flag = cfg->buckets[THROTTLE_OPS_TOTAL].max &&
> +                   (cfg->buckets[THROTTLE_OPS_READ].max ||
> +                   cfg->buckets[THROTTLE_OPS_WRITE].max);
> +
> +    return bps_flag || ops_flag || bps_max_flag || ops_max_flag;
> +}
> +
> +/* check if a throttling configuration is valid
> + * @cfg: the throttling configuration to inspect
> + * @ret: true if valid else false
> + */
> +bool throttle_is_valid(ThrottleConfig *cfg)
> +{
> +    bool invalid = false;
> +    int i;
> +
> +    for (i = 0; i < BUCKETS_COUNT; i++) {
> +        if (cfg->buckets[i].ups < 0) {
> +            invalid = true;
> +        }
> +    }
> +
> +    for (i = 0; i < BUCKETS_COUNT; i++) {
> +        if (cfg->buckets[i].max < 0) {
> +            invalid = true;
> +        }
> +    }
> +
> +    return !invalid;
> +}
> +
> +/* fix bucket parameters */
> +static void throttle_fix_bucket(LeakyBucket *bkt)
> +{
> +    double min = bkt->ups / 10;
> +    /* zero bucket level */
> +    bkt->bucket = 0;
> +
> +    /* take care of not using cpu and also improve throttling precision */
> +    if (bkt->ups &&
> +        bkt->max < min) {
> +        bkt->max = min;
> +    }
> +}
> +
> +/* take care of canceling a timer */
> +static void throttle_cancel_timer(QEMUTimer *timer)
> +{
> +    assert(timer != NULL);
> +    if (!qemu_timer_pending(timer)) {
> +        return;
> +    }
> +
> +    qemu_del_timer(timer);
> +}
> +
> +/* Used to configure the throttle
> + *
> + * @ts: the throttle state we are working on
> + * @cfg: the config to set
> + */
> +void throttle_config(ThrottleState *ts, ThrottleConfig *cfg)
> +{
> +    int i;
> +
> +    ts->cfg = *cfg;
> +
> +    for (i = 0; i < BUCKETS_COUNT; i++) {
> +        throttle_fix_bucket(&ts->cfg.buckets[i]);
> +    }
> +
> +    ts->previous_leak = qemu_get_clock_ns(ts->clock);
> +
> +    for (i = 0; i < 2; i++) {
> +        throttle_cancel_timer(ts->timers[i]);
> +    }
> +}
> +
> +/* used to get config
> + *
> + * @ts: the throttle state we are working on
> + * @cfg: where to write the config
> + */
> +void throttle_get_config(ThrottleState *ts, ThrottleConfig *cfg)
> +{
> +    *cfg = ts->cfg;

Any reason to copy the structure, instead of using the reference?

> +}
> +
> +
> +/* compute if an operation must be allowed and set a timer if not
> + *
> + * NOTE: this function is not unit tested due to it's usage of qemu_mod_timer
> + *
> + * @is_write: the type of operation (read/write)
> + * @ret:      true if the operation is allowed to flow else if must wait
> + */
> +bool throttle_allowed(ThrottleState *ts, bool is_write)

The function name sounds like a immutable check operation, but it actually
reschedules timer. This is not a good style.

> +{
> +    int64_t now = qemu_get_clock_ns(ts->clock);
> +    int64_t next_timer;
> +    bool must_wait;
> +
> +    must_wait = throttle_compute_timer(ts,
> +                                       is_write,
> +                                       now,
> +                                       &next_timer);
> +
> +    /* if the request is throttled arm timer */
> +    if (must_wait) {
> +        qemu_mod_timer(ts->timers[is_write], next_timer);
> +    }
> +
> +    return !must_wait;
> +}
> +
> +/* do the accounting for this operation
> + *
> + * @is_write: the type of operation (read/write)
> + * size:      the size of the operation
> + */
> +void throttle_account(ThrottleState *ts, bool is_write, uint64_t size)
> +{
> +    double bytes_size;
> +
> +    /* if cfg.op_size is not defined we will acccount exactly 1 operation */
> +    double units = 1.0;
> +    if (ts->cfg.op_size) {
> +        units = (double) size / ts->cfg.op_size;
> +    }
> +
> +    bytes_size = size * ts->cfg.unit_size;
> +
> +    ts->cfg.buckets[THROTTLE_BPS_TOTAL].bucket += bytes_size;
> +    ts->cfg.buckets[THROTTLE_OPS_TOTAL].bucket += units;
> +
> +    if (is_write) {
> +        ts->cfg.buckets[THROTTLE_BPS_WRITE].bucket += bytes_size;
> +        ts->cfg.buckets[THROTTLE_OPS_WRITE].bucket += units;
> +    } else {
> +        ts->cfg.buckets[THROTTLE_BPS_READ].bucket += bytes_size;
> +        ts->cfg.buckets[THROTTLE_OPS_READ].bucket += units;
> +    }
> +}
> +
> -- 
> 1.7.10.4
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH V5 1/5] throttle: Add a new throttling API implementing continuous leaky bucket.
  2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 1/5] throttle: Add a new throttling API implementing continuous leaky bucket Benoît Canet
  2013-08-14  8:52   ` Fam Zheng
@ 2013-08-16 11:45   ` Stefan Hajnoczi
  2013-08-19 11:27     ` Paolo Bonzini
  1 sibling, 1 reply; 17+ messages in thread
From: Stefan Hajnoczi @ 2013-08-16 11:45 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha

On Mon, Aug 12, 2013 at 06:53:12PM +0200, Benoît Canet wrote:
> +#ifndef THROTTLING_H
> +#define THROTTLING_H

THROTTLE_H

> +
> +#include <stdint.h>
> +#include "qemu-common.h"
> +#include "qemu/timer.h"
> +
> +#define NANOSECONDS_PER_SECOND  1000000000.0
> +
> +#define BUCKETS_COUNT 6
> +
> +typedef enum {
> +    THROTTLE_BPS_TOTAL = 0,
> +    THROTTLE_BPS_READ  = 1,
> +    THROTTLE_BPS_WRITE = 2,
> +    THROTTLE_OPS_TOTAL = 3,
> +    THROTTLE_OPS_READ  = 4,
> +    THROTTLE_OPS_WRITE = 5,
> +} BucketType;
> +
> +typedef struct LeakyBucket {
> +    double  ups;            /* units per second */
> +    double  max;            /* leaky bucket max in units */
> +    double  bucket;         /* bucket in units */

These comments aren't very clear to me :).  So I guess bps or iops would
be in ups.  Max would be the total budget or maximum burst.  Bucket
might be the current level.

> +} LeakyBucket;
> +
> +/* The following structure is used to configure a ThrottleState
> + * It contains a bit of state: the bucket field of the LeakyBucket structure.
> + * However it allows to keep the code clean and the bucket field is reset to
> + * zero at the right time.
> + */
> +typedef struct ThrottleConfig {
> +    LeakyBucket buckets[6]; /* leaky buckets */

s/6/THROTTLE_TYPE_MAX/

> +    uint64_t unit_size;     /* size of an unit in bytes */
> +    uint64_t op_size;       /* size of an operation in units */

It's not clear yet why we need both unit_size *and* op_size.  I thought
you would have a single granularity field for accounting big requests as
multiple iops.

> +/* This function make a bucket leak
> + *
> + * @bkt:   the bucket to make leak
> + * @delta: the time delta

delta is in nanoseconds.  Probably best to call it delta_ns.

> +/* destroy a timer */
> +static void throttle_timer_destroy(QEMUTimer **timer)
> +{
> +    assert(*timer != NULL);
> +
> +    if (qemu_timer_pending(*timer)) {
> +        qemu_del_timer(*timer);
> +    }

You can always call qemu_del_timer(), the timer doesn't need to be pending.

> +/* fix bucket parameters */
> +static void throttle_fix_bucket(LeakyBucket *bkt)
> +{
> +    double min = bkt->ups / 10;
> +    /* zero bucket level */
> +    bkt->bucket = 0;
> +
> +    /* take care of not using cpu and also improve throttling precision */
> +    if (bkt->ups &&
> +        bkt->max < min) {
> +        bkt->max = min;
> +    }
> +}

This function seems like magic.  What is really going on here?  Why
divide by 10 and when does this case happen?

> +
> +/* take care of canceling a timer */
> +static void throttle_cancel_timer(QEMUTimer *timer)
> +{
> +    assert(timer != NULL);
> +    if (!qemu_timer_pending(timer)) {
> +        return;
> +    }

No need to check pending first.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH V5 1/5] throttle: Add a new throttling API implementing continuous leaky bucket.
  2013-08-16 11:45   ` Stefan Hajnoczi
@ 2013-08-19 11:27     ` Paolo Bonzini
  0 siblings, 0 replies; 17+ messages in thread
From: Paolo Bonzini @ 2013-08-19 11:27 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, Benoît Canet, stefanha, qemu-devel

Il 16/08/2013 13:45, Stefan Hajnoczi ha scritto:
>> > +#define BUCKETS_COUNT 6
>> > +
>> > +typedef enum {
>> > +    THROTTLE_BPS_TOTAL = 0,
>> > +    THROTTLE_BPS_READ  = 1,
>> > +    THROTTLE_BPS_WRITE = 2,
>> > +    THROTTLE_OPS_TOTAL = 3,
>> > +    THROTTLE_OPS_READ  = 4,
>> > +    THROTTLE_OPS_WRITE = 5,
>> > +} BucketType;

Please remove the "= N" from the enums, and add BUCKETS_COUNT here.

>> > +typedef struct LeakyBucket {
>> > +    double  ups;            /* units per second */
>> > +    double  max;            /* leaky bucket max in units */
>> > +    double  bucket;         /* bucket in units */
> These comments aren't very clear to me :).  So I guess bps or iops would
> be in ups.  Max would be the total budget or maximum burst.  Bucket
> might be the current level.

I also suggest replacing "ups" with "avg", since it's the average
throughput that the leaky bucket allows after the initial burst has
emptied the bucket.

>> +    uint64_t unit_size;     /* size of an unit in bytes */
>> +    uint64_t op_size;       /* size of an operation in units */
> 
> It's not clear yet why we need both unit_size *and* op_size.  I thought
> you would have a single granularity field for accounting big requests as
> multiple iops.

IIUC the ops buckets account operations in op_size / unit_size units,
while the bps buckets account operations in 1 / unit_size units, or
something like that.  But it needs clarification indeed.

Paolo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH V5 2/5] throttle: Add units tests
  2013-08-12 16:53 [Qemu-devel] [PATCH V5 0/5] Continuous Leaky Bucket Throttling Benoît Canet
  2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 1/5] throttle: Add a new throttling API implementing continuous leaky bucket Benoît Canet
@ 2013-08-12 16:53 ` Benoît Canet
  2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 3/5] block: Enable the new throttling code in the block layer Benoît Canet
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Benoît Canet @ 2013-08-12 16:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 tests/Makefile        |    2 +
 tests/test-throttle.c |  494 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 496 insertions(+)
 create mode 100644 tests/test-throttle.c

diff --git a/tests/Makefile b/tests/Makefile
index d044908..fb1e06a 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -29,6 +29,7 @@ check-unit-y += tests/test-visitor-serialization$(EXESUF)
 check-unit-y += tests/test-iov$(EXESUF)
 gcov-files-test-iov-y = util/iov.c
 check-unit-y += tests/test-aio$(EXESUF)
+check-unit-y += tests/test-throttle$(EXESUF)
 gcov-files-test-aio-$(CONFIG_WIN32) = aio-win32.c
 gcov-files-test-aio-$(CONFIG_POSIX) = aio-posix.c
 check-unit-y += tests/test-thread-pool$(EXESUF)
@@ -116,6 +117,7 @@ tests/check-qfloat$(EXESUF): tests/check-qfloat.o libqemuutil.a
 tests/check-qjson$(EXESUF): tests/check-qjson.o libqemuutil.a libqemustub.a
 tests/test-coroutine$(EXESUF): tests/test-coroutine.o $(block-obj-y) libqemuutil.a libqemustub.a
 tests/test-aio$(EXESUF): tests/test-aio.o $(block-obj-y) libqemuutil.a libqemustub.a
+tests/test-throttle$(EXESUF): tests/test-throttle.o $(block-obj-y) libqemuutil.a libqemustub.a
 tests/test-thread-pool$(EXESUF): tests/test-thread-pool.o $(block-obj-y) libqemuutil.a libqemustub.a
 tests/test-iov$(EXESUF): tests/test-iov.o libqemuutil.a
 tests/test-hbitmap$(EXESUF): tests/test-hbitmap.o libqemuutil.a libqemustub.a
diff --git a/tests/test-throttle.c b/tests/test-throttle.c
new file mode 100644
index 0000000..c687f5f
--- /dev/null
+++ b/tests/test-throttle.c
@@ -0,0 +1,494 @@
+/*
+ * Throttle infrastructure tests
+ *
+ * Copyright Nodalink, SARL. 2013
+ *
+ * Authors:
+ *  Benoît Canet     <benoit.canet@irqsave.net>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ */
+
+#include <glib.h>
+#include <math.h>
+#include "qemu/throttle.h"
+
+LeakyBucket    bkt;
+ThrottleConfig cfg;
+ThrottleState  ts;
+
+/* usefull function */
+static bool double_cmp(double x, double y)
+{
+    return fabsl(x - y) < 1e-6;
+}
+
+/* tests for single bucket operations */
+static void test_leak_bucket(void)
+{
+    /* set initial value */
+    bkt.ups = 150;
+    bkt.max = 15;
+    bkt.bucket = 1.5;
+
+    /* leak an op work of time */
+    throttle_leak_bucket(&bkt, NANOSECONDS_PER_SECOND / 150);
+    g_assert(bkt.ups == 150);
+    g_assert(bkt.max == 15);
+    g_assert(double_cmp(bkt.bucket, 0.5));
+
+    /* leak again emptying the bucket */
+    throttle_leak_bucket(&bkt, NANOSECONDS_PER_SECOND / 150);
+    g_assert(bkt.ups == 150);
+    g_assert(bkt.max == 15);
+    g_assert(double_cmp(bkt.bucket, 0));
+
+    /* check that the bucket level won't go lower */
+    throttle_leak_bucket(&bkt, NANOSECONDS_PER_SECOND / 150);
+    g_assert(bkt.ups == 150);
+    g_assert(bkt.max == 15);
+    g_assert(double_cmp(bkt.bucket, 0));
+}
+
+static void test_compute_wait(void)
+{
+    int64_t wait;
+    int64_t result;
+
+    /* no operation limit set */
+    bkt.ups = 0;
+    bkt.max = 15;
+    bkt.bucket = 1.5;
+    wait = throttle_compute_wait(&bkt);
+    g_assert(!wait);
+
+    /* zero delta */
+    bkt.ups = 150;
+    bkt.max = 15;
+    bkt.bucket = 15;
+    wait = throttle_compute_wait(&bkt);
+    g_assert(!wait);
+
+    /* below zero delta */
+    bkt.ups = 150;
+    bkt.max = 15;
+    bkt.bucket = 9;
+    wait = throttle_compute_wait(&bkt);
+    g_assert(!wait);
+
+    /* half an operation above max */
+    bkt.ups = 150;
+    bkt.max = 15;
+    bkt.bucket = 15.5;
+    wait = throttle_compute_wait(&bkt);
+    /* time required to do half an operation */
+    result = (int64_t)  NANOSECONDS_PER_SECOND / 150 / 2;
+    g_assert(wait == result);
+}
+
+/* functions to test ThrottleState initialization/destroy methods */
+static void read_timer_cb(void *opaque)
+{
+}
+
+static void write_timer_cb(void *opaque)
+{
+}
+
+static void test_init(void)
+{
+    int i;
+
+    /* fill the structure with crap */
+    memset(&ts, 1, sizeof(ts));
+
+    /* init the structure */
+    throttle_init(&ts, vm_clock, read_timer_cb, write_timer_cb, &ts);
+
+    /* check initialized fields */
+    g_assert(ts.clock == vm_clock);
+    g_assert(ts.timers[0]);
+    g_assert(ts.timers[1]);
+
+    /* check other fields where cleared */
+    g_assert(!ts.previous_leak);
+    g_assert(!ts.cfg.op_size);
+    g_assert(!ts.cfg.unit_size);
+    for (i = 0; i < BUCKETS_COUNT; i++) {
+        g_assert(!ts.cfg.buckets[i].ups);
+        g_assert(!ts.cfg.buckets[i].max);
+        g_assert(!ts.cfg.buckets[i].bucket);
+    }
+
+    throttle_destroy(&ts);
+}
+
+static void test_destroy(void)
+{
+    int i;
+    throttle_init(&ts, vm_clock, read_timer_cb, write_timer_cb, &ts);
+    throttle_destroy(&ts);
+    for (i = 0; i < 2; i++) {
+        g_assert(!ts.timers[i]);
+    }
+}
+
+/* function to test throttle_config and throttle_get_config */
+static void test_config_functions(void)
+{
+    int i;
+    ThrottleConfig orig_cfg, final_cfg;
+
+    orig_cfg.buckets[THROTTLE_BPS_TOTAL].ups = 153;
+    orig_cfg.buckets[THROTTLE_BPS_READ].ups  = 56;
+    orig_cfg.buckets[THROTTLE_BPS_WRITE].ups = 1;
+
+    orig_cfg.buckets[THROTTLE_OPS_TOTAL].ups = 150;
+    orig_cfg.buckets[THROTTLE_OPS_READ].ups  = 69;
+    orig_cfg.buckets[THROTTLE_OPS_WRITE].ups = 23;
+
+    orig_cfg.buckets[THROTTLE_BPS_TOTAL].max = 0; /* should be corrected */
+    orig_cfg.buckets[THROTTLE_BPS_READ].max  = 1; /* should be corrected */
+    orig_cfg.buckets[THROTTLE_BPS_WRITE].max = 120;
+
+    orig_cfg.buckets[THROTTLE_OPS_TOTAL].max = 150;
+    orig_cfg.buckets[THROTTLE_OPS_READ].max  = 400;
+    orig_cfg.buckets[THROTTLE_OPS_WRITE].max = 500;
+
+    orig_cfg.buckets[THROTTLE_BPS_TOTAL].bucket = 45;
+    orig_cfg.buckets[THROTTLE_BPS_READ].bucket  = 65;
+    orig_cfg.buckets[THROTTLE_BPS_WRITE].bucket = 23;
+
+    orig_cfg.buckets[THROTTLE_OPS_TOTAL].bucket = 1;
+    orig_cfg.buckets[THROTTLE_OPS_READ].bucket  = 90;
+    orig_cfg.buckets[THROTTLE_OPS_WRITE].bucket = 75;
+
+    orig_cfg.unit_size = 333;
+    orig_cfg.op_size = 1;
+
+    throttle_init(&ts, vm_clock, read_timer_cb, write_timer_cb, &ts);
+    /* structure reset by throttle_init previous_leak should be null */
+    g_assert(!ts.previous_leak);
+    throttle_config(&ts, &orig_cfg);
+
+    /* has previous leak been initialized by throttle_config ? */
+    g_assert(ts.previous_leak);
+
+    /* get back the fixed configuration */
+    throttle_get_config(&ts, &final_cfg);
+    throttle_destroy(&ts);
+
+    g_assert(final_cfg.buckets[THROTTLE_BPS_TOTAL].ups == 153);
+    g_assert(final_cfg.buckets[THROTTLE_BPS_READ].ups  == 56);
+    g_assert(final_cfg.buckets[THROTTLE_BPS_WRITE].ups == 1);
+
+    g_assert(final_cfg.buckets[THROTTLE_OPS_TOTAL].ups == 150);
+    g_assert(final_cfg.buckets[THROTTLE_OPS_READ].ups  == 69);
+    g_assert(final_cfg.buckets[THROTTLE_OPS_WRITE].ups == 23);
+
+    g_assert(final_cfg.buckets[THROTTLE_BPS_TOTAL].max == 15.3); /* fixed */
+    g_assert(final_cfg.buckets[THROTTLE_BPS_READ].max  == 5.6);  /* fixed */
+    g_assert(final_cfg.buckets[THROTTLE_BPS_WRITE].max == 120);
+
+    g_assert(final_cfg.buckets[THROTTLE_OPS_TOTAL].max == 150);
+    g_assert(final_cfg.buckets[THROTTLE_OPS_READ].max  == 400);
+    g_assert(final_cfg.buckets[THROTTLE_OPS_WRITE].max == 500);
+
+    g_assert(final_cfg.unit_size == 333);
+    g_assert(final_cfg.op_size == 1);
+
+    /* check bucket have been cleared */
+    for (i = 0; i < BUCKETS_COUNT; i++) {
+        g_assert(!final_cfg.buckets[i].bucket);
+    }
+
+
+}
+
+/* functions to test is throttle is enabled by a config */
+static void set_cfg_value(bool is_max, int index, int value)
+{
+    if (is_max) {
+        cfg.buckets[index].max = value;
+    } else {
+        cfg.buckets[index].ups = value;
+    }
+}
+
+static void test_enabled(void)
+{
+    int i;
+
+    memset(&cfg, 0, sizeof(cfg));
+    g_assert(!throttle_enabled(&cfg));
+
+    for (i = 0; i < BUCKETS_COUNT; i++) {
+        memset(&cfg, 0, sizeof(cfg));
+        set_cfg_value(false, i, 150);
+        g_assert(throttle_enabled(&cfg));
+    }
+
+    for (i = 0; i < BUCKETS_COUNT; i++) {
+        memset(&cfg, 0, sizeof(cfg));
+        set_cfg_value(false, i, -150);
+        g_assert(!throttle_enabled(&cfg));
+    }
+}
+
+/* tests functions for throttle_conflicting */
+
+static void test_conflicts_for_one_set(bool is_max,
+                                       int total,
+                                       int read,
+                                       int write)
+{
+    memset(&cfg, 0, sizeof(cfg));
+    g_assert(!throttle_conflicting(&cfg));
+
+    set_cfg_value(is_max, total, 1);
+    set_cfg_value(is_max, read,  1);
+    g_assert(throttle_conflicting(&cfg));
+
+    memset(&cfg, 0, sizeof(cfg));
+    set_cfg_value(is_max, total, 1);
+    set_cfg_value(is_max, write, 1);
+    g_assert(throttle_conflicting(&cfg));
+
+    memset(&cfg, 0, sizeof(cfg));
+    set_cfg_value(is_max, total, 1);
+    set_cfg_value(is_max, read,  1);
+    set_cfg_value(is_max, write, 1);
+    g_assert(throttle_conflicting(&cfg));
+
+    memset(&cfg, 0, sizeof(cfg));
+    set_cfg_value(is_max, total, 1);
+    g_assert(!throttle_conflicting(&cfg));
+
+    memset(&cfg, 0, sizeof(cfg));
+    set_cfg_value(is_max, read,  1);
+    set_cfg_value(is_max, write, 1);
+    g_assert(!throttle_conflicting(&cfg));
+}
+
+static void test_conflicting_config(void)
+{
+    /* bps average conflicts */
+    test_conflicts_for_one_set(false,
+                               THROTTLE_BPS_TOTAL,
+                               THROTTLE_BPS_READ,
+                               THROTTLE_BPS_WRITE);
+
+    /* ops average conflicts */
+    test_conflicts_for_one_set(false,
+                               THROTTLE_OPS_TOTAL,
+                               THROTTLE_OPS_READ,
+                               THROTTLE_OPS_WRITE);
+
+    /* bps average conflicts */
+    test_conflicts_for_one_set(true,
+                               THROTTLE_BPS_TOTAL,
+                               THROTTLE_BPS_READ,
+                               THROTTLE_BPS_WRITE);
+    /* ops average conflicts */
+    test_conflicts_for_one_set(true,
+                               THROTTLE_OPS_TOTAL,
+                               THROTTLE_OPS_READ,
+                               THROTTLE_OPS_WRITE);
+}
+/* functions to test the throttle_is_valid function */
+static void test_is_valid_for_value(int value, bool should_be_valid)
+{
+    int is_max, index;
+    for (is_max = 0; is_max < 2; is_max++) {
+        for (index = 0; index < BUCKETS_COUNT; index++) {
+            memset(&cfg, 0, sizeof(cfg));
+            set_cfg_value(is_max, index, value);
+            g_assert(throttle_is_valid(&cfg) == should_be_valid);
+        }
+    }
+}
+
+static void test_is_valid(void)
+{
+    /* negative number are invalid */
+    test_is_valid_for_value(-1, false);
+    /* zero are valids */
+    test_is_valid_for_value(0, true);
+    /* positives numers are valids */
+    test_is_valid_for_value(1, true);
+}
+
+static void test_have_timer(void)
+{
+    /* zero the structure */
+    memset(&ts, 0, sizeof(ts));
+
+    /* no timer set shoudl return false */
+    g_assert(!throttle_have_timer(&ts));
+
+    /* init the structure */
+    throttle_init(&ts, vm_clock, read_timer_cb, write_timer_cb, &ts);
+
+    /* timer set by init should return true */
+    g_assert(throttle_have_timer(&ts));
+
+    throttle_destroy(&ts);
+}
+
+static bool do_test_accounting(bool is_ops, /* are we testing bps or ops */
+                int size,                   /* size of the operation to do */
+                double ups,                 /* io limit */
+                uint64_t unit_size,         /* the byte size of the unit */
+                uint64_t op_size,           /* ideal size of an io */
+                double total_result,
+                double read_result,
+                double write_result)
+{
+    BucketType to_test[2][3] = { { THROTTLE_BPS_TOTAL,
+                                   THROTTLE_BPS_READ,
+                                   THROTTLE_BPS_WRITE, },
+                                 { THROTTLE_OPS_TOTAL,
+                                   THROTTLE_OPS_READ,
+                                   THROTTLE_OPS_WRITE, } };
+    ThrottleConfig cfg;
+    BucketType index;
+    int i;
+
+    for (i = 0; i < 3; i++) {
+        BucketType index = to_test[is_ops][i];
+        cfg.buckets[index].ups = ups;
+    }
+
+    cfg.unit_size = unit_size;
+    cfg.op_size = op_size;
+
+    throttle_init(&ts, vm_clock, read_timer_cb, write_timer_cb, &ts);
+    throttle_config(&ts, &cfg);
+
+    /* account a read */
+    throttle_account(&ts, false, size);
+    /* account a write */
+    throttle_account(&ts, true, size);
+
+    /* check total result */
+    index = to_test[is_ops][0];
+    if (!double_cmp(ts.cfg.buckets[index].bucket, total_result)) {
+        return false;
+    }
+
+    /* check read result */
+    index = to_test[is_ops][1];
+    if (!double_cmp(ts.cfg.buckets[index].bucket, read_result)) {
+        return false;
+    }
+
+    /* check write result */
+    index = to_test[is_ops][2];
+    if (!double_cmp(ts.cfg.buckets[index].bucket, write_result)) {
+        return false;
+    }
+
+    throttle_destroy(&ts);
+
+    return true;
+}
+
+static void test_accounting(void)
+{
+    /* tests for bps */
+
+    /* op of size 1 */
+    g_assert(do_test_accounting(false,
+                                1,
+                                150,
+                                512,
+                                0,
+                                1024,
+                                512,
+                                512));
+
+    /* op of size 2 */
+    g_assert(do_test_accounting(false,
+                                2,
+                                150,
+                                512,
+                                0,
+                                2048,
+                                1024,
+                                1024));
+
+    /* op of size 2 and orthogonal parameter change */
+    g_assert(do_test_accounting(false,
+                                2,
+                                150,
+                                512,
+                                17,
+                                2048,
+                                1024,
+                                1024));
+
+
+    /* tests for ops */
+
+    /* op of size 1 */
+    g_assert(do_test_accounting(true,
+                                1,
+                                150,
+                                512,
+                                0,
+                                2,
+                                1,
+                                1));
+
+    /* op of size 2 */
+    g_assert(do_test_accounting(true,
+                                2,
+                                150,
+                                512,
+                                0,
+                                2,
+                                1,
+                                1));
+
+    /* jumbo op accounting fragmentation : size 64 with op size of 13 units */
+    g_assert(do_test_accounting(true,
+                                64,
+                                150,
+                                512,
+                                13,
+                                (64.0 * 2) / 13,
+                                (64.0 / 13),
+                                (64.0 / 13)));
+
+    /* same with orthogonal parameters changes */
+    g_assert(do_test_accounting(true,
+                                64,
+                                300,
+                                3433,
+                                13,
+                                (64.0 * 2) / 13,
+                                (64.0 / 13),
+                                (64.0 / 13)));
+}
+
+int main(int argc, char **argv)
+{
+    init_clocks();
+    do {} while (g_main_context_iteration(NULL, false));
+
+    /* tests in the same order as the header function declarations */
+    g_test_init(&argc, &argv, NULL);
+    g_test_add_func("/throttle/leak_bucket",        test_leak_bucket);
+    g_test_add_func("/throttle/compute_wait",       test_compute_wait);
+    g_test_add_func("/throttle/init",               test_init);
+    g_test_add_func("/throttle/destroy",            test_destroy);
+    g_test_add_func("/throttle/have_timer",         test_have_timer);
+    g_test_add_func("/throttle/config/enabled",     test_enabled);
+    g_test_add_func("/throttle/config/conflicting", test_conflicting_config);
+    g_test_add_func("/throttle/config/is_valid",    test_is_valid);
+    g_test_add_func("/throttle/config_functions",   test_config_functions);
+    g_test_add_func("/throttle/accounting",         test_accounting);
+    return g_test_run();
+}
+
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH V5 3/5] block: Enable the new throttling code in the block layer.
  2013-08-12 16:53 [Qemu-devel] [PATCH V5 0/5] Continuous Leaky Bucket Throttling Benoît Canet
  2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 1/5] throttle: Add a new throttling API implementing continuous leaky bucket Benoît Canet
  2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 2/5] throttle: Add units tests Benoît Canet
@ 2013-08-12 16:53 ` Benoît Canet
  2013-08-14  9:31   ` Fam Zheng
  2013-08-16 12:02   ` Stefan Hajnoczi
  2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 4/5] block: Add support for throttling burst max in QMP and the command line Benoît Canet
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Benoît Canet @ 2013-08-12 16:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block.c                   |  349 ++++++++++++++-------------------------------
 block/qapi.c              |   21 ++-
 blockdev.c                |  100 +++++++------
 include/block/block.h     |    1 -
 include/block/block_int.h |   32 +----
 5 files changed, 173 insertions(+), 330 deletions(-)

diff --git a/block.c b/block.c
index 01b66d8..d49bc98 100644
--- a/block.c
+++ b/block.c
@@ -86,13 +86,6 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque);
 static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
     int64_t sector_num, int nb_sectors);
 
-static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
-        bool is_write, double elapsed_time, uint64_t *wait);
-static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
-        double elapsed_time, uint64_t *wait);
-static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
-        bool is_write, int64_t *wait);
-
 static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
     QTAILQ_HEAD_INITIALIZER(bdrv_states);
 
@@ -123,70 +116,110 @@ int is_windows_drive(const char *filename)
 #endif
 
 /* throttling disk I/O limits */
-void bdrv_io_limits_disable(BlockDriverState *bs)
+void bdrv_set_io_limits(BlockDriverState *bs,
+                        ThrottleConfig *cfg)
 {
-    bs->io_limits_enabled = false;
+    int i;
+
+    throttle_config(&bs->throttle_state, cfg);
+
+    for (i = 0; i < 2; i++) {
+        qemu_co_enter_next(&bs->throttled_reqs[i]);
+    }
+}
+
+/* this function drain all the throttled IOs */
+static bool bdrv_drain_throttled(BlockDriverState *bs)
+{
+    bool drained = false;
+    bool enabled = bs->io_limits_enabled;
+    int i;
 
-    do {} while (qemu_co_enter_next(&bs->throttled_reqs));
+    bs->io_limits_enabled = false;
 
-    if (bs->block_timer) {
-        qemu_del_timer(bs->block_timer);
-        qemu_free_timer(bs->block_timer);
-        bs->block_timer = NULL;
+    for (i = 0; i < 2; i++) {
+        while (qemu_co_enter_next(&bs->throttled_reqs[i])) {
+            drained = true;
+        }
     }
 
-    bs->slice_start = 0;
-    bs->slice_end   = 0;
+    bs->io_limits_enabled = enabled;
+
+    return drained;
 }
 
-static void bdrv_block_timer(void *opaque)
+void bdrv_io_limits_disable(BlockDriverState *bs)
+{
+    bs->io_limits_enabled = false;
+
+    bdrv_drain_throttled(bs);
+
+    throttle_destroy(&bs->throttle_state);
+}
+
+static void bdrv_throttle_read_timer_cb(void *opaque)
 {
     BlockDriverState *bs = opaque;
+    qemu_co_enter_next(&bs->throttled_reqs[0]);
+}
 
-    qemu_co_enter_next(&bs->throttled_reqs);
+static void bdrv_throttle_write_timer_cb(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+    qemu_co_enter_next(&bs->throttled_reqs[1]);
 }
 
+/* should be called before bdrv_set_io_limits if a limit is set */
 void bdrv_io_limits_enable(BlockDriverState *bs)
 {
-    qemu_co_queue_init(&bs->throttled_reqs);
-    bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
+    throttle_init(&bs->throttle_state,
+                  vm_clock,
+                  bdrv_throttle_read_timer_cb,
+                  bdrv_throttle_write_timer_cb,
+                  bs);
+    qemu_co_queue_init(&bs->throttled_reqs[0]);
+    qemu_co_queue_init(&bs->throttled_reqs[1]);
     bs->io_limits_enabled = true;
 }
 
-bool bdrv_io_limits_enabled(BlockDriverState *bs)
+/* This function make an IO wait if needed
+ *
+ * @is_write:   is the IO a write
+ * @nb_sectors: the number of sectors of the IO
+ */
+static void bdrv_io_limits_intercept(BlockDriverState *bs,
+                                     bool is_write,
+                                     int nb_sectors)
 {
-    BlockIOLimit *io_limits = &bs->io_limits;
-    return io_limits->bps[BLOCK_IO_LIMIT_READ]
-         || io_limits->bps[BLOCK_IO_LIMIT_WRITE]
-         || io_limits->bps[BLOCK_IO_LIMIT_TOTAL]
-         || io_limits->iops[BLOCK_IO_LIMIT_READ]
-         || io_limits->iops[BLOCK_IO_LIMIT_WRITE]
-         || io_limits->iops[BLOCK_IO_LIMIT_TOTAL];
+    /* does this io must wait */
+    bool must_wait = !throttle_allowed(&bs->throttle_state, is_write);
+
+    /* if must wait or any request of this type throttled queue the IO */
+    if (must_wait ||
+        !qemu_co_queue_empty(&bs->throttled_reqs[is_write])) {
+        qemu_co_queue_wait(&bs->throttled_reqs[is_write]);
+    }
+
+    /* the IO will be executed do the accounting */
+    throttle_account(&bs->throttle_state, is_write, nb_sectors);
 }
 
-static void bdrv_io_limits_intercept(BlockDriverState *bs,
-                                     bool is_write, int nb_sectors)
+/* This function will schedule the next IO throttled IO of this type if needed
+ *
+ * @is_write: is the IO a write
+ */
+static void bdrv_io_limits_resched(BlockDriverState *bs, bool is_write)
 {
-    int64_t wait_time = -1;
 
-    if (!qemu_co_queue_empty(&bs->throttled_reqs)) {
-        qemu_co_queue_wait(&bs->throttled_reqs);
+    if (qemu_co_queue_empty(&bs->throttled_reqs[is_write])) {
+        return;
     }
 
-    /* In fact, we hope to keep each request's timing, in FIFO mode. The next
-     * throttled requests will not be dequeued until the current request is
-     * allowed to be serviced. So if the current request still exceeds the
-     * limits, it will be inserted to the head. All requests followed it will
-     * be still in throttled_reqs queue.
-     */
-
-    while (bdrv_exceed_io_limits(bs, nb_sectors, is_write, &wait_time)) {
-        qemu_mod_timer(bs->block_timer,
-                       wait_time + qemu_get_clock_ns(vm_clock));
-        qemu_co_queue_wait_insert_head(&bs->throttled_reqs);
+    if (!throttle_allowed(&bs->throttle_state, is_write)) {
+        return;
     }
 
-    qemu_co_queue_next(&bs->throttled_reqs);
+    qemu_co_queue_next(&bs->throttled_reqs[is_write]);
 }
 
 /* check if the path starts with "<protocol>:" */
@@ -1112,11 +1145,6 @@ int bdrv_open(BlockDriverState *bs, const char *filename, QDict *options,
         bdrv_dev_change_media_cb(bs, true);
     }
 
-    /* throttling disk I/O limits */
-    if (bs->io_limits_enabled) {
-        bdrv_io_limits_enable(bs);
-    }
-
     return 0;
 
 unlink_and_fail:
@@ -1452,7 +1480,7 @@ void bdrv_drain_all(void)
          * a busy wait.
          */
         QTAILQ_FOREACH(bs, &bdrv_states, list) {
-            while (qemu_co_enter_next(&bs->throttled_reqs)) {
+            if (bdrv_drain_throttled(bs)) {
                 busy = true;
             }
         }
@@ -1461,7 +1489,8 @@ void bdrv_drain_all(void)
     /* If requests are still pending there is a bug somewhere */
     QTAILQ_FOREACH(bs, &bdrv_states, list) {
         assert(QLIST_EMPTY(&bs->tracked_requests));
-        assert(qemu_co_queue_empty(&bs->throttled_reqs));
+        assert(qemu_co_queue_empty(&bs->throttled_reqs[0]));
+        assert(qemu_co_queue_empty(&bs->throttled_reqs[1]));
     }
 }
 
@@ -1497,13 +1526,12 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
 
     bs_dest->enable_write_cache = bs_src->enable_write_cache;
 
-    /* i/o timing parameters */
-    bs_dest->slice_start        = bs_src->slice_start;
-    bs_dest->slice_end          = bs_src->slice_end;
-    bs_dest->slice_submitted    = bs_src->slice_submitted;
-    bs_dest->io_limits          = bs_src->io_limits;
-    bs_dest->throttled_reqs     = bs_src->throttled_reqs;
-    bs_dest->block_timer        = bs_src->block_timer;
+    /* i/o throttled req */
+    memcpy(&bs_dest->throttle_state,
+           &bs_src->throttle_state,
+           sizeof(ThrottleState));
+    bs_dest->throttled_reqs[0]  = bs_src->throttled_reqs[0];
+    bs_dest->throttled_reqs[1]  = bs_src->throttled_reqs[1];
     bs_dest->io_limits_enabled  = bs_src->io_limits_enabled;
 
     /* r/w error */
@@ -1550,7 +1578,7 @@ void bdrv_swap(BlockDriverState *bs_new, BlockDriverState *bs_old)
     assert(bs_new->dev == NULL);
     assert(bs_new->in_use == 0);
     assert(bs_new->io_limits_enabled == false);
-    assert(bs_new->block_timer == NULL);
+    assert(!throttle_have_timer(&bs_new->throttle_state));
 
     tmp = *bs_new;
     *bs_new = *bs_old;
@@ -1569,7 +1597,7 @@ void bdrv_swap(BlockDriverState *bs_new, BlockDriverState *bs_old)
     assert(bs_new->job == NULL);
     assert(bs_new->in_use == 0);
     assert(bs_new->io_limits_enabled == false);
-    assert(bs_new->block_timer == NULL);
+    assert(!throttle_have_timer(&bs_new->throttle_state));
 
     bdrv_rebind(bs_new);
     bdrv_rebind(bs_old);
@@ -1860,8 +1888,15 @@ int bdrv_commit_all(void)
  *
  * This function should be called when a tracked request is completing.
  */
-static void tracked_request_end(BdrvTrackedRequest *req)
+static void tracked_request_end(BlockDriverState *bs,
+                                BdrvTrackedRequest *req,
+                                bool is_write)
 {
+    /* throttling disk I/O */
+    if (bs->io_limits_enabled) {
+        bdrv_io_limits_resched(bs, is_write);
+    }
+
     QLIST_REMOVE(req, list);
     qemu_co_queue_restart_all(&req->wait_queue);
 }
@@ -1874,6 +1909,11 @@ static void tracked_request_begin(BdrvTrackedRequest *req,
                                   int64_t sector_num,
                                   int nb_sectors, bool is_write)
 {
+    /* throttling disk I/O */
+    if (bs->io_limits_enabled) {
+        bdrv_io_limits_intercept(bs, is_write, nb_sectors);
+    }
+
     *req = (BdrvTrackedRequest){
         .bs = bs,
         .sector_num = sector_num,
@@ -2512,11 +2552,6 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
         return -EIO;
     }
 
-    /* throttling disk read I/O */
-    if (bs->io_limits_enabled) {
-        bdrv_io_limits_intercept(bs, false, nb_sectors);
-    }
-
     if (bs->copy_on_read) {
         flags |= BDRV_REQ_COPY_ON_READ;
     }
@@ -2547,7 +2582,7 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
     ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
 
 out:
-    tracked_request_end(&req);
+    tracked_request_end(bs, &req, false);
 
     if (flags & BDRV_REQ_COPY_ON_READ) {
         bs->copy_on_read_in_flight--;
@@ -2625,11 +2660,6 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
         return -EIO;
     }
 
-    /* throttling disk write I/O */
-    if (bs->io_limits_enabled) {
-        bdrv_io_limits_intercept(bs, true, nb_sectors);
-    }
-
     if (bs->copy_on_read_in_flight) {
         wait_for_overlapping_requests(bs, sector_num, nb_sectors);
     }
@@ -2658,7 +2688,7 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
         bs->wr_highest_sector = sector_num + nb_sectors - 1;
     }
 
-    tracked_request_end(&req);
+    tracked_request_end(bs, &req, true);
 
     return ret;
 }
@@ -2751,14 +2781,6 @@ void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr)
     *nb_sectors_ptr = length;
 }
 
-/* throttling disk io limits */
-void bdrv_set_io_limits(BlockDriverState *bs,
-                        BlockIOLimit *io_limits)
-{
-    bs->io_limits = *io_limits;
-    bs->io_limits_enabled = bdrv_io_limits_enabled(bs);
-}
-
 void bdrv_set_on_error(BlockDriverState *bs, BlockdevOnError on_read_error,
                        BlockdevOnError on_write_error)
 {
@@ -3568,169 +3590,6 @@ void bdrv_aio_cancel(BlockDriverAIOCB *acb)
     acb->aiocb_info->cancel(acb);
 }
 
-/* block I/O throttling */
-static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
-                 bool is_write, double elapsed_time, uint64_t *wait)
-{
-    uint64_t bps_limit = 0;
-    uint64_t extension;
-    double   bytes_limit, bytes_base, bytes_res;
-    double   slice_time, wait_time;
-
-    if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
-        bps_limit = bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL];
-    } else if (bs->io_limits.bps[is_write]) {
-        bps_limit = bs->io_limits.bps[is_write];
-    } else {
-        if (wait) {
-            *wait = 0;
-        }
-
-        return false;
-    }
-
-    slice_time = bs->slice_end - bs->slice_start;
-    slice_time /= (NANOSECONDS_PER_SECOND);
-    bytes_limit = bps_limit * slice_time;
-    bytes_base  = bs->slice_submitted.bytes[is_write];
-    if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
-        bytes_base += bs->slice_submitted.bytes[!is_write];
-    }
-
-    /* bytes_base: the bytes of data which have been read/written; and
-     *             it is obtained from the history statistic info.
-     * bytes_res: the remaining bytes of data which need to be read/written.
-     * (bytes_base + bytes_res) / bps_limit: used to calcuate
-     *             the total time for completing reading/writting all data.
-     */
-    bytes_res   = (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
-
-    if (bytes_base + bytes_res <= bytes_limit) {
-        if (wait) {
-            *wait = 0;
-        }
-
-        return false;
-    }
-
-    /* Calc approx time to dispatch */
-    wait_time = (bytes_base + bytes_res) / bps_limit - elapsed_time;
-
-    /* When the I/O rate at runtime exceeds the limits,
-     * bs->slice_end need to be extended in order that the current statistic
-     * info can be kept until the timer fire, so it is increased and tuned
-     * based on the result of experiment.
-     */
-    extension = wait_time * NANOSECONDS_PER_SECOND;
-    extension = DIV_ROUND_UP(extension, BLOCK_IO_SLICE_TIME) *
-                BLOCK_IO_SLICE_TIME;
-    bs->slice_end += extension;
-    if (wait) {
-        *wait = wait_time * NANOSECONDS_PER_SECOND;
-    }
-
-    return true;
-}
-
-static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
-                             double elapsed_time, uint64_t *wait)
-{
-    uint64_t iops_limit = 0;
-    double   ios_limit, ios_base;
-    double   slice_time, wait_time;
-
-    if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
-        iops_limit = bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL];
-    } else if (bs->io_limits.iops[is_write]) {
-        iops_limit = bs->io_limits.iops[is_write];
-    } else {
-        if (wait) {
-            *wait = 0;
-        }
-
-        return false;
-    }
-
-    slice_time = bs->slice_end - bs->slice_start;
-    slice_time /= (NANOSECONDS_PER_SECOND);
-    ios_limit  = iops_limit * slice_time;
-    ios_base   = bs->slice_submitted.ios[is_write];
-    if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
-        ios_base += bs->slice_submitted.ios[!is_write];
-    }
-
-    if (ios_base + 1 <= ios_limit) {
-        if (wait) {
-            *wait = 0;
-        }
-
-        return false;
-    }
-
-    /* Calc approx time to dispatch, in seconds */
-    wait_time = (ios_base + 1) / iops_limit;
-    if (wait_time > elapsed_time) {
-        wait_time = wait_time - elapsed_time;
-    } else {
-        wait_time = 0;
-    }
-
-    /* Exceeded current slice, extend it by another slice time */
-    bs->slice_end += BLOCK_IO_SLICE_TIME;
-    if (wait) {
-        *wait = wait_time * NANOSECONDS_PER_SECOND;
-    }
-
-    return true;
-}
-
-static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
-                           bool is_write, int64_t *wait)
-{
-    int64_t  now, max_wait;
-    uint64_t bps_wait = 0, iops_wait = 0;
-    double   elapsed_time;
-    int      bps_ret, iops_ret;
-
-    now = qemu_get_clock_ns(vm_clock);
-    if (now > bs->slice_end) {
-        bs->slice_start = now;
-        bs->slice_end   = now + BLOCK_IO_SLICE_TIME;
-        memset(&bs->slice_submitted, 0, sizeof(bs->slice_submitted));
-    }
-
-    elapsed_time  = now - bs->slice_start;
-    elapsed_time  /= (NANOSECONDS_PER_SECOND);
-
-    bps_ret  = bdrv_exceed_bps_limits(bs, nb_sectors,
-                                      is_write, elapsed_time, &bps_wait);
-    iops_ret = bdrv_exceed_iops_limits(bs, is_write,
-                                      elapsed_time, &iops_wait);
-    if (bps_ret || iops_ret) {
-        max_wait = bps_wait > iops_wait ? bps_wait : iops_wait;
-        if (wait) {
-            *wait = max_wait;
-        }
-
-        now = qemu_get_clock_ns(vm_clock);
-        if (bs->slice_end < now + max_wait) {
-            bs->slice_end = now + max_wait;
-        }
-
-        return true;
-    }
-
-    if (wait) {
-        *wait = 0;
-    }
-
-    bs->slice_submitted.bytes[is_write] += (int64_t)nb_sectors *
-                                           BDRV_SECTOR_SIZE;
-    bs->slice_submitted.ios[is_write]++;
-
-    return false;
-}
-
 /**************************************************************/
 /* async block device emulation */
 
diff --git a/block/qapi.c b/block/qapi.c
index a4bc411..45f806b 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -223,18 +223,15 @@ void bdrv_query_info(BlockDriverState *bs,
         info->inserted->backing_file_depth = bdrv_get_backing_file_depth(bs);
 
         if (bs->io_limits_enabled) {
-            info->inserted->bps =
-                           bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL];
-            info->inserted->bps_rd =
-                           bs->io_limits.bps[BLOCK_IO_LIMIT_READ];
-            info->inserted->bps_wr =
-                           bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE];
-            info->inserted->iops =
-                           bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL];
-            info->inserted->iops_rd =
-                           bs->io_limits.iops[BLOCK_IO_LIMIT_READ];
-            info->inserted->iops_wr =
-                           bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE];
+            ThrottleConfig cfg;
+            throttle_get_config(&bs->throttle_state, &cfg);
+            info->inserted->bps     = cfg.buckets[THROTTLE_BPS_TOTAL].ups;
+            info->inserted->bps_rd  = cfg.buckets[THROTTLE_BPS_READ].ups;
+            info->inserted->bps_wr  = cfg.buckets[THROTTLE_BPS_WRITE].ups;
+
+            info->inserted->iops    = cfg.buckets[THROTTLE_OPS_TOTAL].ups;
+            info->inserted->iops_rd = cfg.buckets[THROTTLE_OPS_READ].ups;
+            info->inserted->iops_wr = cfg.buckets[THROTTLE_OPS_WRITE].ups;
         }
 
         bs0 = bs;
diff --git a/blockdev.c b/blockdev.c
index 41b0a49..7559ea5 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -281,32 +281,16 @@ static int parse_block_error_action(const char *buf, bool is_read)
     }
 }
 
-static bool do_check_io_limits(BlockIOLimit *io_limits, Error **errp)
+static bool check_throttle_config(ThrottleConfig *cfg, Error **errp)
 {
-    bool bps_flag;
-    bool iops_flag;
-
-    assert(io_limits);
-
-    bps_flag  = (io_limits->bps[BLOCK_IO_LIMIT_TOTAL] != 0)
-                 && ((io_limits->bps[BLOCK_IO_LIMIT_READ] != 0)
-                 || (io_limits->bps[BLOCK_IO_LIMIT_WRITE] != 0));
-    iops_flag = (io_limits->iops[BLOCK_IO_LIMIT_TOTAL] != 0)
-                 && ((io_limits->iops[BLOCK_IO_LIMIT_READ] != 0)
-                 || (io_limits->iops[BLOCK_IO_LIMIT_WRITE] != 0));
-    if (bps_flag || iops_flag) {
-        error_setg(errp, "bps(iops) and bps_rd/bps_wr(iops_rd/iops_wr) "
+    if (throttle_conflicting(cfg)) {
+        error_setg(errp, "bps/iops/max total values and read/write values"
                          "cannot be used at the same time");
         return false;
     }
 
-    if (io_limits->bps[BLOCK_IO_LIMIT_TOTAL] < 0 ||
-        io_limits->bps[BLOCK_IO_LIMIT_WRITE] < 0 ||
-        io_limits->bps[BLOCK_IO_LIMIT_READ] < 0 ||
-        io_limits->iops[BLOCK_IO_LIMIT_TOTAL] < 0 ||
-        io_limits->iops[BLOCK_IO_LIMIT_WRITE] < 0 ||
-        io_limits->iops[BLOCK_IO_LIMIT_READ] < 0) {
-        error_setg(errp, "bps and iops values must be 0 or greater");
+    if (!throttle_is_valid(cfg)) {
+        error_setg(errp, "bps/iops/maxs values must be 0 or greater");
         return false;
     }
 
@@ -331,7 +315,7 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts,
     int on_read_error, on_write_error;
     const char *devaddr;
     DriveInfo *dinfo;
-    BlockIOLimit io_limits;
+    ThrottleConfig cfg;
     int snapshot = 0;
     bool copy_on_read;
     int ret;
@@ -489,20 +473,31 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts,
     }
 
     /* disk I/O throttling */
-    io_limits.bps[BLOCK_IO_LIMIT_TOTAL]  =
+    cfg.buckets[THROTTLE_BPS_TOTAL].ups =
         qemu_opt_get_number(opts, "throttling.bps-total", 0);
-    io_limits.bps[BLOCK_IO_LIMIT_READ]   =
+    cfg.buckets[THROTTLE_BPS_READ].ups  =
         qemu_opt_get_number(opts, "throttling.bps-read", 0);
-    io_limits.bps[BLOCK_IO_LIMIT_WRITE]  =
+    cfg.buckets[THROTTLE_BPS_WRITE].ups =
         qemu_opt_get_number(opts, "throttling.bps-write", 0);
-    io_limits.iops[BLOCK_IO_LIMIT_TOTAL] =
+    cfg.buckets[THROTTLE_OPS_TOTAL].ups =
         qemu_opt_get_number(opts, "throttling.iops-total", 0);
-    io_limits.iops[BLOCK_IO_LIMIT_READ]  =
+    cfg.buckets[THROTTLE_OPS_READ].ups =
         qemu_opt_get_number(opts, "throttling.iops-read", 0);
-    io_limits.iops[BLOCK_IO_LIMIT_WRITE] =
+    cfg.buckets[THROTTLE_OPS_WRITE].ups =
         qemu_opt_get_number(opts, "throttling.iops-write", 0);
 
-    if (!do_check_io_limits(&io_limits, &error)) {
+    cfg.buckets[THROTTLE_BPS_TOTAL].max = 0;
+    cfg.buckets[THROTTLE_BPS_READ].max  = 0;
+    cfg.buckets[THROTTLE_BPS_WRITE].max = 0;
+
+    cfg.buckets[THROTTLE_OPS_TOTAL].max = 0;
+    cfg.buckets[THROTTLE_OPS_READ].max  = 0;
+    cfg.buckets[THROTTLE_OPS_WRITE].max = 0;
+
+    cfg.unit_size = BDRV_SECTOR_SIZE;
+    cfg.op_size = 0;
+
+    if (!check_throttle_config(&cfg, &error)) {
         error_report("%s", error_get_pretty(error));
         error_free(error);
         return NULL;
@@ -629,7 +624,10 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts,
     bdrv_set_on_error(dinfo->bdrv, on_read_error, on_write_error);
 
     /* disk I/O throttling */
-    bdrv_set_io_limits(dinfo->bdrv, &io_limits);
+    if (throttle_enabled(&cfg)) {
+        bdrv_io_limits_enable(dinfo->bdrv);
+        bdrv_set_io_limits(dinfo->bdrv, &cfg);
+    }
 
     switch(type) {
     case IF_IDE:
@@ -1262,7 +1260,7 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
                                int64_t bps_wr, int64_t iops, int64_t iops_rd,
                                int64_t iops_wr, Error **errp)
 {
-    BlockIOLimit io_limits;
+    ThrottleConfig cfg;
     BlockDriverState *bs;
 
     bs = bdrv_find(device);
@@ -1271,27 +1269,37 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
         return;
     }
 
-    io_limits.bps[BLOCK_IO_LIMIT_TOTAL] = bps;
-    io_limits.bps[BLOCK_IO_LIMIT_READ]  = bps_rd;
-    io_limits.bps[BLOCK_IO_LIMIT_WRITE] = bps_wr;
-    io_limits.iops[BLOCK_IO_LIMIT_TOTAL]= iops;
-    io_limits.iops[BLOCK_IO_LIMIT_READ] = iops_rd;
-    io_limits.iops[BLOCK_IO_LIMIT_WRITE]= iops_wr;
+    cfg.buckets[THROTTLE_BPS_TOTAL].ups = bps;
+    cfg.buckets[THROTTLE_BPS_READ].ups  = bps_rd;
+    cfg.buckets[THROTTLE_BPS_WRITE].ups = bps_wr;
+
+    cfg.buckets[THROTTLE_OPS_TOTAL].ups = iops;
+    cfg.buckets[THROTTLE_OPS_READ].ups  = iops_rd;
+    cfg.buckets[THROTTLE_OPS_WRITE].ups = iops_wr;
+
+    cfg.buckets[THROTTLE_BPS_TOTAL].max = 0;
+    cfg.buckets[THROTTLE_BPS_READ].max  = 0;
+    cfg.buckets[THROTTLE_BPS_WRITE].max = 0;
+
+    cfg.buckets[THROTTLE_OPS_TOTAL].max = 0;
+    cfg.buckets[THROTTLE_OPS_READ].max  = 0;
+    cfg.buckets[THROTTLE_OPS_WRITE].max = 0;
 
-    if (!do_check_io_limits(&io_limits, errp)) {
+    cfg.unit_size = BDRV_SECTOR_SIZE;
+    cfg.op_size = 0;
+
+    if (!check_throttle_config(&cfg, errp)) {
         return;
     }
 
-    bs->io_limits = io_limits;
-
-    if (!bs->io_limits_enabled && bdrv_io_limits_enabled(bs)) {
+    if (!bs->io_limits_enabled && throttle_enabled(&cfg)) {
         bdrv_io_limits_enable(bs);
-    } else if (bs->io_limits_enabled && !bdrv_io_limits_enabled(bs)) {
+    } else if (bs->io_limits_enabled && !throttle_enabled(&cfg)) {
         bdrv_io_limits_disable(bs);
-    } else {
-        if (bs->block_timer) {
-            qemu_mod_timer(bs->block_timer, qemu_get_clock_ns(vm_clock));
-        }
+    }
+
+    if (bs->io_limits_enabled) {
+        bdrv_set_io_limits(bs, &cfg);
     }
 }
 
diff --git a/include/block/block.h b/include/block/block.h
index 742fce5..b16d579 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -107,7 +107,6 @@ void bdrv_info_stats(Monitor *mon, QObject **ret_data);
 /* disk I/O throttling */
 void bdrv_io_limits_enable(BlockDriverState *bs);
 void bdrv_io_limits_disable(BlockDriverState *bs);
-bool bdrv_io_limits_enabled(BlockDriverState *bs);
 
 void bdrv_init(void);
 void bdrv_init_with_whitelist(void);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index e45f2a0..a0b6bc1 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -34,18 +34,12 @@
 #include "monitor/monitor.h"
 #include "qemu/hbitmap.h"
 #include "block/snapshot.h"
+#include "qemu/throttle.h"
 
 #define BLOCK_FLAG_ENCRYPT          1
 #define BLOCK_FLAG_COMPAT6          4
 #define BLOCK_FLAG_LAZY_REFCOUNTS   8
 
-#define BLOCK_IO_LIMIT_READ     0
-#define BLOCK_IO_LIMIT_WRITE    1
-#define BLOCK_IO_LIMIT_TOTAL    2
-
-#define BLOCK_IO_SLICE_TIME     100000000
-#define NANOSECONDS_PER_SECOND  1000000000.0
-
 #define BLOCK_OPT_SIZE              "size"
 #define BLOCK_OPT_ENCRYPT           "encryption"
 #define BLOCK_OPT_COMPAT6           "compat6"
@@ -69,17 +63,6 @@ typedef struct BdrvTrackedRequest {
     CoQueue wait_queue; /* coroutines blocked on this request */
 } BdrvTrackedRequest;
 
-
-typedef struct BlockIOLimit {
-    int64_t bps[3];
-    int64_t iops[3];
-} BlockIOLimit;
-
-typedef struct BlockIOBaseValue {
-    uint64_t bytes[2];
-    uint64_t ios[2];
-} BlockIOBaseValue;
-
 struct BlockDriver {
     const char *format_name;
     int instance_size;
@@ -263,13 +246,9 @@ struct BlockDriverState {
     /* number of in-flight copy-on-read requests */
     unsigned int copy_on_read_in_flight;
 
-    /* the time for latest disk I/O */
-    int64_t slice_start;
-    int64_t slice_end;
-    BlockIOLimit io_limits;
-    BlockIOBaseValue slice_submitted;
-    CoQueue      throttled_reqs;
-    QEMUTimer    *block_timer;
+    /* I/O throttling */
+    ThrottleState throttle_state;
+    CoQueue      throttled_reqs[2];
     bool         io_limits_enabled;
 
     /* I/O stats (display with "info blockstats"). */
@@ -308,7 +287,8 @@ struct BlockDriverState {
 int get_tmp_filename(char *filename, int size);
 
 void bdrv_set_io_limits(BlockDriverState *bs,
-                        BlockIOLimit *io_limits);
+                        ThrottleConfig *cfg);
+
 
 /**
  * bdrv_add_before_write_notifier:
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH V5 3/5] block: Enable the new throttling code in the block layer.
  2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 3/5] block: Enable the new throttling code in the block layer Benoît Canet
@ 2013-08-14  9:31   ` Fam Zheng
  2013-08-14  9:50     ` Fam Zheng
  2013-08-16 12:02   ` Stefan Hajnoczi
  1 sibling, 1 reply; 17+ messages in thread
From: Fam Zheng @ 2013-08-14  9:31 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha

On Mon, 08/12 18:53, Benoît Canet wrote:
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>  block.c                   |  349 ++++++++++++++-------------------------------
>  block/qapi.c              |   21 ++-
>  blockdev.c                |  100 +++++++------
>  include/block/block.h     |    1 -
>  include/block/block_int.h |   32 +----
>  5 files changed, 173 insertions(+), 330 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 01b66d8..d49bc98 100644
> --- a/block.c
> +++ b/block.c
> @@ -86,13 +86,6 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque);
>  static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
>      int64_t sector_num, int nb_sectors);
>  
> -static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
> -        bool is_write, double elapsed_time, uint64_t *wait);
> -static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
> -        double elapsed_time, uint64_t *wait);
> -static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
> -        bool is_write, int64_t *wait);
> -
>  static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
>      QTAILQ_HEAD_INITIALIZER(bdrv_states);
>  
> @@ -123,70 +116,110 @@ int is_windows_drive(const char *filename)
>  #endif
>  
>  /* throttling disk I/O limits */
> -void bdrv_io_limits_disable(BlockDriverState *bs)
> +void bdrv_set_io_limits(BlockDriverState *bs,
> +                        ThrottleConfig *cfg)
>  {
> -    bs->io_limits_enabled = false;
> +    int i;
> +
> +    throttle_config(&bs->throttle_state, cfg);
> +
> +    for (i = 0; i < 2; i++) {
> +        qemu_co_enter_next(&bs->throttled_reqs[i]);
> +    }
> +}
> +
> +/* this function drain all the throttled IOs */
> +static bool bdrv_drain_throttled(BlockDriverState *bs)
> +{
> +    bool drained = false;
> +    bool enabled = bs->io_limits_enabled;
> +    int i;
>  
> -    do {} while (qemu_co_enter_next(&bs->throttled_reqs));
> +    bs->io_limits_enabled = false;
>  
> -    if (bs->block_timer) {
> -        qemu_del_timer(bs->block_timer);
> -        qemu_free_timer(bs->block_timer);
> -        bs->block_timer = NULL;
> +    for (i = 0; i < 2; i++) {
> +        while (qemu_co_enter_next(&bs->throttled_reqs[i])) {
> +            drained = true;

OK, this should be right. Throttling is disabled here, so there can't be
inserting to throttled_reqs[0] when you drain throttled_reqs[1].

> +        }
>      }
>  
> -    bs->slice_start = 0;
> -    bs->slice_end   = 0;
> +    bs->io_limits_enabled = enabled;
> +
> +    return drained;
>  }
>  
> -static void bdrv_block_timer(void *opaque)
> +void bdrv_io_limits_disable(BlockDriverState *bs)
> +{
> +    bs->io_limits_enabled = false;
> +
> +    bdrv_drain_throttled(bs);
> +
> +    throttle_destroy(&bs->throttle_state);
> +}
> +
> +static void bdrv_throttle_read_timer_cb(void *opaque)
>  {
>      BlockDriverState *bs = opaque;
> +    qemu_co_enter_next(&bs->throttled_reqs[0]);
> +}
>  
> -    qemu_co_enter_next(&bs->throttled_reqs);
> +static void bdrv_throttle_write_timer_cb(void *opaque)
> +{
> +    BlockDriverState *bs = opaque;
> +    qemu_co_enter_next(&bs->throttled_reqs[1]);
>  }
>  
> +/* should be called before bdrv_set_io_limits if a limit is set */
>  void bdrv_io_limits_enable(BlockDriverState *bs)
>  {
> -    qemu_co_queue_init(&bs->throttled_reqs);
> -    bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
> +    throttle_init(&bs->throttle_state,
> +                  vm_clock,
> +                  bdrv_throttle_read_timer_cb,
> +                  bdrv_throttle_write_timer_cb,
> +                  bs);
> +    qemu_co_queue_init(&bs->throttled_reqs[0]);
> +    qemu_co_queue_init(&bs->throttled_reqs[1]);
>      bs->io_limits_enabled = true;
>  }
>  
> -bool bdrv_io_limits_enabled(BlockDriverState *bs)
> +/* This function make an IO wait if needed
s/make/makes/
> + *
> + * @is_write:   is the IO a write
> + * @nb_sectors: the number of sectors of the IO
> + */
> +static void bdrv_io_limits_intercept(BlockDriverState *bs,
> +                                     bool is_write,
> +                                     int nb_sectors)
>  {
> -    BlockIOLimit *io_limits = &bs->io_limits;
> -    return io_limits->bps[BLOCK_IO_LIMIT_READ]
> -         || io_limits->bps[BLOCK_IO_LIMIT_WRITE]
> -         || io_limits->bps[BLOCK_IO_LIMIT_TOTAL]
> -         || io_limits->iops[BLOCK_IO_LIMIT_READ]
> -         || io_limits->iops[BLOCK_IO_LIMIT_WRITE]
> -         || io_limits->iops[BLOCK_IO_LIMIT_TOTAL];
> +    /* does this io must wait */
> +    bool must_wait = !throttle_allowed(&bs->throttle_state, is_write);
> +
> +    /* if must wait or any request of this type throttled queue the IO */
> +    if (must_wait ||
> +        !qemu_co_queue_empty(&bs->throttled_reqs[is_write])) {
> +        qemu_co_queue_wait(&bs->throttled_reqs[is_write]);
> +    }
> +
> +    /* the IO will be executed do the accounting */
s/executed/executed,/
> +    throttle_account(&bs->throttle_state, is_write, nb_sectors);
>  }
>  
> -static void bdrv_io_limits_intercept(BlockDriverState *bs,
> -                                     bool is_write, int nb_sectors)
> +/* This function will schedule the next IO throttled IO of this type if needed
> + *
> + * @is_write: is the IO a write
> + */
> +static void bdrv_io_limits_resched(BlockDriverState *bs, bool is_write)
>  {
> -    int64_t wait_time = -1;
>  
> -    if (!qemu_co_queue_empty(&bs->throttled_reqs)) {
> -        qemu_co_queue_wait(&bs->throttled_reqs);
> +    if (qemu_co_queue_empty(&bs->throttled_reqs[is_write])) {
> +        return;
>      }
>  
> -    /* In fact, we hope to keep each request's timing, in FIFO mode. The next
> -     * throttled requests will not be dequeued until the current request is
> -     * allowed to be serviced. So if the current request still exceeds the
> -     * limits, it will be inserted to the head. All requests followed it will
> -     * be still in throttled_reqs queue.
> -     */
> -
> -    while (bdrv_exceed_io_limits(bs, nb_sectors, is_write, &wait_time)) {
> -        qemu_mod_timer(bs->block_timer,
> -                       wait_time + qemu_get_clock_ns(vm_clock));
> -        qemu_co_queue_wait_insert_head(&bs->throttled_reqs);
> +    if (!throttle_allowed(&bs->throttle_state, is_write)) {
> +        return;
>      }
>  
> -    qemu_co_queue_next(&bs->throttled_reqs);
> +    qemu_co_queue_next(&bs->throttled_reqs[is_write]);
>  }
>  
>  /* check if the path starts with "<protocol>:" */
> @@ -1112,11 +1145,6 @@ int bdrv_open(BlockDriverState *bs, const char *filename, QDict *options,
>          bdrv_dev_change_media_cb(bs, true);
>      }
>  
> -    /* throttling disk I/O limits */
> -    if (bs->io_limits_enabled) {
> -        bdrv_io_limits_enable(bs);
> -    }
> -
>      return 0;
>  
>  unlink_and_fail:
> @@ -1452,7 +1480,7 @@ void bdrv_drain_all(void)
>           * a busy wait.
>           */
>          QTAILQ_FOREACH(bs, &bdrv_states, list) {
> -            while (qemu_co_enter_next(&bs->throttled_reqs)) {
> +            if (bdrv_drain_throttled(bs)) {
>                  busy = true;
>              }
>          }
> @@ -1461,7 +1489,8 @@ void bdrv_drain_all(void)
>      /* If requests are still pending there is a bug somewhere */
>      QTAILQ_FOREACH(bs, &bdrv_states, list) {
>          assert(QLIST_EMPTY(&bs->tracked_requests));
> -        assert(qemu_co_queue_empty(&bs->throttled_reqs));
> +        assert(qemu_co_queue_empty(&bs->throttled_reqs[0]));
> +        assert(qemu_co_queue_empty(&bs->throttled_reqs[1]));
>      }
>  }
>  
> @@ -1497,13 +1526,12 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
>  
>      bs_dest->enable_write_cache = bs_src->enable_write_cache;
>  
> -    /* i/o timing parameters */
> -    bs_dest->slice_start        = bs_src->slice_start;
> -    bs_dest->slice_end          = bs_src->slice_end;
> -    bs_dest->slice_submitted    = bs_src->slice_submitted;
> -    bs_dest->io_limits          = bs_src->io_limits;
> -    bs_dest->throttled_reqs     = bs_src->throttled_reqs;
> -    bs_dest->block_timer        = bs_src->block_timer;
> +    /* i/o throttled req */
> +    memcpy(&bs_dest->throttle_state,
> +           &bs_src->throttle_state,
> +           sizeof(ThrottleState));
> +    bs_dest->throttled_reqs[0]  = bs_src->throttled_reqs[0];
> +    bs_dest->throttled_reqs[1]  = bs_src->throttled_reqs[1];
>      bs_dest->io_limits_enabled  = bs_src->io_limits_enabled;
>  
>      /* r/w error */
> @@ -1550,7 +1578,7 @@ void bdrv_swap(BlockDriverState *bs_new, BlockDriverState *bs_old)
>      assert(bs_new->dev == NULL);
>      assert(bs_new->in_use == 0);
>      assert(bs_new->io_limits_enabled == false);
> -    assert(bs_new->block_timer == NULL);
> +    assert(!throttle_have_timer(&bs_new->throttle_state));
>  
>      tmp = *bs_new;
>      *bs_new = *bs_old;
> @@ -1569,7 +1597,7 @@ void bdrv_swap(BlockDriverState *bs_new, BlockDriverState *bs_old)
>      assert(bs_new->job == NULL);
>      assert(bs_new->in_use == 0);
>      assert(bs_new->io_limits_enabled == false);
> -    assert(bs_new->block_timer == NULL);
> +    assert(!throttle_have_timer(&bs_new->throttle_state));
>  
>      bdrv_rebind(bs_new);
>      bdrv_rebind(bs_old);
> @@ -1860,8 +1888,15 @@ int bdrv_commit_all(void)
>   *
>   * This function should be called when a tracked request is completing.
>   */
> -static void tracked_request_end(BdrvTrackedRequest *req)
> +static void tracked_request_end(BlockDriverState *bs,
> +                                BdrvTrackedRequest *req,
> +                                bool is_write)
>  {
> +    /* throttling disk I/O */
> +    if (bs->io_limits_enabled) {
> +        bdrv_io_limits_resched(bs, is_write);
> +    }
> +
>      QLIST_REMOVE(req, list);
>      qemu_co_queue_restart_all(&req->wait_queue);
>  }
> @@ -1874,6 +1909,11 @@ static void tracked_request_begin(BdrvTrackedRequest *req,
>                                    int64_t sector_num,
>                                    int nb_sectors, bool is_write)
>  {
> +    /* throttling disk I/O */
> +    if (bs->io_limits_enabled) {
> +        bdrv_io_limits_intercept(bs, is_write, nb_sectors);
> +    }
> +
>      *req = (BdrvTrackedRequest){
>          .bs = bs,
>          .sector_num = sector_num,
> @@ -2512,11 +2552,6 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
>          return -EIO;
>      }
>  
> -    /* throttling disk read I/O */
> -    if (bs->io_limits_enabled) {
> -        bdrv_io_limits_intercept(bs, false, nb_sectors);
> -    }
> -
>      if (bs->copy_on_read) {
>          flags |= BDRV_REQ_COPY_ON_READ;
>      }
> @@ -2547,7 +2582,7 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
>      ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
>  
>  out:
> -    tracked_request_end(&req);
> +    tracked_request_end(bs, &req, false);
>  
>      if (flags & BDRV_REQ_COPY_ON_READ) {
>          bs->copy_on_read_in_flight--;
> @@ -2625,11 +2660,6 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
>          return -EIO;
>      }
>  
> -    /* throttling disk write I/O */
> -    if (bs->io_limits_enabled) {
> -        bdrv_io_limits_intercept(bs, true, nb_sectors);
> -    }
> -
>      if (bs->copy_on_read_in_flight) {
>          wait_for_overlapping_requests(bs, sector_num, nb_sectors);
>      }
> @@ -2658,7 +2688,7 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
>          bs->wr_highest_sector = sector_num + nb_sectors - 1;
>      }
>  
> -    tracked_request_end(&req);
> +    tracked_request_end(bs, &req, true);
>  
>      return ret;
>  }
> @@ -2751,14 +2781,6 @@ void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr)
>      *nb_sectors_ptr = length;
>  }
>  
> -/* throttling disk io limits */
> -void bdrv_set_io_limits(BlockDriverState *bs,
> -                        BlockIOLimit *io_limits)
> -{
> -    bs->io_limits = *io_limits;
> -    bs->io_limits_enabled = bdrv_io_limits_enabled(bs);
> -}
> -
>  void bdrv_set_on_error(BlockDriverState *bs, BlockdevOnError on_read_error,
>                         BlockdevOnError on_write_error)
>  {
> @@ -3568,169 +3590,6 @@ void bdrv_aio_cancel(BlockDriverAIOCB *acb)
>      acb->aiocb_info->cancel(acb);
>  }
>  
> -/* block I/O throttling */
> -static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
> -                 bool is_write, double elapsed_time, uint64_t *wait)
> -{
> -    uint64_t bps_limit = 0;
> -    uint64_t extension;
> -    double   bytes_limit, bytes_base, bytes_res;
> -    double   slice_time, wait_time;
> -
> -    if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
> -        bps_limit = bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL];
> -    } else if (bs->io_limits.bps[is_write]) {
> -        bps_limit = bs->io_limits.bps[is_write];
> -    } else {
> -        if (wait) {
> -            *wait = 0;
> -        }
> -
> -        return false;
> -    }
> -
> -    slice_time = bs->slice_end - bs->slice_start;
> -    slice_time /= (NANOSECONDS_PER_SECOND);
> -    bytes_limit = bps_limit * slice_time;
> -    bytes_base  = bs->slice_submitted.bytes[is_write];
> -    if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
> -        bytes_base += bs->slice_submitted.bytes[!is_write];
> -    }
> -
> -    /* bytes_base: the bytes of data which have been read/written; and
> -     *             it is obtained from the history statistic info.
> -     * bytes_res: the remaining bytes of data which need to be read/written.
> -     * (bytes_base + bytes_res) / bps_limit: used to calcuate
> -     *             the total time for completing reading/writting all data.
> -     */
> -    bytes_res   = (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
> -
> -    if (bytes_base + bytes_res <= bytes_limit) {
> -        if (wait) {
> -            *wait = 0;
> -        }
> -
> -        return false;
> -    }
> -
> -    /* Calc approx time to dispatch */
> -    wait_time = (bytes_base + bytes_res) / bps_limit - elapsed_time;
> -
> -    /* When the I/O rate at runtime exceeds the limits,
> -     * bs->slice_end need to be extended in order that the current statistic
> -     * info can be kept until the timer fire, so it is increased and tuned
> -     * based on the result of experiment.
> -     */
> -    extension = wait_time * NANOSECONDS_PER_SECOND;
> -    extension = DIV_ROUND_UP(extension, BLOCK_IO_SLICE_TIME) *
> -                BLOCK_IO_SLICE_TIME;
> -    bs->slice_end += extension;
> -    if (wait) {
> -        *wait = wait_time * NANOSECONDS_PER_SECOND;
> -    }
> -
> -    return true;
> -}
> -
> -static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
> -                             double elapsed_time, uint64_t *wait)
> -{
> -    uint64_t iops_limit = 0;
> -    double   ios_limit, ios_base;
> -    double   slice_time, wait_time;
> -
> -    if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
> -        iops_limit = bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL];
> -    } else if (bs->io_limits.iops[is_write]) {
> -        iops_limit = bs->io_limits.iops[is_write];
> -    } else {
> -        if (wait) {
> -            *wait = 0;
> -        }
> -
> -        return false;
> -    }
> -
> -    slice_time = bs->slice_end - bs->slice_start;
> -    slice_time /= (NANOSECONDS_PER_SECOND);
> -    ios_limit  = iops_limit * slice_time;
> -    ios_base   = bs->slice_submitted.ios[is_write];
> -    if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
> -        ios_base += bs->slice_submitted.ios[!is_write];
> -    }
> -
> -    if (ios_base + 1 <= ios_limit) {
> -        if (wait) {
> -            *wait = 0;
> -        }
> -
> -        return false;
> -    }
> -
> -    /* Calc approx time to dispatch, in seconds */
> -    wait_time = (ios_base + 1) / iops_limit;
> -    if (wait_time > elapsed_time) {
> -        wait_time = wait_time - elapsed_time;
> -    } else {
> -        wait_time = 0;
> -    }
> -
> -    /* Exceeded current slice, extend it by another slice time */
> -    bs->slice_end += BLOCK_IO_SLICE_TIME;
> -    if (wait) {
> -        *wait = wait_time * NANOSECONDS_PER_SECOND;
> -    }
> -
> -    return true;
> -}
> -
> -static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
> -                           bool is_write, int64_t *wait)
> -{
> -    int64_t  now, max_wait;
> -    uint64_t bps_wait = 0, iops_wait = 0;
> -    double   elapsed_time;
> -    int      bps_ret, iops_ret;
> -
> -    now = qemu_get_clock_ns(vm_clock);
> -    if (now > bs->slice_end) {
> -        bs->slice_start = now;
> -        bs->slice_end   = now + BLOCK_IO_SLICE_TIME;
> -        memset(&bs->slice_submitted, 0, sizeof(bs->slice_submitted));
> -    }
> -
> -    elapsed_time  = now - bs->slice_start;
> -    elapsed_time  /= (NANOSECONDS_PER_SECOND);
> -
> -    bps_ret  = bdrv_exceed_bps_limits(bs, nb_sectors,
> -                                      is_write, elapsed_time, &bps_wait);
> -    iops_ret = bdrv_exceed_iops_limits(bs, is_write,
> -                                      elapsed_time, &iops_wait);
> -    if (bps_ret || iops_ret) {
> -        max_wait = bps_wait > iops_wait ? bps_wait : iops_wait;
> -        if (wait) {
> -            *wait = max_wait;
> -        }
> -
> -        now = qemu_get_clock_ns(vm_clock);
> -        if (bs->slice_end < now + max_wait) {
> -            bs->slice_end = now + max_wait;
> -        }
> -
> -        return true;
> -    }
> -
> -    if (wait) {
> -        *wait = 0;
> -    }
> -
> -    bs->slice_submitted.bytes[is_write] += (int64_t)nb_sectors *
> -                                           BDRV_SECTOR_SIZE;
> -    bs->slice_submitted.ios[is_write]++;
> -
> -    return false;
> -}
> -
>  /**************************************************************/
>  /* async block device emulation */
>  
> diff --git a/block/qapi.c b/block/qapi.c
> index a4bc411..45f806b 100644
> --- a/block/qapi.c
> +++ b/block/qapi.c
> @@ -223,18 +223,15 @@ void bdrv_query_info(BlockDriverState *bs,
>          info->inserted->backing_file_depth = bdrv_get_backing_file_depth(bs);
>  
>          if (bs->io_limits_enabled) {
> -            info->inserted->bps =
> -                           bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL];
> -            info->inserted->bps_rd =
> -                           bs->io_limits.bps[BLOCK_IO_LIMIT_READ];
> -            info->inserted->bps_wr =
> -                           bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE];
> -            info->inserted->iops =
> -                           bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL];
> -            info->inserted->iops_rd =
> -                           bs->io_limits.iops[BLOCK_IO_LIMIT_READ];
> -            info->inserted->iops_wr =
> -                           bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE];
> +            ThrottleConfig cfg;
> +            throttle_get_config(&bs->throttle_state, &cfg);
> +            info->inserted->bps     = cfg.buckets[THROTTLE_BPS_TOTAL].ups;
> +            info->inserted->bps_rd  = cfg.buckets[THROTTLE_BPS_READ].ups;
> +            info->inserted->bps_wr  = cfg.buckets[THROTTLE_BPS_WRITE].ups;
> +
> +            info->inserted->iops    = cfg.buckets[THROTTLE_OPS_TOTAL].ups;
> +            info->inserted->iops_rd = cfg.buckets[THROTTLE_OPS_READ].ups;
> +            info->inserted->iops_wr = cfg.buckets[THROTTLE_OPS_WRITE].ups;
>          }
>  
>          bs0 = bs;
> diff --git a/blockdev.c b/blockdev.c
> index 41b0a49..7559ea5 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -281,32 +281,16 @@ static int parse_block_error_action(const char *buf, bool is_read)
>      }
>  }
>  
> -static bool do_check_io_limits(BlockIOLimit *io_limits, Error **errp)
> +static bool check_throttle_config(ThrottleConfig *cfg, Error **errp)
>  {
> -    bool bps_flag;
> -    bool iops_flag;
> -
> -    assert(io_limits);
> -
> -    bps_flag  = (io_limits->bps[BLOCK_IO_LIMIT_TOTAL] != 0)
> -                 && ((io_limits->bps[BLOCK_IO_LIMIT_READ] != 0)
> -                 || (io_limits->bps[BLOCK_IO_LIMIT_WRITE] != 0));
> -    iops_flag = (io_limits->iops[BLOCK_IO_LIMIT_TOTAL] != 0)
> -                 && ((io_limits->iops[BLOCK_IO_LIMIT_READ] != 0)
> -                 || (io_limits->iops[BLOCK_IO_LIMIT_WRITE] != 0));
> -    if (bps_flag || iops_flag) {
> -        error_setg(errp, "bps(iops) and bps_rd/bps_wr(iops_rd/iops_wr) "
> +    if (throttle_conflicting(cfg)) {
> +        error_setg(errp, "bps/iops/max total values and read/write values"
>                           "cannot be used at the same time");
>          return false;
>      }
>  
> -    if (io_limits->bps[BLOCK_IO_LIMIT_TOTAL] < 0 ||
> -        io_limits->bps[BLOCK_IO_LIMIT_WRITE] < 0 ||
> -        io_limits->bps[BLOCK_IO_LIMIT_READ] < 0 ||
> -        io_limits->iops[BLOCK_IO_LIMIT_TOTAL] < 0 ||
> -        io_limits->iops[BLOCK_IO_LIMIT_WRITE] < 0 ||
> -        io_limits->iops[BLOCK_IO_LIMIT_READ] < 0) {
> -        error_setg(errp, "bps and iops values must be 0 or greater");
> +    if (!throttle_is_valid(cfg)) {
> +        error_setg(errp, "bps/iops/maxs values must be 0 or greater");
>          return false;
>      }
>  
> @@ -331,7 +315,7 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts,
>      int on_read_error, on_write_error;
>      const char *devaddr;
>      DriveInfo *dinfo;
> -    BlockIOLimit io_limits;
> +    ThrottleConfig cfg;
>      int snapshot = 0;
>      bool copy_on_read;
>      int ret;
> @@ -489,20 +473,31 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts,
>      }
>  
>      /* disk I/O throttling */
> -    io_limits.bps[BLOCK_IO_LIMIT_TOTAL]  =
> +    cfg.buckets[THROTTLE_BPS_TOTAL].ups =
>          qemu_opt_get_number(opts, "throttling.bps-total", 0);
> -    io_limits.bps[BLOCK_IO_LIMIT_READ]   =
> +    cfg.buckets[THROTTLE_BPS_READ].ups  =
>          qemu_opt_get_number(opts, "throttling.bps-read", 0);
> -    io_limits.bps[BLOCK_IO_LIMIT_WRITE]  =
> +    cfg.buckets[THROTTLE_BPS_WRITE].ups =
>          qemu_opt_get_number(opts, "throttling.bps-write", 0);
> -    io_limits.iops[BLOCK_IO_LIMIT_TOTAL] =
> +    cfg.buckets[THROTTLE_OPS_TOTAL].ups =
>          qemu_opt_get_number(opts, "throttling.iops-total", 0);
> -    io_limits.iops[BLOCK_IO_LIMIT_READ]  =
> +    cfg.buckets[THROTTLE_OPS_READ].ups =
>          qemu_opt_get_number(opts, "throttling.iops-read", 0);
> -    io_limits.iops[BLOCK_IO_LIMIT_WRITE] =
> +    cfg.buckets[THROTTLE_OPS_WRITE].ups =
>          qemu_opt_get_number(opts, "throttling.iops-write", 0);
>  
> -    if (!do_check_io_limits(&io_limits, &error)) {
> +    cfg.buckets[THROTTLE_BPS_TOTAL].max = 0;
> +    cfg.buckets[THROTTLE_BPS_READ].max  = 0;
> +    cfg.buckets[THROTTLE_BPS_WRITE].max = 0;
> +
> +    cfg.buckets[THROTTLE_OPS_TOTAL].max = 0;
> +    cfg.buckets[THROTTLE_OPS_READ].max  = 0;
> +    cfg.buckets[THROTTLE_OPS_WRITE].max = 0;
> +
> +    cfg.unit_size = BDRV_SECTOR_SIZE;
> +    cfg.op_size = 0;
> +
> +    if (!check_throttle_config(&cfg, &error)) {
>          error_report("%s", error_get_pretty(error));
>          error_free(error);
>          return NULL;
> @@ -629,7 +624,10 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts,
>      bdrv_set_on_error(dinfo->bdrv, on_read_error, on_write_error);
>  
>      /* disk I/O throttling */
> -    bdrv_set_io_limits(dinfo->bdrv, &io_limits);
> +    if (throttle_enabled(&cfg)) {
> +        bdrv_io_limits_enable(dinfo->bdrv);
> +        bdrv_set_io_limits(dinfo->bdrv, &cfg);
> +    }
>  
>      switch(type) {
>      case IF_IDE:
> @@ -1262,7 +1260,7 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
>                                 int64_t bps_wr, int64_t iops, int64_t iops_rd,
>                                 int64_t iops_wr, Error **errp)
>  {
> -    BlockIOLimit io_limits;
> +    ThrottleConfig cfg;
>      BlockDriverState *bs;
>  
>      bs = bdrv_find(device);
> @@ -1271,27 +1269,37 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
>          return;
>      }
>  
> -    io_limits.bps[BLOCK_IO_LIMIT_TOTAL] = bps;
> -    io_limits.bps[BLOCK_IO_LIMIT_READ]  = bps_rd;
> -    io_limits.bps[BLOCK_IO_LIMIT_WRITE] = bps_wr;
> -    io_limits.iops[BLOCK_IO_LIMIT_TOTAL]= iops;
> -    io_limits.iops[BLOCK_IO_LIMIT_READ] = iops_rd;
> -    io_limits.iops[BLOCK_IO_LIMIT_WRITE]= iops_wr;
> +    cfg.buckets[THROTTLE_BPS_TOTAL].ups = bps;
> +    cfg.buckets[THROTTLE_BPS_READ].ups  = bps_rd;
> +    cfg.buckets[THROTTLE_BPS_WRITE].ups = bps_wr;
> +
> +    cfg.buckets[THROTTLE_OPS_TOTAL].ups = iops;
> +    cfg.buckets[THROTTLE_OPS_READ].ups  = iops_rd;
> +    cfg.buckets[THROTTLE_OPS_WRITE].ups = iops_wr;
> +
> +    cfg.buckets[THROTTLE_BPS_TOTAL].max = 0;
> +    cfg.buckets[THROTTLE_BPS_READ].max  = 0;
> +    cfg.buckets[THROTTLE_BPS_WRITE].max = 0;
> +
> +    cfg.buckets[THROTTLE_OPS_TOTAL].max = 0;
> +    cfg.buckets[THROTTLE_OPS_READ].max  = 0;
> +    cfg.buckets[THROTTLE_OPS_WRITE].max = 0;
>  
> -    if (!do_check_io_limits(&io_limits, errp)) {
> +    cfg.unit_size = BDRV_SECTOR_SIZE;

Does this mean user set bps limit in sector now? I think we should be
consistent to existing parameter unit.

> +    cfg.op_size = 0;

Why not set op_size to 1?

> +
> +    if (!check_throttle_config(&cfg, errp)) {
>          return;
>      }
>  
> -    bs->io_limits = io_limits;
> -
> -    if (!bs->io_limits_enabled && bdrv_io_limits_enabled(bs)) {
> +    if (!bs->io_limits_enabled && throttle_enabled(&cfg)) {
>          bdrv_io_limits_enable(bs);
> -    } else if (bs->io_limits_enabled && !bdrv_io_limits_enabled(bs)) {
> +    } else if (bs->io_limits_enabled && !throttle_enabled(&cfg)) {
>          bdrv_io_limits_disable(bs);
> -    } else {
> -        if (bs->block_timer) {
> -            qemu_mod_timer(bs->block_timer, qemu_get_clock_ns(vm_clock));
> -        }
> +    }
> +
> +    if (bs->io_limits_enabled) {
> +        bdrv_set_io_limits(bs, &cfg);
>      }
>  }
>  
> diff --git a/include/block/block.h b/include/block/block.h
> index 742fce5..b16d579 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -107,7 +107,6 @@ void bdrv_info_stats(Monitor *mon, QObject **ret_data);
>  /* disk I/O throttling */
>  void bdrv_io_limits_enable(BlockDriverState *bs);
>  void bdrv_io_limits_disable(BlockDriverState *bs);
> -bool bdrv_io_limits_enabled(BlockDriverState *bs);
>  
>  void bdrv_init(void);
>  void bdrv_init_with_whitelist(void);
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index e45f2a0..a0b6bc1 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -34,18 +34,12 @@
>  #include "monitor/monitor.h"
>  #include "qemu/hbitmap.h"
>  #include "block/snapshot.h"
> +#include "qemu/throttle.h"
>  
>  #define BLOCK_FLAG_ENCRYPT          1
>  #define BLOCK_FLAG_COMPAT6          4
>  #define BLOCK_FLAG_LAZY_REFCOUNTS   8
>  
> -#define BLOCK_IO_LIMIT_READ     0
> -#define BLOCK_IO_LIMIT_WRITE    1
> -#define BLOCK_IO_LIMIT_TOTAL    2
> -
> -#define BLOCK_IO_SLICE_TIME     100000000
> -#define NANOSECONDS_PER_SECOND  1000000000.0
> -
>  #define BLOCK_OPT_SIZE              "size"
>  #define BLOCK_OPT_ENCRYPT           "encryption"
>  #define BLOCK_OPT_COMPAT6           "compat6"
> @@ -69,17 +63,6 @@ typedef struct BdrvTrackedRequest {
>      CoQueue wait_queue; /* coroutines blocked on this request */
>  } BdrvTrackedRequest;
>  
> -
> -typedef struct BlockIOLimit {
> -    int64_t bps[3];
> -    int64_t iops[3];
> -} BlockIOLimit;
> -
> -typedef struct BlockIOBaseValue {
> -    uint64_t bytes[2];
> -    uint64_t ios[2];
> -} BlockIOBaseValue;
> -
>  struct BlockDriver {
>      const char *format_name;
>      int instance_size;
> @@ -263,13 +246,9 @@ struct BlockDriverState {
>      /* number of in-flight copy-on-read requests */
>      unsigned int copy_on_read_in_flight;
>  
> -    /* the time for latest disk I/O */
> -    int64_t slice_start;
> -    int64_t slice_end;
> -    BlockIOLimit io_limits;
> -    BlockIOBaseValue slice_submitted;
> -    CoQueue      throttled_reqs;
> -    QEMUTimer    *block_timer;
> +    /* I/O throttling */
> +    ThrottleState throttle_state;
> +    CoQueue      throttled_reqs[2];
>      bool         io_limits_enabled;
>  
>      /* I/O stats (display with "info blockstats"). */
> @@ -308,7 +287,8 @@ struct BlockDriverState {
>  int get_tmp_filename(char *filename, int size);
>  
>  void bdrv_set_io_limits(BlockDriverState *bs,
> -                        BlockIOLimit *io_limits);
> +                        ThrottleConfig *cfg);
> +
>  
>  /**
>   * bdrv_add_before_write_notifier:
> -- 
> 1.7.10.4
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH V5 3/5] block: Enable the new throttling code in the block layer.
  2013-08-14  9:31   ` Fam Zheng
@ 2013-08-14  9:50     ` Fam Zheng
  0 siblings, 0 replies; 17+ messages in thread
From: Fam Zheng @ 2013-08-14  9:50 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha

On Wed, 08/14 17:31, Fam Zheng wrote:
> On Mon, 08/12 18:53, Benoît Canet wrote:
> > Signed-off-by: Benoit Canet <benoit@irqsave.net>
> > @@ -1262,7 +1260,7 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
> >                                 int64_t bps_wr, int64_t iops, int64_t iops_rd,
> >                                 int64_t iops_wr, Error **errp)
> >  {
> > -    BlockIOLimit io_limits;
> > +    ThrottleConfig cfg;
> >      BlockDriverState *bs;
> >  
> >      bs = bdrv_find(device);
> > @@ -1271,27 +1269,37 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
> >          return;
> >      }
> >  
> > -    io_limits.bps[BLOCK_IO_LIMIT_TOTAL] = bps;
> > -    io_limits.bps[BLOCK_IO_LIMIT_READ]  = bps_rd;
> > -    io_limits.bps[BLOCK_IO_LIMIT_WRITE] = bps_wr;
> > -    io_limits.iops[BLOCK_IO_LIMIT_TOTAL]= iops;
> > -    io_limits.iops[BLOCK_IO_LIMIT_READ] = iops_rd;
> > -    io_limits.iops[BLOCK_IO_LIMIT_WRITE]= iops_wr;
> > +    cfg.buckets[THROTTLE_BPS_TOTAL].ups = bps;
> > +    cfg.buckets[THROTTLE_BPS_READ].ups  = bps_rd;
> > +    cfg.buckets[THROTTLE_BPS_WRITE].ups = bps_wr;
> > +
> > +    cfg.buckets[THROTTLE_OPS_TOTAL].ups = iops;
> > +    cfg.buckets[THROTTLE_OPS_READ].ups  = iops_rd;
> > +    cfg.buckets[THROTTLE_OPS_WRITE].ups = iops_wr;
> > +
> > +    cfg.buckets[THROTTLE_BPS_TOTAL].max = 0;
> > +    cfg.buckets[THROTTLE_BPS_READ].max  = 0;
> > +    cfg.buckets[THROTTLE_BPS_WRITE].max = 0;
> > +
> > +    cfg.buckets[THROTTLE_OPS_TOTAL].max = 0;
> > +    cfg.buckets[THROTTLE_OPS_READ].max  = 0;
> > +    cfg.buckets[THROTTLE_OPS_WRITE].max = 0;
> >  
> > -    if (!do_check_io_limits(&io_limits, errp)) {
> > +    cfg.unit_size = BDRV_SECTOR_SIZE;
> 
> Does this mean user set bps limit in sector now? I think we should be
> consistent to existing parameter unit.
> 
> > +    cfg.op_size = 0;
> 
> Why not set op_size to 1?
> 
Never mind. You se it to 0 here and it's the condition of setting
unit=1.

Fam

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH V5 3/5] block: Enable the new throttling code in the block layer.
  2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 3/5] block: Enable the new throttling code in the block layer Benoît Canet
  2013-08-14  9:31   ` Fam Zheng
@ 2013-08-16 12:02   ` Stefan Hajnoczi
  1 sibling, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2013-08-16 12:02 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha

On Mon, Aug 12, 2013 at 06:53:14PM +0200, Benoît Canet wrote:
> +/* this function drain all the throttled IOs */
> +static bool bdrv_drain_throttled(BlockDriverState *bs)
> +{
> +    bool drained = false;
> +    bool enabled = bs->io_limits_enabled;
> +    int i;
>  
> -    do {} while (qemu_co_enter_next(&bs->throttled_reqs));
> +    bs->io_limits_enabled = false;
>  
> -    if (bs->block_timer) {
> -        qemu_del_timer(bs->block_timer);
> -        qemu_free_timer(bs->block_timer);
> -        bs->block_timer = NULL;
> +    for (i = 0; i < 2; i++) {
> +        while (qemu_co_enter_next(&bs->throttled_reqs[i])) {
> +            drained = true;
> +        }
>      }

This function submits throttled requests but it doesn't drain them -
they might still be executing when this function returns!

> +/* should be called before bdrv_set_io_limits if a limit is set */
>  void bdrv_io_limits_enable(BlockDriverState *bs)
>  {
> -    qemu_co_queue_init(&bs->throttled_reqs);
> -    bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);

Make sure we never double-initialized ->throttle_state:

assert(!bs->io_limits enabled);

> @@ -2512,11 +2552,6 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
>          return -EIO;
>      }
>  
> -    /* throttling disk read I/O */
> -    if (bs->io_limits_enabled) {
> -        bdrv_io_limits_intercept(bs, false, nb_sectors);
> -    }
> -

Why move bdrv_io_limits_intercept() into tracked_request_begin()?  IMO
tracked_request_begin() should only create the tracked request, it
shouldn't do unrelated stuff like I/O throttling and yielding.  If it
does then it must be declared coroutine_fn.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH V5 4/5] block: Add support for throttling burst max in QMP and the command line.
  2013-08-12 16:53 [Qemu-devel] [PATCH V5 0/5] Continuous Leaky Bucket Throttling Benoît Canet
                   ` (2 preceding siblings ...)
  2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 3/5] block: Enable the new throttling code in the block layer Benoît Canet
@ 2013-08-12 16:53 ` Benoît Canet
  2013-08-16 12:07   ` Stefan Hajnoczi
  2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 5/5] block: Add iops_sector_count to do the iops accounting for a given io size Benoît Canet
  2013-08-16 12:14 ` [Qemu-devel] [PATCH V5 0/5] Continuous Leaky Bucket Throttling Stefan Hajnoczi
  5 siblings, 1 reply; 17+ messages in thread
From: Benoît Canet @ 2013-08-12 16:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

The max parameter of the leaky bycket throttling algoritm can be used to
allow the guest to do bursts.
The max value is a pool of I/O that the guest can use without being throttled
at all. Throttling is triggered once this pool is empty.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qapi.c     |   26 ++++++++++++
 blockdev.c       |  119 ++++++++++++++++++++++++++++++++++++++++++++++--------
 hmp.c            |   32 +++++++++++++--
 qapi-schema.json |   34 +++++++++++++++-
 qemu-options.hx  |    4 +-
 qmp-commands.hx  |   30 ++++++++++++--
 6 files changed, 220 insertions(+), 25 deletions(-)

diff --git a/block/qapi.c b/block/qapi.c
index 45f806b..5ba10f4 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -232,6 +232,32 @@ void bdrv_query_info(BlockDriverState *bs,
             info->inserted->iops    = cfg.buckets[THROTTLE_OPS_TOTAL].ups;
             info->inserted->iops_rd = cfg.buckets[THROTTLE_OPS_READ].ups;
             info->inserted->iops_wr = cfg.buckets[THROTTLE_OPS_WRITE].ups;
+
+            info->inserted->has_bps_max     =
+                cfg.buckets[THROTTLE_BPS_TOTAL].max;
+            info->inserted->bps_max         =
+                cfg.buckets[THROTTLE_BPS_TOTAL].max;
+            info->inserted->has_bps_rd_max  =
+                cfg.buckets[THROTTLE_BPS_READ].max;
+            info->inserted->bps_rd_max      =
+                cfg.buckets[THROTTLE_BPS_READ].max;
+            info->inserted->has_bps_wr_max  =
+                cfg.buckets[THROTTLE_BPS_WRITE].max;
+            info->inserted->bps_wr_max      =
+                cfg.buckets[THROTTLE_BPS_WRITE].max;
+
+            info->inserted->has_iops_max    =
+                cfg.buckets[THROTTLE_OPS_TOTAL].max;
+            info->inserted->iops_max        =
+                cfg.buckets[THROTTLE_OPS_TOTAL].max;
+            info->inserted->has_iops_rd_max =
+                cfg.buckets[THROTTLE_OPS_READ].max;
+            info->inserted->iops_rd_max     =
+                cfg.buckets[THROTTLE_OPS_READ].max;
+            info->inserted->has_iops_wr_max =
+                cfg.buckets[THROTTLE_OPS_WRITE].max;
+            info->inserted->iops_wr_max     =
+                cfg.buckets[THROTTLE_OPS_WRITE].max;
         }
 
         bs0 = bs;
diff --git a/blockdev.c b/blockdev.c
index 7559ea5..22016a2 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -486,13 +486,18 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts,
     cfg.buckets[THROTTLE_OPS_WRITE].ups =
         qemu_opt_get_number(opts, "throttling.iops-write", 0);
 
-    cfg.buckets[THROTTLE_BPS_TOTAL].max = 0;
-    cfg.buckets[THROTTLE_BPS_READ].max  = 0;
-    cfg.buckets[THROTTLE_BPS_WRITE].max = 0;
-
-    cfg.buckets[THROTTLE_OPS_TOTAL].max = 0;
-    cfg.buckets[THROTTLE_OPS_READ].max  = 0;
-    cfg.buckets[THROTTLE_OPS_WRITE].max = 0;
+    cfg.buckets[THROTTLE_BPS_TOTAL].max =
+        qemu_opt_get_number(opts, "throttling.bps-total-max", 0);
+    cfg.buckets[THROTTLE_BPS_READ].max  =
+        qemu_opt_get_number(opts, "throttling.bps-read-max", 0);
+    cfg.buckets[THROTTLE_BPS_WRITE].max =
+        qemu_opt_get_number(opts, "throttling.bps-write-max", 0);
+    cfg.buckets[THROTTLE_OPS_TOTAL].max =
+        qemu_opt_get_number(opts, "throttling.iops-total-max", 0);
+    cfg.buckets[THROTTLE_OPS_READ].max =
+        qemu_opt_get_number(opts, "throttling.iops-read-max", 0);
+    cfg.buckets[THROTTLE_OPS_WRITE].max =
+        qemu_opt_get_number(opts, "throttling.iops-write-max", 0);
 
     cfg.unit_size = BDRV_SECTOR_SIZE;
     cfg.op_size = 0;
@@ -773,6 +778,14 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type)
     qemu_opt_rename(all_opts, "bps_rd", "throttling.bps-read");
     qemu_opt_rename(all_opts, "bps_wr", "throttling.bps-write");
 
+    qemu_opt_rename(all_opts, "iops_max", "throttling.iops-total-max");
+    qemu_opt_rename(all_opts, "iops_rd_max", "throttling.iops-read-max");
+    qemu_opt_rename(all_opts, "iops_wr_max", "throttling.iops-write-max");
+
+    qemu_opt_rename(all_opts, "bps_max", "throttling.bps-total-max");
+    qemu_opt_rename(all_opts, "bps_rd_max", "throttling.bps-read-max");
+    qemu_opt_rename(all_opts, "bps_wr_max", "throttling.bps-write-max");
+
     qemu_opt_rename(all_opts, "readonly", "read-only");
 
     value = qemu_opt_get(all_opts, "cache");
@@ -1257,8 +1270,22 @@ void qmp_change_blockdev(const char *device, const char *filename,
 
 /* throttling disk I/O limits */
 void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
-                               int64_t bps_wr, int64_t iops, int64_t iops_rd,
-                               int64_t iops_wr, Error **errp)
+                               int64_t bps_wr,
+                               int64_t iops,
+                               int64_t iops_rd,
+                               int64_t iops_wr,
+                               bool has_bps_max,
+                               int64_t bps_max,
+                               bool has_bps_rd_max,
+                               int64_t bps_rd_max,
+                               bool has_bps_wr_max,
+                               int64_t bps_wr_max,
+                               bool has_iops_max,
+                               int64_t iops_max,
+                               bool has_iops_rd_max,
+                               int64_t iops_rd_max,
+                               bool has_iops_wr_max,
+                               int64_t iops_wr_max, Error **errp)
 {
     ThrottleConfig cfg;
     BlockDriverState *bs;
@@ -1269,6 +1296,7 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
         return;
     }
 
+    memset(&cfg, 0, sizeof(cfg));
     cfg.buckets[THROTTLE_BPS_TOTAL].ups = bps;
     cfg.buckets[THROTTLE_BPS_READ].ups  = bps_rd;
     cfg.buckets[THROTTLE_BPS_WRITE].ups = bps_wr;
@@ -1277,13 +1305,24 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
     cfg.buckets[THROTTLE_OPS_READ].ups  = iops_rd;
     cfg.buckets[THROTTLE_OPS_WRITE].ups = iops_wr;
 
-    cfg.buckets[THROTTLE_BPS_TOTAL].max = 0;
-    cfg.buckets[THROTTLE_BPS_READ].max  = 0;
-    cfg.buckets[THROTTLE_BPS_WRITE].max = 0;
-
-    cfg.buckets[THROTTLE_OPS_TOTAL].max = 0;
-    cfg.buckets[THROTTLE_OPS_READ].max  = 0;
-    cfg.buckets[THROTTLE_OPS_WRITE].max = 0;
+    if (has_bps_max) {
+        cfg.buckets[THROTTLE_BPS_TOTAL].max = bps_max;
+    }
+    if (has_bps_rd_max) {
+        cfg.buckets[THROTTLE_BPS_READ].max = bps_rd_max;
+    }
+    if (has_bps_wr_max) {
+        cfg.buckets[THROTTLE_BPS_WRITE].max = bps_wr_max;
+    }
+    if (has_iops_max) {
+        cfg.buckets[THROTTLE_OPS_TOTAL].max = iops_max;
+    }
+    if (has_iops_rd_max) {
+        cfg.buckets[THROTTLE_OPS_READ].max = iops_rd_max;
+    }
+    if (has_iops_wr_max) {
+        cfg.buckets[THROTTLE_OPS_WRITE].max = iops_wr_max;
+    }
 
     cfg.unit_size = BDRV_SECTOR_SIZE;
     cfg.op_size = 0;
@@ -1988,6 +2027,30 @@ QemuOptsList qemu_common_drive_opts = {
             .type = QEMU_OPT_NUMBER,
             .help = "limit write bytes per second",
         },{
+            .name = "throttling.iops-total-max",
+            .type = QEMU_OPT_NUMBER,
+            .help = "I/O operations burst",
+        },{
+            .name = "throttling.iops-read-max",
+            .type = QEMU_OPT_NUMBER,
+            .help = "I/O operations read burst",
+        },{
+            .name = "throttling.iops-write-max",
+            .type = QEMU_OPT_NUMBER,
+            .help = "I/O operations write burst",
+        },{
+            .name = "throttling.bps-total-max",
+            .type = QEMU_OPT_NUMBER,
+            .help = "total bytes burst",
+        },{
+            .name = "throttling.bps-read-max",
+            .type = QEMU_OPT_NUMBER,
+            .help = "total bytes read burst",
+        },{
+            .name = "throttling.bps-write-max",
+            .type = QEMU_OPT_NUMBER,
+            .help = "total bytes write burst",
+        },{
             .name = "copy-on-read",
             .type = QEMU_OPT_BOOL,
             .help = "copy read data from backing file into image file",
@@ -2110,6 +2173,30 @@ QemuOptsList qemu_old_drive_opts = {
             .type = QEMU_OPT_NUMBER,
             .help = "limit write bytes per second",
         },{
+            .name = "iops_max",
+            .type = QEMU_OPT_NUMBER,
+            .help = "I/O operations burst",
+        },{
+            .name = "iops_rd_max",
+            .type = QEMU_OPT_NUMBER,
+            .help = "I/O operations read burst",
+        },{
+            .name = "iops_wr_max",
+            .type = QEMU_OPT_NUMBER,
+            .help = "I/O operations write burst",
+        },{
+            .name = "bps_max",
+            .type = QEMU_OPT_NUMBER,
+            .help = "total bytes burst",
+        },{
+            .name = "bps_rd_max",
+            .type = QEMU_OPT_NUMBER,
+            .help = "total bytes read burst",
+        },{
+            .name = "bps_wr_max",
+            .type = QEMU_OPT_NUMBER,
+            .help = "total bytes write burst",
+        },{
             .name = "copy-on-read",
             .type = QEMU_OPT_BOOL,
             .help = "copy read data from backing file into image file",
diff --git a/hmp.c b/hmp.c
index c45514b..e0c387c 100644
--- a/hmp.c
+++ b/hmp.c
@@ -344,14 +344,28 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
         {
             monitor_printf(mon, "    I/O throttling:   bps=%" PRId64
                             " bps_rd=%" PRId64  " bps_wr=%" PRId64
+                            " bps_max=%" PRId64
+                            " bps_rd_max=%" PRId64
+                            " bps_wr_max=%" PRId64
                             " iops=%" PRId64 " iops_rd=%" PRId64
-                            " iops_wr=%" PRId64 "\n",
+                            " iops_wr=%" PRId64
+                            " iops_max=%" PRId64
+                            " iops_rd_max=%" PRId64
+                            " iops_wr_max=%" PRId64 "\n",
                             info->value->inserted->bps,
                             info->value->inserted->bps_rd,
                             info->value->inserted->bps_wr,
+                            info->value->inserted->bps_max,
+                            info->value->inserted->bps_rd_max,
+                            info->value->inserted->bps_wr_max,
                             info->value->inserted->iops,
                             info->value->inserted->iops_rd,
-                            info->value->inserted->iops_wr);
+                            info->value->inserted->iops_wr,
+                            info->value->inserted->iops_max,
+                            info->value->inserted->iops_rd_max,
+                            info->value->inserted->iops_wr_max);
+        } else {
+            monitor_printf(mon, " [not inserted]");
         }
 
         if (verbose) {
@@ -1098,7 +1112,19 @@ void hmp_block_set_io_throttle(Monitor *mon, const QDict *qdict)
                               qdict_get_int(qdict, "bps_wr"),
                               qdict_get_int(qdict, "iops"),
                               qdict_get_int(qdict, "iops_rd"),
-                              qdict_get_int(qdict, "iops_wr"), &err);
+                              qdict_get_int(qdict, "iops_wr"),
+                              false, /* no burst max via QMP */
+                              0,
+                              false,
+                              0,
+                              false,
+                              0,
+                              false,
+                              0,
+                              false,
+                              0,
+                              false,
+                              0, &err);
     hmp_handle_error(mon, &err);
 }
 
diff --git a/qapi-schema.json b/qapi-schema.json
index a51f7d2..5e5461e 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -785,6 +785,18 @@
 #
 # @image: the info of image used (since: 1.6)
 #
+# @bps_max: #optional total max in bytes (Since 1.7)
+#
+# @bps_rd_max: #optional read max in bytes (Since 1.7)
+#
+# @bps_wr_max: #optional write max in bytes (Since 1.7)
+#
+# @iops_max: #optional total I/O operations max (Since 1.7)
+#
+# @iops_rd_max: #optional read I/O operations max (Since 1.7)
+#
+# @iops_wr_max: #optional write I/O operations max (Since 1.7)
+#
 # Since: 0.14.0
 #
 # Notes: This interface is only found in @BlockInfo.
@@ -795,7 +807,10 @@
             'encrypted': 'bool', 'encryption_key_missing': 'bool',
             'bps': 'int', 'bps_rd': 'int', 'bps_wr': 'int',
             'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int',
-            'image': 'ImageInfo' } }
+            'image': 'ImageInfo',
+            '*bps_max': 'int', '*bps_rd_max': 'int',
+            '*bps_wr_max': 'int', '*iops_max': 'int',
+            '*iops_rd_max': 'int', '*iops_wr_max': 'int' }}
 
 ##
 # @BlockDeviceIoStatus:
@@ -2174,6 +2189,18 @@
 #
 # @iops_wr: write I/O operations per second
 #
+# @bps_max: #optional total max in bytes (Since 1.7)
+#
+# @bps_rd_max: #optional read max in bytes (Since 1.7)
+#
+# @bps_wr_max: #optional write max in bytes (Since 1.7)
+#
+# @iops_max: #optional total I/O operations max (Since 1.7)
+#
+# @iops_rd_max: #optional read I/O operations max (Since 1.7)
+#
+# @iops_wr_max: #optional write I/O operations max (Since 1.7)
+#
 # Returns: Nothing on success
 #          If @device is not a valid block device, DeviceNotFound
 #
@@ -2181,7 +2208,10 @@
 ##
 { 'command': 'block_set_io_throttle',
   'data': { 'device': 'str', 'bps': 'int', 'bps_rd': 'int', 'bps_wr': 'int',
-            'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int' } }
+            'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int',
+            '*bps_max': 'int', '*bps_rd_max': 'int',
+            '*bps_wr_max': 'int', '*iops_max': 'int',
+            '*iops_rd_max': 'int', '*iops_wr_max': 'int' }}
 
 ##
 # @block-stream:
diff --git a/qemu-options.hx b/qemu-options.hx
index d15338e..3aa8f9e 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -409,7 +409,9 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
     "       [,cache=writethrough|writeback|none|directsync|unsafe][,format=f]\n"
     "       [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
     "       [,readonly=on|off][,copy-on-read=on|off]\n"
-    "       [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r][,iops_wr=w]]\n"
+    "       [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r]"
+    "        [,iops_wr=w][,bps_max=bt]|[[,bps_rd_max=rt][,bps_wr_max=wt]]]"
+    "       [[,iops_max=it]|[[,iops_rd_max=rt][,iops_wr_max=wt]]\n"
     "                use 'file' as a drive image\n", QEMU_ARCH_ALL)
 STEXI
 @item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 2e59b0d..1610831 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1389,7 +1389,7 @@ EQMP
 
     {
         .name       = "block_set_io_throttle",
-        .args_type  = "device:B,bps:l,bps_rd:l,bps_wr:l,iops:l,iops_rd:l,iops_wr:l",
+        .args_type  = "device:B,bps:l,bps_rd:l,bps_wr:l,iops:l,iops_rd:l,iops_wr:l,bps_max:l?,bps_rd_max:l?,bps_wr_max:l?,iops_max:l?,iops_rd_max:l?,iops_wr_max:l?",
         .mhandler.cmd_new = qmp_marshal_input_block_set_io_throttle,
     },
 
@@ -1404,10 +1404,16 @@ Arguments:
 - "device": device name (json-string)
 - "bps":  total throughput limit in bytes per second(json-int)
 - "bps_rd":  read throughput limit in bytes per second(json-int)
-- "bps_wr":  read throughput limit in bytes per second(json-int)
+- "bps_wr":  write throughput limit in bytes per second(json-int)
 - "iops":  total I/O operations per second(json-int)
 - "iops_rd":  read I/O operations per second(json-int)
 - "iops_wr":  write I/O operations per second(json-int)
+- "bps_max":  total max in bytes(json-int)
+- "bps_rd_max":  read max in bytes(json-int)
+- "bps_wr_max":  write max in bytes(json-int)
+- "iops_max":  total I/O operations max(json-int)
+- "iops_rd_max":  read I/O operations max(json-int)
+- "iops_wr_max":  write I/O operations max(json-int)
 
 Example:
 
@@ -1417,7 +1423,13 @@ Example:
                                                "bps_wr": "0",
                                                "iops": "0",
                                                "iops_rd": "0",
-                                               "iops_wr": "0" } }
+                                               "iops_wr": "0",
+                                               "bps_max": "8000000",
+                                               "bps_rd_max": "0",
+                                               "bps_wr_max": "0",
+                                               "iops_max": "0",
+                                               "iops_rd_max": "0",
+                                               "iops_wr_max": "0" } }
 <- { "return": {} }
 
 EQMP
@@ -1758,6 +1770,12 @@ Each json-object contain the following:
          - "iops": limit total I/O operations per second (json-int)
          - "iops_rd": limit read operations per second (json-int)
          - "iops_wr": limit write operations per second (json-int)
+         - "bps_max":  total max in bytes(json-int)
+         - "bps_rd_max":  read max in bytes(json-int)
+         - "bps_wr_max":  write max in bytes(json-int)
+         - "iops_max":  total I/O operations max(json-int)
+         - "iops_rd_max":  read I/O operations max(json-int)
+         - "iops_wr_max":  write I/O operations max(json-int)
          - "image": the detail of the image, it is a json-object containing
             the following:
              - "filename": image file name (json-string)
@@ -1827,6 +1845,12 @@ Example:
                "iops":1000000,
                "iops_rd":0,
                "iops_wr":0,
+               "bps_max": "8000000",
+               "bps_rd_max": "0",
+               "bps_wr_max": "0",
+               "iops_max": "0",
+               "iops_rd_max": "0",
+               "iops_wr_max": "0",
                "image":{
                   "filename":"disks/test.qcow2",
                   "format":"qcow2",
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH V5 4/5] block: Add support for throttling burst max in QMP and the command line.
  2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 4/5] block: Add support for throttling burst max in QMP and the command line Benoît Canet
@ 2013-08-16 12:07   ` Stefan Hajnoczi
  0 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2013-08-16 12:07 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha

On Mon, Aug 12, 2013 at 06:53:15PM +0200, Benoît Canet wrote:
> The max parameter of the leaky bycket throttling algoritm can be used to

s/bycket/bucket/
s/algoritm/algorithm/

> @@ -1098,7 +1112,19 @@ void hmp_block_set_io_throttle(Monitor *mon, const QDict *qdict)
>                                qdict_get_int(qdict, "bps_wr"),
>                                qdict_get_int(qdict, "iops"),
>                                qdict_get_int(qdict, "iops_rd"),
> -                              qdict_get_int(qdict, "iops_wr"), &err);
> +                              qdict_get_int(qdict, "iops_wr"),
> +                              false, /* no burst max via QMP */

s/QMP/HMP/

> @@ -1758,6 +1770,12 @@ Each json-object contain the following:
>           - "iops": limit total I/O operations per second (json-int)
>           - "iops_rd": limit read operations per second (json-int)
>           - "iops_wr": limit write operations per second (json-int)
> +         - "bps_max":  total max in bytes(json-int)

Space before "(json-int)" and for the others new fields too.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH V5 5/5] block: Add iops_sector_count to do the iops accounting for a given io size.
  2013-08-12 16:53 [Qemu-devel] [PATCH V5 0/5] Continuous Leaky Bucket Throttling Benoît Canet
                   ` (3 preceding siblings ...)
  2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 4/5] block: Add support for throttling burst max in QMP and the command line Benoît Canet
@ 2013-08-12 16:53 ` Benoît Canet
  2013-08-14  9:48   ` Fam Zheng
  2013-08-16 12:10   ` Stefan Hajnoczi
  2013-08-16 12:14 ` [Qemu-devel] [PATCH V5 0/5] Continuous Leaky Bucket Throttling Stefan Hajnoczi
  5 siblings, 2 replies; 17+ messages in thread
From: Benoît Canet @ 2013-08-12 16:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

This feature can be used in case where users are avoiding the iops limit by
doing jumbo I/Os hammering the storage backend.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qapi.c     |    3 +++
 blockdev.c       |   22 +++++++++++++++++++---
 hmp.c            |    8 ++++++--
 qapi-schema.json |   10 ++++++++--
 qemu-options.hx  |    2 +-
 qmp-commands.hx  |    8 ++++++--
 6 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/block/qapi.c b/block/qapi.c
index 5ba10f4..f98ff64 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -258,6 +258,9 @@ void bdrv_query_info(BlockDriverState *bs,
                 cfg.buckets[THROTTLE_OPS_WRITE].max;
             info->inserted->iops_wr_max     =
                 cfg.buckets[THROTTLE_OPS_WRITE].max;
+
+            info->inserted->has_iops_sector_count = cfg.op_size;
+            info->inserted->iops_sector_count = cfg.op_size;
         }
 
         bs0 = bs;
diff --git a/blockdev.c b/blockdev.c
index 22016a2..e2b0ee0 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -500,7 +500,7 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts,
         qemu_opt_get_number(opts, "throttling.iops-write-max", 0);
 
     cfg.unit_size = BDRV_SECTOR_SIZE;
-    cfg.op_size = 0;
+    cfg.op_size = qemu_opt_get_number(opts, "throttling.iops-sector-count", 0);
 
     if (!check_throttle_config(&cfg, &error)) {
         error_report("%s", error_get_pretty(error));
@@ -786,6 +786,9 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type)
     qemu_opt_rename(all_opts, "bps_rd_max", "throttling.bps-read-max");
     qemu_opt_rename(all_opts, "bps_wr_max", "throttling.bps-write-max");
 
+    qemu_opt_rename(all_opts,
+                    "iops_sector_count", "throttling.iops-sector-count");
+
     qemu_opt_rename(all_opts, "readonly", "read-only");
 
     value = qemu_opt_get(all_opts, "cache");
@@ -1285,7 +1288,9 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
                                bool has_iops_rd_max,
                                int64_t iops_rd_max,
                                bool has_iops_wr_max,
-                               int64_t iops_wr_max, Error **errp)
+                               int64_t iops_wr_max,
+                               bool has_iops_sector_count,
+                               int64_t iops_sector_count, Error **errp)
 {
     ThrottleConfig cfg;
     BlockDriverState *bs;
@@ -1325,7 +1330,10 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
     }
 
     cfg.unit_size = BDRV_SECTOR_SIZE;
-    cfg.op_size = 0;
+
+    if (has_iops_sector_count) {
+        cfg.op_size = iops_sector_count;
+    }
 
     if (!check_throttle_config(&cfg, errp)) {
         return;
@@ -2051,6 +2059,10 @@ QemuOptsList qemu_common_drive_opts = {
             .type = QEMU_OPT_NUMBER,
             .help = "total bytes write burst",
         },{
+            .name = "throttling.iops-sector-count",
+            .type = QEMU_OPT_NUMBER,
+            .help = "when limiting by iops max size of an I/O in sector",
+        },{
             .name = "copy-on-read",
             .type = QEMU_OPT_BOOL,
             .help = "copy read data from backing file into image file",
@@ -2197,6 +2209,10 @@ QemuOptsList qemu_old_drive_opts = {
             .type = QEMU_OPT_NUMBER,
             .help = "total bytes write burst",
         },{
+            .name = "iops_sector_count",
+            .type = QEMU_OPT_NUMBER,
+            .help = "when limiting by iops max size of an I/O in sector",
+        },{
             .name = "copy-on-read",
             .type = QEMU_OPT_BOOL,
             .help = "copy read data from backing file into image file",
diff --git a/hmp.c b/hmp.c
index e0c387c..01f7685 100644
--- a/hmp.c
+++ b/hmp.c
@@ -351,7 +351,8 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
                             " iops_wr=%" PRId64
                             " iops_max=%" PRId64
                             " iops_rd_max=%" PRId64
-                            " iops_wr_max=%" PRId64 "\n",
+                            " iops_wr_max=%" PRId64
+                            " iops_sector_count=%" PRId64 "\n",
                             info->value->inserted->bps,
                             info->value->inserted->bps_rd,
                             info->value->inserted->bps_wr,
@@ -363,7 +364,8 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
                             info->value->inserted->iops_wr,
                             info->value->inserted->iops_max,
                             info->value->inserted->iops_rd_max,
-                            info->value->inserted->iops_wr_max);
+                            info->value->inserted->iops_wr_max,
+                            info->value->inserted->iops_sector_count);
         } else {
             monitor_printf(mon, " [not inserted]");
         }
@@ -1124,6 +1126,8 @@ void hmp_block_set_io_throttle(Monitor *mon, const QDict *qdict)
                               false,
                               0,
                               false,
+                              0,
+                              false, /* No default I/O size */
                               0, &err);
     hmp_handle_error(mon, &err);
 }
diff --git a/qapi-schema.json b/qapi-schema.json
index 5e5461e..c7b1d5e 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -797,6 +797,8 @@
 #
 # @iops_wr_max: #optional write I/O operations max (Since 1.7)
 #
+# @iops_sector_count: #optional an I/O size in sector (Since 1.6)
+#
 # Since: 0.14.0
 #
 # Notes: This interface is only found in @BlockInfo.
@@ -810,7 +812,8 @@
             'image': 'ImageInfo',
             '*bps_max': 'int', '*bps_rd_max': 'int',
             '*bps_wr_max': 'int', '*iops_max': 'int',
-            '*iops_rd_max': 'int', '*iops_wr_max': 'int' }}
+            '*iops_rd_max': 'int', '*iops_wr_max': 'int',
+            '*iops_sector_count': 'int' } }
 
 ##
 # @BlockDeviceIoStatus:
@@ -2201,6 +2204,8 @@
 #
 # @iops_wr_max: #optional write I/O operations max (Since 1.7)
 #
+# @iops_sector_count: #optional an I/O size in sector (Since 1.6)
+#
 # Returns: Nothing on success
 #          If @device is not a valid block device, DeviceNotFound
 #
@@ -2211,7 +2216,8 @@
             'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int',
             '*bps_max': 'int', '*bps_rd_max': 'int',
             '*bps_wr_max': 'int', '*iops_max': 'int',
-            '*iops_rd_max': 'int', '*iops_wr_max': 'int' }}
+            '*iops_rd_max': 'int', '*iops_wr_max': 'int',
+            '*iops_sector_count': 'int' }}
 
 ##
 # @block-stream:
diff --git a/qemu-options.hx b/qemu-options.hx
index 3aa8f9e..c539dd0 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -411,7 +411,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
     "       [,readonly=on|off][,copy-on-read=on|off]\n"
     "       [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r]"
     "        [,iops_wr=w][,bps_max=bt]|[[,bps_rd_max=rt][,bps_wr_max=wt]]]"
-    "       [[,iops_max=it]|[[,iops_rd_max=rt][,iops_wr_max=wt]]\n"
+    "       [[,iops_max=it]|[[,iops_rd_max=rt][,iops_wr_max=wt][,iops_sector_count=cnt]]\n"
     "                use 'file' as a drive image\n", QEMU_ARCH_ALL)
 STEXI
 @item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 1610831..bcd255f 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1389,7 +1389,7 @@ EQMP
 
     {
         .name       = "block_set_io_throttle",
-        .args_type  = "device:B,bps:l,bps_rd:l,bps_wr:l,iops:l,iops_rd:l,iops_wr:l,bps_max:l?,bps_rd_max:l?,bps_wr_max:l?,iops_max:l?,iops_rd_max:l?,iops_wr_max:l?",
+        .args_type  = "device:B,bps:l,bps_rd:l,bps_wr:l,iops:l,iops_rd:l,iops_wr:l,bps_max:l?,bps_rd_max:l?,bps_wr_max:l?,iops_max:l?,iops_rd_max:l?,iops_wr_max:l?,iops_sector_count:l?",
         .mhandler.cmd_new = qmp_marshal_input_block_set_io_throttle,
     },
 
@@ -1414,6 +1414,7 @@ Arguments:
 - "iops_max":  total I/O operations max(json-int)
 - "iops_rd_max":  read I/O operations max(json-int)
 - "iops_wr_max":  write I/O operations max(json-int)
+- "iops_sector_count":  I/O sector count when limiting(json-int)
 
 Example:
 
@@ -1429,7 +1430,8 @@ Example:
                                                "bps_wr_max": "0",
                                                "iops_max": "0",
                                                "iops_rd_max": "0",
-                                               "iops_wr_max": "0" } }
+                                               "iops_wr_max": "0",
+                                               "iops_sector_count": "0" } }
 <- { "return": {} }
 
 EQMP
@@ -1776,6 +1778,7 @@ Each json-object contain the following:
          - "iops_max":  total I/O operations max(json-int)
          - "iops_rd_max":  read I/O operations max(json-int)
          - "iops_wr_max":  write I/O operations max(json-int)
+         - "iops_sector_count":  I/O sector count when limiting(json-int)
          - "image": the detail of the image, it is a json-object containing
             the following:
              - "filename": image file name (json-string)
@@ -1851,6 +1854,7 @@ Example:
                "iops_max": "0",
                "iops_rd_max": "0",
                "iops_wr_max": "0",
+               "iops_sector_count": "0",
                "image":{
                   "filename":"disks/test.qcow2",
                   "format":"qcow2",
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH V5 5/5] block: Add iops_sector_count to do the iops accounting for a given io size.
  2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 5/5] block: Add iops_sector_count to do the iops accounting for a given io size Benoît Canet
@ 2013-08-14  9:48   ` Fam Zheng
  2013-08-14 18:31     ` Benoît Canet
  2013-08-16 12:10   ` Stefan Hajnoczi
  1 sibling, 1 reply; 17+ messages in thread
From: Fam Zheng @ 2013-08-14  9:48 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha

On Mon, 08/12 18:53, Benoît Canet wrote:
> This feature can be used in case where users are avoiding the iops limit by
> doing jumbo I/Os hammering the storage backend.
> 
You are accounting io ops by the op size:
(unit = size / iops_sector_count), which is equivelant to bps.  So I'm
still not understanding why can't such jumbo IO be throttled by bps
limits.

> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>  block/qapi.c     |    3 +++
>  blockdev.c       |   22 +++++++++++++++++++---
>  hmp.c            |    8 ++++++--
>  qapi-schema.json |   10 ++++++++--
>  qemu-options.hx  |    2 +-
>  qmp-commands.hx  |    8 ++++++--
>  6 files changed, 43 insertions(+), 10 deletions(-)
> 
> diff --git a/block/qapi.c b/block/qapi.c
> index 5ba10f4..f98ff64 100644
> --- a/block/qapi.c
> +++ b/block/qapi.c
> @@ -258,6 +258,9 @@ void bdrv_query_info(BlockDriverState *bs,
>                  cfg.buckets[THROTTLE_OPS_WRITE].max;
>              info->inserted->iops_wr_max     =
>                  cfg.buckets[THROTTLE_OPS_WRITE].max;
> +
> +            info->inserted->has_iops_sector_count = cfg.op_size;
> +            info->inserted->iops_sector_count = cfg.op_size;
>          }
>  
>          bs0 = bs;
> diff --git a/blockdev.c b/blockdev.c
> index 22016a2..e2b0ee0 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -500,7 +500,7 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts,
>          qemu_opt_get_number(opts, "throttling.iops-write-max", 0);
>  
>      cfg.unit_size = BDRV_SECTOR_SIZE;
> -    cfg.op_size = 0;
> +    cfg.op_size = qemu_opt_get_number(opts, "throttling.iops-sector-count", 0);
>  
>      if (!check_throttle_config(&cfg, &error)) {
>          error_report("%s", error_get_pretty(error));
> @@ -786,6 +786,9 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type)
>      qemu_opt_rename(all_opts, "bps_rd_max", "throttling.bps-read-max");
>      qemu_opt_rename(all_opts, "bps_wr_max", "throttling.bps-write-max");
>  
> +    qemu_opt_rename(all_opts,
> +                    "iops_sector_count", "throttling.iops-sector-count");
> +
>      qemu_opt_rename(all_opts, "readonly", "read-only");
>  
>      value = qemu_opt_get(all_opts, "cache");
> @@ -1285,7 +1288,9 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
>                                 bool has_iops_rd_max,
>                                 int64_t iops_rd_max,
>                                 bool has_iops_wr_max,
> -                               int64_t iops_wr_max, Error **errp)
> +                               int64_t iops_wr_max,
> +                               bool has_iops_sector_count,
> +                               int64_t iops_sector_count, Error **errp)
>  {
>      ThrottleConfig cfg;
>      BlockDriverState *bs;
> @@ -1325,7 +1330,10 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
>      }
>  
>      cfg.unit_size = BDRV_SECTOR_SIZE;
> -    cfg.op_size = 0;
> +
> +    if (has_iops_sector_count) {
> +        cfg.op_size = iops_sector_count;
> +    }
>  
>      if (!check_throttle_config(&cfg, errp)) {
>          return;
> @@ -2051,6 +2059,10 @@ QemuOptsList qemu_common_drive_opts = {
>              .type = QEMU_OPT_NUMBER,
>              .help = "total bytes write burst",
>          },{
> +            .name = "throttling.iops-sector-count",
> +            .type = QEMU_OPT_NUMBER,
> +            .help = "when limiting by iops max size of an I/O in sector",
> +        },{
>              .name = "copy-on-read",
>              .type = QEMU_OPT_BOOL,
>              .help = "copy read data from backing file into image file",
> @@ -2197,6 +2209,10 @@ QemuOptsList qemu_old_drive_opts = {
>              .type = QEMU_OPT_NUMBER,
>              .help = "total bytes write burst",
>          },{
> +            .name = "iops_sector_count",
> +            .type = QEMU_OPT_NUMBER,
> +            .help = "when limiting by iops max size of an I/O in sector",
> +        },{
>              .name = "copy-on-read",
>              .type = QEMU_OPT_BOOL,
>              .help = "copy read data from backing file into image file",
> diff --git a/hmp.c b/hmp.c
> index e0c387c..01f7685 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -351,7 +351,8 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
>                              " iops_wr=%" PRId64
>                              " iops_max=%" PRId64
>                              " iops_rd_max=%" PRId64
> -                            " iops_wr_max=%" PRId64 "\n",
> +                            " iops_wr_max=%" PRId64
> +                            " iops_sector_count=%" PRId64 "\n",
>                              info->value->inserted->bps,
>                              info->value->inserted->bps_rd,
>                              info->value->inserted->bps_wr,
> @@ -363,7 +364,8 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
>                              info->value->inserted->iops_wr,
>                              info->value->inserted->iops_max,
>                              info->value->inserted->iops_rd_max,
> -                            info->value->inserted->iops_wr_max);
> +                            info->value->inserted->iops_wr_max,
> +                            info->value->inserted->iops_sector_count);
>          } else {
>              monitor_printf(mon, " [not inserted]");
>          }
> @@ -1124,6 +1126,8 @@ void hmp_block_set_io_throttle(Monitor *mon, const QDict *qdict)
>                                false,
>                                0,
>                                false,
> +                              0,
> +                              false, /* No default I/O size */
>                                0, &err);
>      hmp_handle_error(mon, &err);
>  }
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 5e5461e..c7b1d5e 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -797,6 +797,8 @@
>  #
>  # @iops_wr_max: #optional write I/O operations max (Since 1.7)
>  #
> +# @iops_sector_count: #optional an I/O size in sector (Since 1.6)
> +#
>  # Since: 0.14.0
>  #
>  # Notes: This interface is only found in @BlockInfo.
> @@ -810,7 +812,8 @@
>              'image': 'ImageInfo',
>              '*bps_max': 'int', '*bps_rd_max': 'int',
>              '*bps_wr_max': 'int', '*iops_max': 'int',
> -            '*iops_rd_max': 'int', '*iops_wr_max': 'int' }}
> +            '*iops_rd_max': 'int', '*iops_wr_max': 'int',
> +            '*iops_sector_count': 'int' } }
>  
>  ##
>  # @BlockDeviceIoStatus:
> @@ -2201,6 +2204,8 @@
>  #
>  # @iops_wr_max: #optional write I/O operations max (Since 1.7)
>  #
> +# @iops_sector_count: #optional an I/O size in sector (Since 1.6)
> +#
>  # Returns: Nothing on success
>  #          If @device is not a valid block device, DeviceNotFound
>  #
> @@ -2211,7 +2216,8 @@
>              'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int',
>              '*bps_max': 'int', '*bps_rd_max': 'int',
>              '*bps_wr_max': 'int', '*iops_max': 'int',
> -            '*iops_rd_max': 'int', '*iops_wr_max': 'int' }}
> +            '*iops_rd_max': 'int', '*iops_wr_max': 'int',
> +            '*iops_sector_count': 'int' }}
>  
>  ##
>  # @block-stream:
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 3aa8f9e..c539dd0 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -411,7 +411,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
>      "       [,readonly=on|off][,copy-on-read=on|off]\n"
>      "       [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r]"
>      "        [,iops_wr=w][,bps_max=bt]|[[,bps_rd_max=rt][,bps_wr_max=wt]]]"
> -    "       [[,iops_max=it]|[[,iops_rd_max=rt][,iops_wr_max=wt]]\n"
> +    "       [[,iops_max=it]|[[,iops_rd_max=rt][,iops_wr_max=wt][,iops_sector_count=cnt]]\n"
>      "                use 'file' as a drive image\n", QEMU_ARCH_ALL)
>  STEXI
>  @item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 1610831..bcd255f 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -1389,7 +1389,7 @@ EQMP
>  
>      {
>          .name       = "block_set_io_throttle",
> -        .args_type  = "device:B,bps:l,bps_rd:l,bps_wr:l,iops:l,iops_rd:l,iops_wr:l,bps_max:l?,bps_rd_max:l?,bps_wr_max:l?,iops_max:l?,iops_rd_max:l?,iops_wr_max:l?",
> +        .args_type  = "device:B,bps:l,bps_rd:l,bps_wr:l,iops:l,iops_rd:l,iops_wr:l,bps_max:l?,bps_rd_max:l?,bps_wr_max:l?,iops_max:l?,iops_rd_max:l?,iops_wr_max:l?,iops_sector_count:l?",
>          .mhandler.cmd_new = qmp_marshal_input_block_set_io_throttle,
>      },
>  
> @@ -1414,6 +1414,7 @@ Arguments:
>  - "iops_max":  total I/O operations max(json-int)
>  - "iops_rd_max":  read I/O operations max(json-int)
>  - "iops_wr_max":  write I/O operations max(json-int)
> +- "iops_sector_count":  I/O sector count when limiting(json-int)
>  
>  Example:
>  
> @@ -1429,7 +1430,8 @@ Example:
>                                                 "bps_wr_max": "0",
>                                                 "iops_max": "0",
>                                                 "iops_rd_max": "0",
> -                                               "iops_wr_max": "0" } }
> +                                               "iops_wr_max": "0",
> +                                               "iops_sector_count": "0" } }
>  <- { "return": {} }
>  
>  EQMP
> @@ -1776,6 +1778,7 @@ Each json-object contain the following:
>           - "iops_max":  total I/O operations max(json-int)
>           - "iops_rd_max":  read I/O operations max(json-int)
>           - "iops_wr_max":  write I/O operations max(json-int)
> +         - "iops_sector_count":  I/O sector count when limiting(json-int)
>           - "image": the detail of the image, it is a json-object containing
>              the following:
>               - "filename": image file name (json-string)
> @@ -1851,6 +1854,7 @@ Example:
>                 "iops_max": "0",
>                 "iops_rd_max": "0",
>                 "iops_wr_max": "0",
> +               "iops_sector_count": "0",
>                 "image":{
>                    "filename":"disks/test.qcow2",
>                    "format":"qcow2",
> -- 
> 1.7.10.4
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH V5 5/5] block: Add iops_sector_count to do the iops accounting for a given io size.
  2013-08-14  9:48   ` Fam Zheng
@ 2013-08-14 18:31     ` Benoît Canet
  0 siblings, 0 replies; 17+ messages in thread
From: Benoît Canet @ 2013-08-14 18:31 UTC (permalink / raw)
  To: Fam Zheng; +Cc: kwolf, pbonzini, qemu-devel, stefanha

Le Wednesday 14 Aug 2013 à 17:48:46 (+0800), Fam Zheng a écrit :
> On Mon, 08/12 18:53, Benoît Canet wrote:
> > This feature can be used in case where users are avoiding the iops limit by
> > doing jumbo I/Os hammering the storage backend.
> > 
> You are accounting io ops by the op size:
> (unit = size / iops_sector_count), which is equivelant to bps.  So I'm
> still not understanding why can't such jumbo IO be throttled by bps
> limits.

I am aiming at providing to the users an amazon like feature set.
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AmazonEBS.html

>From the provisioned volume section:

The read and write operations have a block size of 16 KB or less. If the I/O
increases above 16 KB, the IOPS delivered drop in proportion to the increase in
the size of the I/O. For example, a 1000 IOPS volume can deliver 1000 16 KB
writes per second, 500 32 KB writes per second, or 250 64 KB writes per second.

Best regards

Benoît

> 
> > Signed-off-by: Benoit Canet <benoit@irqsave.net>
> > ---
> >  block/qapi.c     |    3 +++
> >  blockdev.c       |   22 +++++++++++++++++++---
> >  hmp.c            |    8 ++++++--
> >  qapi-schema.json |   10 ++++++++--
> >  qemu-options.hx  |    2 +-
> >  qmp-commands.hx  |    8 ++++++--
> >  6 files changed, 43 insertions(+), 10 deletions(-)
> > 
> > diff --git a/block/qapi.c b/block/qapi.c
> > index 5ba10f4..f98ff64 100644
> > --- a/block/qapi.c
> > +++ b/block/qapi.c
> > @@ -258,6 +258,9 @@ void bdrv_query_info(BlockDriverState *bs,
> >                  cfg.buckets[THROTTLE_OPS_WRITE].max;
> >              info->inserted->iops_wr_max     =
> >                  cfg.buckets[THROTTLE_OPS_WRITE].max;
> > +
> > +            info->inserted->has_iops_sector_count = cfg.op_size;
> > +            info->inserted->iops_sector_count = cfg.op_size;
> >          }
> >  
> >          bs0 = bs;
> > diff --git a/blockdev.c b/blockdev.c
> > index 22016a2..e2b0ee0 100644
> > --- a/blockdev.c
> > +++ b/blockdev.c
> > @@ -500,7 +500,7 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts,
> >          qemu_opt_get_number(opts, "throttling.iops-write-max", 0);
> >  
> >      cfg.unit_size = BDRV_SECTOR_SIZE;
> > -    cfg.op_size = 0;
> > +    cfg.op_size = qemu_opt_get_number(opts, "throttling.iops-sector-count", 0);
> >  
> >      if (!check_throttle_config(&cfg, &error)) {
> >          error_report("%s", error_get_pretty(error));
> > @@ -786,6 +786,9 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type)
> >      qemu_opt_rename(all_opts, "bps_rd_max", "throttling.bps-read-max");
> >      qemu_opt_rename(all_opts, "bps_wr_max", "throttling.bps-write-max");
> >  
> > +    qemu_opt_rename(all_opts,
> > +                    "iops_sector_count", "throttling.iops-sector-count");
> > +
> >      qemu_opt_rename(all_opts, "readonly", "read-only");
> >  
> >      value = qemu_opt_get(all_opts, "cache");
> > @@ -1285,7 +1288,9 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
> >                                 bool has_iops_rd_max,
> >                                 int64_t iops_rd_max,
> >                                 bool has_iops_wr_max,
> > -                               int64_t iops_wr_max, Error **errp)
> > +                               int64_t iops_wr_max,
> > +                               bool has_iops_sector_count,
> > +                               int64_t iops_sector_count, Error **errp)
> >  {
> >      ThrottleConfig cfg;
> >      BlockDriverState *bs;
> > @@ -1325,7 +1330,10 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
> >      }
> >  
> >      cfg.unit_size = BDRV_SECTOR_SIZE;
> > -    cfg.op_size = 0;
> > +
> > +    if (has_iops_sector_count) {
> > +        cfg.op_size = iops_sector_count;
> > +    }
> >  
> >      if (!check_throttle_config(&cfg, errp)) {
> >          return;
> > @@ -2051,6 +2059,10 @@ QemuOptsList qemu_common_drive_opts = {
> >              .type = QEMU_OPT_NUMBER,
> >              .help = "total bytes write burst",
> >          },{
> > +            .name = "throttling.iops-sector-count",
> > +            .type = QEMU_OPT_NUMBER,
> > +            .help = "when limiting by iops max size of an I/O in sector",
> > +        },{
> >              .name = "copy-on-read",
> >              .type = QEMU_OPT_BOOL,
> >              .help = "copy read data from backing file into image file",
> > @@ -2197,6 +2209,10 @@ QemuOptsList qemu_old_drive_opts = {
> >              .type = QEMU_OPT_NUMBER,
> >              .help = "total bytes write burst",
> >          },{
> > +            .name = "iops_sector_count",
> > +            .type = QEMU_OPT_NUMBER,
> > +            .help = "when limiting by iops max size of an I/O in sector",
> > +        },{
> >              .name = "copy-on-read",
> >              .type = QEMU_OPT_BOOL,
> >              .help = "copy read data from backing file into image file",
> > diff --git a/hmp.c b/hmp.c
> > index e0c387c..01f7685 100644
> > --- a/hmp.c
> > +++ b/hmp.c
> > @@ -351,7 +351,8 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
> >                              " iops_wr=%" PRId64
> >                              " iops_max=%" PRId64
> >                              " iops_rd_max=%" PRId64
> > -                            " iops_wr_max=%" PRId64 "\n",
> > +                            " iops_wr_max=%" PRId64
> > +                            " iops_sector_count=%" PRId64 "\n",
> >                              info->value->inserted->bps,
> >                              info->value->inserted->bps_rd,
> >                              info->value->inserted->bps_wr,
> > @@ -363,7 +364,8 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
> >                              info->value->inserted->iops_wr,
> >                              info->value->inserted->iops_max,
> >                              info->value->inserted->iops_rd_max,
> > -                            info->value->inserted->iops_wr_max);
> > +                            info->value->inserted->iops_wr_max,
> > +                            info->value->inserted->iops_sector_count);
> >          } else {
> >              monitor_printf(mon, " [not inserted]");
> >          }
> > @@ -1124,6 +1126,8 @@ void hmp_block_set_io_throttle(Monitor *mon, const QDict *qdict)
> >                                false,
> >                                0,
> >                                false,
> > +                              0,
> > +                              false, /* No default I/O size */
> >                                0, &err);
> >      hmp_handle_error(mon, &err);
> >  }
> > diff --git a/qapi-schema.json b/qapi-schema.json
> > index 5e5461e..c7b1d5e 100644
> > --- a/qapi-schema.json
> > +++ b/qapi-schema.json
> > @@ -797,6 +797,8 @@
> >  #
> >  # @iops_wr_max: #optional write I/O operations max (Since 1.7)
> >  #
> > +# @iops_sector_count: #optional an I/O size in sector (Since 1.6)
> > +#
> >  # Since: 0.14.0
> >  #
> >  # Notes: This interface is only found in @BlockInfo.
> > @@ -810,7 +812,8 @@
> >              'image': 'ImageInfo',
> >              '*bps_max': 'int', '*bps_rd_max': 'int',
> >              '*bps_wr_max': 'int', '*iops_max': 'int',
> > -            '*iops_rd_max': 'int', '*iops_wr_max': 'int' }}
> > +            '*iops_rd_max': 'int', '*iops_wr_max': 'int',
> > +            '*iops_sector_count': 'int' } }
> >  
> >  ##
> >  # @BlockDeviceIoStatus:
> > @@ -2201,6 +2204,8 @@
> >  #
> >  # @iops_wr_max: #optional write I/O operations max (Since 1.7)
> >  #
> > +# @iops_sector_count: #optional an I/O size in sector (Since 1.6)
> > +#
> >  # Returns: Nothing on success
> >  #          If @device is not a valid block device, DeviceNotFound
> >  #
> > @@ -2211,7 +2216,8 @@
> >              'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int',
> >              '*bps_max': 'int', '*bps_rd_max': 'int',
> >              '*bps_wr_max': 'int', '*iops_max': 'int',
> > -            '*iops_rd_max': 'int', '*iops_wr_max': 'int' }}
> > +            '*iops_rd_max': 'int', '*iops_wr_max': 'int',
> > +            '*iops_sector_count': 'int' }}
> >  
> >  ##
> >  # @block-stream:
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 3aa8f9e..c539dd0 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -411,7 +411,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
> >      "       [,readonly=on|off][,copy-on-read=on|off]\n"
> >      "       [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r]"
> >      "        [,iops_wr=w][,bps_max=bt]|[[,bps_rd_max=rt][,bps_wr_max=wt]]]"
> > -    "       [[,iops_max=it]|[[,iops_rd_max=rt][,iops_wr_max=wt]]\n"
> > +    "       [[,iops_max=it]|[[,iops_rd_max=rt][,iops_wr_max=wt][,iops_sector_count=cnt]]\n"
> >      "                use 'file' as a drive image\n", QEMU_ARCH_ALL)
> >  STEXI
> >  @item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
> > diff --git a/qmp-commands.hx b/qmp-commands.hx
> > index 1610831..bcd255f 100644
> > --- a/qmp-commands.hx
> > +++ b/qmp-commands.hx
> > @@ -1389,7 +1389,7 @@ EQMP
> >  
> >      {
> >          .name       = "block_set_io_throttle",
> > -        .args_type  = "device:B,bps:l,bps_rd:l,bps_wr:l,iops:l,iops_rd:l,iops_wr:l,bps_max:l?,bps_rd_max:l?,bps_wr_max:l?,iops_max:l?,iops_rd_max:l?,iops_wr_max:l?",
> > +        .args_type  = "device:B,bps:l,bps_rd:l,bps_wr:l,iops:l,iops_rd:l,iops_wr:l,bps_max:l?,bps_rd_max:l?,bps_wr_max:l?,iops_max:l?,iops_rd_max:l?,iops_wr_max:l?,iops_sector_count:l?",
> >          .mhandler.cmd_new = qmp_marshal_input_block_set_io_throttle,
> >      },
> >  
> > @@ -1414,6 +1414,7 @@ Arguments:
> >  - "iops_max":  total I/O operations max(json-int)
> >  - "iops_rd_max":  read I/O operations max(json-int)
> >  - "iops_wr_max":  write I/O operations max(json-int)
> > +- "iops_sector_count":  I/O sector count when limiting(json-int)
> >  
> >  Example:
> >  
> > @@ -1429,7 +1430,8 @@ Example:
> >                                                 "bps_wr_max": "0",
> >                                                 "iops_max": "0",
> >                                                 "iops_rd_max": "0",
> > -                                               "iops_wr_max": "0" } }
> > +                                               "iops_wr_max": "0",
> > +                                               "iops_sector_count": "0" } }
> >  <- { "return": {} }
> >  
> >  EQMP
> > @@ -1776,6 +1778,7 @@ Each json-object contain the following:
> >           - "iops_max":  total I/O operations max(json-int)
> >           - "iops_rd_max":  read I/O operations max(json-int)
> >           - "iops_wr_max":  write I/O operations max(json-int)
> > +         - "iops_sector_count":  I/O sector count when limiting(json-int)
> >           - "image": the detail of the image, it is a json-object containing
> >              the following:
> >               - "filename": image file name (json-string)
> > @@ -1851,6 +1854,7 @@ Example:
> >                 "iops_max": "0",
> >                 "iops_rd_max": "0",
> >                 "iops_wr_max": "0",
> > +               "iops_sector_count": "0",
> >                 "image":{
> >                    "filename":"disks/test.qcow2",
> >                    "format":"qcow2",
> > -- 
> > 1.7.10.4
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH V5 5/5] block: Add iops_sector_count to do the iops accounting for a given io size.
  2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 5/5] block: Add iops_sector_count to do the iops accounting for a given io size Benoît Canet
  2013-08-14  9:48   ` Fam Zheng
@ 2013-08-16 12:10   ` Stefan Hajnoczi
  1 sibling, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2013-08-16 12:10 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha

On Mon, Aug 12, 2013 at 06:53:16PM +0200, Benoît Canet wrote:
> diff --git a/block/qapi.c b/block/qapi.c
> index 5ba10f4..f98ff64 100644
> --- a/block/qapi.c
> +++ b/block/qapi.c
> @@ -258,6 +258,9 @@ void bdrv_query_info(BlockDriverState *bs,
>                  cfg.buckets[THROTTLE_OPS_WRITE].max;
>              info->inserted->iops_wr_max     =
>                  cfg.buckets[THROTTLE_OPS_WRITE].max;
> +
> +            info->inserted->has_iops_sector_count = cfg.op_size;
> +            info->inserted->iops_sector_count = cfg.op_size;

Why introduce the concept of sectors here?  Throttling uses bytes and
that will help prevent confusion if the user has 4 KB sector disks, for
example, instead of 512 bytes.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH V5 0/5] Continuous Leaky Bucket Throttling
  2013-08-12 16:53 [Qemu-devel] [PATCH V5 0/5] Continuous Leaky Bucket Throttling Benoît Canet
                   ` (4 preceding siblings ...)
  2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 5/5] block: Add iops_sector_count to do the iops accounting for a given io size Benoît Canet
@ 2013-08-16 12:14 ` Stefan Hajnoczi
  5 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2013-08-16 12:14 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha

On Mon, Aug 12, 2013 at 06:53:11PM +0200, Benoît Canet wrote:
> This patchset implement continous leaky bucket throttling.
> 
> It use two requests queue to enable to do silly unbalanced throttling like
> block_set_io_throttle 0 0 0 0 6000 1
> 
> It use two timer to get the timer callbacks and the throttle.c code simple
> 
> It add unit tests.
> 
> The throttling core is pretty solid and the surrouding of the patchset needs
> polish. (new options ...)

Looks promising.  I still need to try it out and get familiar with how
bursting ("max") works but I've left comments on the patches for now.

Stefan

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2013-08-19 11:28 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-08-12 16:53 [Qemu-devel] [PATCH V5 0/5] Continuous Leaky Bucket Throttling Benoît Canet
2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 1/5] throttle: Add a new throttling API implementing continuous leaky bucket Benoît Canet
2013-08-14  8:52   ` Fam Zheng
2013-08-16 11:45   ` Stefan Hajnoczi
2013-08-19 11:27     ` Paolo Bonzini
2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 2/5] throttle: Add units tests Benoît Canet
2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 3/5] block: Enable the new throttling code in the block layer Benoît Canet
2013-08-14  9:31   ` Fam Zheng
2013-08-14  9:50     ` Fam Zheng
2013-08-16 12:02   ` Stefan Hajnoczi
2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 4/5] block: Add support for throttling burst max in QMP and the command line Benoît Canet
2013-08-16 12:07   ` Stefan Hajnoczi
2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 5/5] block: Add iops_sector_count to do the iops accounting for a given io size Benoît Canet
2013-08-14  9:48   ` Fam Zheng
2013-08-14 18:31     ` Benoît Canet
2013-08-16 12:10   ` Stefan Hajnoczi
2013-08-16 12:14 ` [Qemu-devel] [PATCH V5 0/5] Continuous Leaky Bucket Throttling Stefan Hajnoczi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).