qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH V4 0/5] Continuous Leaky Bucket Throttling
@ 2013-08-08 14:29 Benoît Canet
  2013-08-08 14:29 ` [Qemu-devel] [PATCH V4 1/5] throttle: Add a new throttling API implementing continuous leaky bucket Benoît Canet
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Benoît Canet @ 2013-08-08 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

This patchset implement continous leaky bucket throttling.

It use two requests queue to enable to do silly unbalanced throttling like
block_set_io_throttle 0 0 0 0 6000 1

It use two timer to get the timer callbacks and the throttle.c code simple

It add unit tests.

It fail with the following error message at exit and I don't know why yet.
qemu-system-x86_64: block.c:1489: bdrv_drain_all: Assertion
`((&bs->tracked_requests)->lh_first == ((void *)0))' failed.

The throttling core is pretty solid and the surrouding of the patchset needs
polish. (new options ...)

since previous version:
    wrap qemu-option.hx declararation [Eric]
    continuus -> continuous [Fam]
    unit test [Paolo]

Benoît Canet (5):
  throttle: Add a new throttling API implementing continuous leaky
    bucket.
  throttle: Add units tests
  block: Enable the new throttling code in the block layer.
  block: Add support for throttling burst max in QMP and the command
    line.
  block: Add iops_sector_count to do the iops accounting for a given io
    size.

 block.c                   |  351 ++++++++++----------------------
 block/qapi.c              |   50 +++--
 blockdev.c                |  207 ++++++++++++++-----
 hmp.c                     |   36 +++-
 include/block/block.h     |    1 -
 include/block/block_int.h |   32 +--
 include/qemu/throttle.h   |  105 ++++++++++
 qapi-schema.json          |   40 +++-
 qemu-options.hx           |    4 +-
 qmp-commands.hx           |   34 +++-
 tests/Makefile            |    2 +
 tests/test-throttle.c     |  494 +++++++++++++++++++++++++++++++++++++++++++++
 util/Makefile.objs        |    1 +
 util/throttle.c           |  391 +++++++++++++++++++++++++++++++++++
 14 files changed, 1405 insertions(+), 343 deletions(-)
 create mode 100644 include/qemu/throttle.h
 create mode 100644 tests/test-throttle.c
 create mode 100644 util/throttle.c

-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Qemu-devel] [PATCH V4 1/5] throttle: Add a new throttling API implementing continuous leaky bucket.
  2013-08-08 14:29 [Qemu-devel] [PATCH V4 0/5] Continuous Leaky Bucket Throttling Benoît Canet
@ 2013-08-08 14:29 ` Benoît Canet
  2013-08-08 14:29 ` [Qemu-devel] [PATCH V4 2/5] throttle: Add units tests Benoît Canet
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Benoît Canet @ 2013-08-08 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Implement the continuous leaky bucket algorithm devised on IRC as a separate
module.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 include/qemu/throttle.h |  105 +++++++++++++
 util/Makefile.objs      |    1 +
 util/throttle.c         |  391 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 497 insertions(+)
 create mode 100644 include/qemu/throttle.h
 create mode 100644 util/throttle.c

diff --git a/include/qemu/throttle.h b/include/qemu/throttle.h
new file mode 100644
index 0000000..e03bc3e
--- /dev/null
+++ b/include/qemu/throttle.h
@@ -0,0 +1,105 @@
+/*
+ * QEMU throttling infrastructure
+ *
+ * Copyright (C) Nodalink, SARL. 2013
+ *
+ * Author:
+ *   Benoît Canet <benoit.canet@irqsave.net>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 or
+ * (at your option) version 3 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef THROTTLING_H
+#define THROTTLING_H
+
+#include <stdint.h>
+#include "qemu-common.h"
+#include "qemu/timer.h"
+
+#define NANOSECONDS_PER_SECOND  1000000000.0
+
+#define BUCKETS_COUNT 6
+
+typedef enum {
+    THROTTLE_BPS_TOTAL = 0,
+    THROTTLE_BPS_READ  = 1,
+    THROTTLE_BPS_WRITE = 2,
+    THROTTLE_OPS_TOTAL = 3,
+    THROTTLE_OPS_READ  = 4,
+    THROTTLE_OPS_WRITE = 5,
+} BucketType;
+
+typedef struct LeakyBucket {
+    double  ups;            /* units per second */
+    double  max;            /* leaky bucket max in units */
+    double  bucket;         /* bucket in units */
+} LeakyBucket;
+
+/* The following structure is used to configure a ThrottleState
+ * It contains a bit of state: the bucket field of the LeakyBucket structure.
+ * However it allows to keep the code clean and the bucket field is reset to
+ * zero at the right time.
+ */
+typedef struct ThrottleConfig {
+    LeakyBucket buckets[6]; /* leaky buckets */
+    uint64_t unit_size;     /* size of an unit in bytes */
+    uint64_t op_size;       /* size of an operation in units */
+} ThrottleConfig;
+
+typedef struct ThrottleState {
+    ThrottleConfig cfg;     /* configuration */
+    int64_t previous_leak;  /* timestamp of the last leak done */
+    QEMUTimer * timers[2];  /* timers used to do the throttling */
+    QEMUClock *clock;       /* the clock used */
+} ThrottleState;
+
+/* operations on single leaky buckets */
+void throttle_leak_bucket(LeakyBucket *bkt, int64_t delta);
+
+int64_t throttle_compute_wait(LeakyBucket *bkt);
+
+/* expose timer computation function for unit tests */
+bool throttle_compute_timer(ThrottleState *ts,
+                            bool is_write,
+                            int64_t now,
+                            int64_t *next_timer);
+
+/* init/destroy cycle */
+void throttle_init(ThrottleState *ts,
+                   QEMUClock *clock,
+                   void (read_timer)(void *),
+                   void (write_timer)(void *),
+                   void *timer_opaque);
+
+void throttle_destroy(ThrottleState *ts);
+
+bool throttle_have_timer(ThrottleState *ts);
+
+/* configuration */
+bool throttle_enabled(ThrottleConfig *cfg);
+
+bool throttle_conflicting(ThrottleConfig *cfg);
+
+bool throttle_is_valid(ThrottleConfig *cfg);
+
+void throttle_config(ThrottleState *ts, ThrottleConfig *cfg);
+
+void throttle_get_config(ThrottleState *ts, ThrottleConfig *cfg);
+
+/* usage */
+bool throttle_allowed(ThrottleState *ts, bool is_write);
+
+void throttle_account(ThrottleState *ts, bool is_write, uint64_t size);
+
+#endif
diff --git a/util/Makefile.objs b/util/Makefile.objs
index dc72ab0..2bb13a2 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -11,3 +11,4 @@ util-obj-y += iov.o aes.o qemu-config.o qemu-sockets.o uri.o notify.o
 util-obj-y += qemu-option.o qemu-progress.o
 util-obj-y += hexdump.o
 util-obj-y += crc32c.o
+util-obj-y += throttle.o
diff --git a/util/throttle.c b/util/throttle.c
new file mode 100644
index 0000000..2f25d44
--- /dev/null
+++ b/util/throttle.c
@@ -0,0 +1,391 @@
+/*
+ * QEMU throttling infrastructure
+ *
+ * Copyright (C) Nodalink, SARL. 2013
+ *
+ * Author:
+ *   Benoît Canet <benoit.canet@irqsave.net>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 or
+ * (at your option) version 3 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/throttle.h"
+#include "qemu/timer.h"
+
+/* This function make a bucket leak
+ *
+ * @bkt:   the bucket to make leak
+ * @delta: the time delta
+ */
+void throttle_leak_bucket(LeakyBucket *bkt, int64_t delta)
+{
+    double leak;
+
+    /* compute how much to leak */
+    leak = (bkt->ups * (double) delta) / NANOSECONDS_PER_SECOND;
+
+    /* make the bucket leak */
+    bkt->bucket = MAX(bkt->bucket - leak, 0);
+}
+
+/* Calculate the time delta since last leak and make proportionals leaks
+ *
+ * @now:      the current timestamp in ns
+ */
+static void throttle_do_leak(ThrottleState *ts, int64_t now)
+{
+    /* compute the time elapsed since the last leak */
+    int64_t delta = now - ts->previous_leak;
+    int i;
+
+    ts->previous_leak = now;
+
+    if (delta <= 0) {
+        return;
+    }
+
+    /* make each bucket leak */
+    for (i = 0; i < BUCKETS_COUNT; i++) {
+        throttle_leak_bucket(&ts->cfg.buckets[i], delta);
+    }
+}
+
+/* do the real job of computing the time to wait
+ *
+ * @limit: the throttling limit
+ * @extra: the number of operation to delay
+ * @ret:   the time to wait in ns
+ */
+static int64_t throttle_do_compute_wait(double limit, double extra)
+{
+    double wait = extra * NANOSECONDS_PER_SECOND;
+    wait /= limit;
+    return wait;
+}
+
+/* This function compute the wait time in ns that a leaky bucket should trigger
+ *
+ * @bkt: the leaky bucket we operate on
+ * @ret: the resulting wait time in ns or 0 if the operation can go through
+ */
+int64_t throttle_compute_wait(LeakyBucket *bkt)
+{
+    double extra; /* the number of extra units blocking the io */
+
+    if (!bkt->ups) {
+        return 0;
+    }
+
+    extra = bkt->bucket - bkt->max;
+
+    if (extra <= 0) {
+        return 0;
+    }
+
+    return throttle_do_compute_wait(bkt->ups, extra);
+}
+
+/* This function compute the time that must be waited while this IO
+ *
+ * @is_write:   true if the current IO is a write, false if it's a read
+ * @ret:        time to wait
+ */
+static int64_t throttle_compute_wait_for(ThrottleState *ts,
+                                         bool is_write,
+                                         int64_t now)
+{
+    BucketType to_check[2][4] = { {THROTTLE_BPS_TOTAL,
+                                   THROTTLE_OPS_TOTAL,
+                                   THROTTLE_BPS_READ,
+                                   THROTTLE_OPS_READ},
+                                  {THROTTLE_BPS_TOTAL,
+                                   THROTTLE_OPS_TOTAL,
+                                   THROTTLE_BPS_WRITE,
+                                   THROTTLE_OPS_WRITE}, };
+    int64_t wait, max_wait = 0;
+    int i;
+
+    for (i = 0; i < 4; i++) {
+        BucketType index = to_check[is_write][i];
+        wait = throttle_compute_wait(&ts->cfg.buckets[index]);
+        if (wait > max_wait) {
+            max_wait = wait;
+        }
+    }
+
+    return max_wait;
+}
+
+/* compute the timer for this type of operation
+ *
+ * @is_write:   the type of operation
+ * @now:        the current clock timerstamp
+ * @next_timer: the resulting timer
+ * @ret:        true if a timer must be set
+ */
+bool throttle_compute_timer(ThrottleState *ts,
+                            bool is_write,
+                            int64_t now,
+                            int64_t *next_timer)
+{
+    int64_t wait;
+
+    /* leak proportionally to the time elapsed */
+    throttle_do_leak(ts, now);
+
+    /* compute the wait time if any */
+    wait = throttle_compute_wait_for(ts, is_write, now);
+
+    /* if the code must wait compute when the next timer should fire */
+    if (wait) {
+        *next_timer = now + wait;
+        return true;
+    }
+
+    /* else no need to wait at all */
+    *next_timer = now;
+    return false;
+}
+
+/* To be called first on the ThrottleState */
+void throttle_init(ThrottleState *ts,
+                   QEMUClock *clock,
+                   void (read_timer)(void *),
+                   void (write_timer)(void *),
+                   void *timer_opaque)
+{
+    memset(ts, 0, sizeof(ThrottleState));
+
+    ts->clock = clock;
+    ts->timers[0] = qemu_new_timer_ns(ts->clock, read_timer, timer_opaque);
+    ts->timers[1] = qemu_new_timer_ns(ts->clock, write_timer, timer_opaque);
+}
+
+/* destroy a timer */
+static void throttle_timer_destroy(QEMUTimer **timer)
+{
+    assert(*timer != NULL);
+
+    if (qemu_timer_pending(*timer)) {
+        qemu_del_timer(*timer);
+    }
+
+    qemu_free_timer(*timer);
+    *timer = NULL;
+}
+
+/* To be called last on the ThrottleState */
+void throttle_destroy(ThrottleState *ts)
+{
+    int i;
+
+    for (i = 0; i < 2; i++) {
+        throttle_timer_destroy(&ts->timers[i]);
+    }
+}
+
+/* is any throttling timer configured */
+bool throttle_have_timer(ThrottleState *ts)
+{
+    if (ts->timers[0]) {
+        return true;
+    }
+
+    return false;
+}
+
+/* Does any throttling must be done
+ *
+ * @cfg: the throttling configuration to inspect
+ * @ret: true if throttling must be done else false
+ */
+bool throttle_enabled(ThrottleConfig *cfg)
+{
+    int i;
+
+    for (i = 0; i < BUCKETS_COUNT; i++) {
+        if (cfg->buckets[i].ups > 0) {
+            return true;
+        }
+    }
+
+    return false;
+}
+
+/* return true if any two throttling parameters conflicts
+ *
+ * @cfg: the throttling configuration to inspect
+ * @ret: true if any conflict detected else false
+ */
+bool throttle_conflicting(ThrottleConfig *cfg)
+{
+    bool bps_flag, ops_flag;
+    bool bps_max_flag, ops_max_flag;
+
+    bps_flag = cfg->buckets[THROTTLE_BPS_TOTAL].ups &&
+               (cfg->buckets[THROTTLE_BPS_READ].ups ||
+                cfg->buckets[THROTTLE_BPS_WRITE].ups);
+
+    ops_flag = cfg->buckets[THROTTLE_OPS_TOTAL].ups &&
+               (cfg->buckets[THROTTLE_OPS_READ].ups ||
+                cfg->buckets[THROTTLE_OPS_WRITE].ups);
+
+    bps_max_flag = cfg->buckets[THROTTLE_BPS_TOTAL].max &&
+                  (cfg->buckets[THROTTLE_BPS_READ].max  ||
+                   cfg->buckets[THROTTLE_BPS_WRITE].max);
+
+    ops_max_flag = cfg->buckets[THROTTLE_OPS_TOTAL].max &&
+                   (cfg->buckets[THROTTLE_OPS_READ].max ||
+                   cfg->buckets[THROTTLE_OPS_WRITE].max);
+
+    return bps_flag || ops_flag || bps_max_flag || ops_max_flag;
+}
+
+/* check if a throttling configuration is valid
+ * @cfg: the throttling configuration to inspect
+ * @ret: true if valid else false
+ */
+bool throttle_is_valid(ThrottleConfig *cfg)
+{
+    bool invalid = false;
+    int i;
+
+    for (i = 0; i < BUCKETS_COUNT; i++) {
+        if (cfg->buckets[i].ups < 0) {
+            invalid = true;
+        }
+    }
+
+    for (i = 0; i < BUCKETS_COUNT; i++) {
+        if (cfg->buckets[i].max < 0) {
+            invalid = true;
+        }
+    }
+
+    return !invalid;
+}
+
+/* fix bucket parameters */
+static void throttle_fix_bucket(LeakyBucket *bkt)
+{
+    double min = bkt->ups / 10;
+    /* zero bucket level */
+    bkt->bucket = 0;
+
+    /* take care of not using cpu and also improve throttling precision */
+    if (bkt->ups &&
+        bkt->max < min) {
+        bkt->max = min;
+    }
+}
+
+/* take care of canceling a timer */
+static void throttle_cancel_timer(QEMUTimer *timer)
+{
+    assert(timer != NULL);
+    if (!qemu_timer_pending(timer)) {
+        return;
+    }
+
+    qemu_del_timer(timer);
+}
+
+/* Used to configure the throttle
+ *
+ * @ts: the throttle state we are working on
+ * @cfg: the config to set
+ */
+void throttle_config(ThrottleState *ts, ThrottleConfig *cfg)
+{
+    int i;
+
+    ts->cfg = *cfg;
+
+    for (i = 0; i < BUCKETS_COUNT; i++) {
+        throttle_fix_bucket(&ts->cfg.buckets[i]);
+    }
+
+    ts->previous_leak = qemu_get_clock_ns(ts->clock);
+
+    for (i = 0; i < 2; i++) {
+        throttle_cancel_timer(ts->timers[i]);
+    }
+}
+
+/* used to get config
+ *
+ * @ts: the throttle state we are working on
+ * @cfg: where to write the config
+ */
+void throttle_get_config(ThrottleState *ts, ThrottleConfig *cfg)
+{
+    *cfg = ts->cfg;
+}
+
+
+/* compute if an operation must be allowed and set a timer if not
+ *
+ * NOTE: this function is not unit tested due to it's usage of qemu_mod_timer
+ *
+ * @is_write: the type of operation (read/write)
+ * @ret:      true if the operation is allowed to flow else if must wait
+ */
+bool throttle_allowed(ThrottleState *ts, bool is_write)
+{
+    int64_t now = qemu_get_clock_ns(ts->clock);
+    int64_t next_timer;
+    bool must_wait;
+
+    must_wait = throttle_compute_timer(ts,
+                                       is_write,
+                                       now,
+                                       &next_timer);
+
+    /* if the request is throttled arm timer */
+    if (must_wait) {
+        qemu_mod_timer(ts->timers[is_write], next_timer);
+    }
+
+    return !must_wait;
+}
+
+/* do the accounting for this operation
+ *
+ * @is_write: the type of operation (read/write)
+ * size:      the size of the operation
+ */
+void throttle_account(ThrottleState *ts, bool is_write, uint64_t size)
+{
+    double bytes_size;
+
+    /* if cfg.op_size is not defined we will acccount exactly 1 operation */
+    double units = 1.0;
+    if (ts->cfg.op_size) {
+        units = (double) size / ts->cfg.op_size;
+    }
+
+    bytes_size = size * ts->cfg.unit_size;
+
+    ts->cfg.buckets[THROTTLE_BPS_TOTAL].bucket += bytes_size;
+    ts->cfg.buckets[THROTTLE_OPS_TOTAL].bucket += units;
+
+    if (is_write) {
+        ts->cfg.buckets[THROTTLE_BPS_WRITE].bucket += bytes_size;
+        ts->cfg.buckets[THROTTLE_OPS_WRITE].bucket += units;
+    } else {
+        ts->cfg.buckets[THROTTLE_BPS_READ].bucket += bytes_size;
+        ts->cfg.buckets[THROTTLE_OPS_READ].bucket += units;
+    }
+}
+
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Qemu-devel] [PATCH V4 2/5] throttle: Add units tests
  2013-08-08 14:29 [Qemu-devel] [PATCH V4 0/5] Continuous Leaky Bucket Throttling Benoît Canet
  2013-08-08 14:29 ` [Qemu-devel] [PATCH V4 1/5] throttle: Add a new throttling API implementing continuous leaky bucket Benoît Canet
@ 2013-08-08 14:29 ` Benoît Canet
  2013-08-08 14:29 ` [Qemu-devel] [PATCH V4 3/5] block: Enable the new throttling code in the block layer Benoît Canet
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Benoît Canet @ 2013-08-08 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 tests/Makefile        |    2 +
 tests/test-throttle.c |  494 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 496 insertions(+)
 create mode 100644 tests/test-throttle.c

diff --git a/tests/Makefile b/tests/Makefile
index d044908..fb1e06a 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -29,6 +29,7 @@ check-unit-y += tests/test-visitor-serialization$(EXESUF)
 check-unit-y += tests/test-iov$(EXESUF)
 gcov-files-test-iov-y = util/iov.c
 check-unit-y += tests/test-aio$(EXESUF)
+check-unit-y += tests/test-throttle$(EXESUF)
 gcov-files-test-aio-$(CONFIG_WIN32) = aio-win32.c
 gcov-files-test-aio-$(CONFIG_POSIX) = aio-posix.c
 check-unit-y += tests/test-thread-pool$(EXESUF)
@@ -116,6 +117,7 @@ tests/check-qfloat$(EXESUF): tests/check-qfloat.o libqemuutil.a
 tests/check-qjson$(EXESUF): tests/check-qjson.o libqemuutil.a libqemustub.a
 tests/test-coroutine$(EXESUF): tests/test-coroutine.o $(block-obj-y) libqemuutil.a libqemustub.a
 tests/test-aio$(EXESUF): tests/test-aio.o $(block-obj-y) libqemuutil.a libqemustub.a
+tests/test-throttle$(EXESUF): tests/test-throttle.o $(block-obj-y) libqemuutil.a libqemustub.a
 tests/test-thread-pool$(EXESUF): tests/test-thread-pool.o $(block-obj-y) libqemuutil.a libqemustub.a
 tests/test-iov$(EXESUF): tests/test-iov.o libqemuutil.a
 tests/test-hbitmap$(EXESUF): tests/test-hbitmap.o libqemuutil.a libqemustub.a
diff --git a/tests/test-throttle.c b/tests/test-throttle.c
new file mode 100644
index 0000000..c687f5f
--- /dev/null
+++ b/tests/test-throttle.c
@@ -0,0 +1,494 @@
+/*
+ * Throttle infrastructure tests
+ *
+ * Copyright Nodalink, SARL. 2013
+ *
+ * Authors:
+ *  Benoît Canet     <benoit.canet@irqsave.net>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ */
+
+#include <glib.h>
+#include <math.h>
+#include "qemu/throttle.h"
+
+LeakyBucket    bkt;
+ThrottleConfig cfg;
+ThrottleState  ts;
+
+/* usefull function */
+static bool double_cmp(double x, double y)
+{
+    return fabsl(x - y) < 1e-6;
+}
+
+/* tests for single bucket operations */
+static void test_leak_bucket(void)
+{
+    /* set initial value */
+    bkt.ups = 150;
+    bkt.max = 15;
+    bkt.bucket = 1.5;
+
+    /* leak an op work of time */
+    throttle_leak_bucket(&bkt, NANOSECONDS_PER_SECOND / 150);
+    g_assert(bkt.ups == 150);
+    g_assert(bkt.max == 15);
+    g_assert(double_cmp(bkt.bucket, 0.5));
+
+    /* leak again emptying the bucket */
+    throttle_leak_bucket(&bkt, NANOSECONDS_PER_SECOND / 150);
+    g_assert(bkt.ups == 150);
+    g_assert(bkt.max == 15);
+    g_assert(double_cmp(bkt.bucket, 0));
+
+    /* check that the bucket level won't go lower */
+    throttle_leak_bucket(&bkt, NANOSECONDS_PER_SECOND / 150);
+    g_assert(bkt.ups == 150);
+    g_assert(bkt.max == 15);
+    g_assert(double_cmp(bkt.bucket, 0));
+}
+
+static void test_compute_wait(void)
+{
+    int64_t wait;
+    int64_t result;
+
+    /* no operation limit set */
+    bkt.ups = 0;
+    bkt.max = 15;
+    bkt.bucket = 1.5;
+    wait = throttle_compute_wait(&bkt);
+    g_assert(!wait);
+
+    /* zero delta */
+    bkt.ups = 150;
+    bkt.max = 15;
+    bkt.bucket = 15;
+    wait = throttle_compute_wait(&bkt);
+    g_assert(!wait);
+
+    /* below zero delta */
+    bkt.ups = 150;
+    bkt.max = 15;
+    bkt.bucket = 9;
+    wait = throttle_compute_wait(&bkt);
+    g_assert(!wait);
+
+    /* half an operation above max */
+    bkt.ups = 150;
+    bkt.max = 15;
+    bkt.bucket = 15.5;
+    wait = throttle_compute_wait(&bkt);
+    /* time required to do half an operation */
+    result = (int64_t)  NANOSECONDS_PER_SECOND / 150 / 2;
+    g_assert(wait == result);
+}
+
+/* functions to test ThrottleState initialization/destroy methods */
+static void read_timer_cb(void *opaque)
+{
+}
+
+static void write_timer_cb(void *opaque)
+{
+}
+
+static void test_init(void)
+{
+    int i;
+
+    /* fill the structure with crap */
+    memset(&ts, 1, sizeof(ts));
+
+    /* init the structure */
+    throttle_init(&ts, vm_clock, read_timer_cb, write_timer_cb, &ts);
+
+    /* check initialized fields */
+    g_assert(ts.clock == vm_clock);
+    g_assert(ts.timers[0]);
+    g_assert(ts.timers[1]);
+
+    /* check other fields where cleared */
+    g_assert(!ts.previous_leak);
+    g_assert(!ts.cfg.op_size);
+    g_assert(!ts.cfg.unit_size);
+    for (i = 0; i < BUCKETS_COUNT; i++) {
+        g_assert(!ts.cfg.buckets[i].ups);
+        g_assert(!ts.cfg.buckets[i].max);
+        g_assert(!ts.cfg.buckets[i].bucket);
+    }
+
+    throttle_destroy(&ts);
+}
+
+static void test_destroy(void)
+{
+    int i;
+    throttle_init(&ts, vm_clock, read_timer_cb, write_timer_cb, &ts);
+    throttle_destroy(&ts);
+    for (i = 0; i < 2; i++) {
+        g_assert(!ts.timers[i]);
+    }
+}
+
+/* function to test throttle_config and throttle_get_config */
+static void test_config_functions(void)
+{
+    int i;
+    ThrottleConfig orig_cfg, final_cfg;
+
+    orig_cfg.buckets[THROTTLE_BPS_TOTAL].ups = 153;
+    orig_cfg.buckets[THROTTLE_BPS_READ].ups  = 56;
+    orig_cfg.buckets[THROTTLE_BPS_WRITE].ups = 1;
+
+    orig_cfg.buckets[THROTTLE_OPS_TOTAL].ups = 150;
+    orig_cfg.buckets[THROTTLE_OPS_READ].ups  = 69;
+    orig_cfg.buckets[THROTTLE_OPS_WRITE].ups = 23;
+
+    orig_cfg.buckets[THROTTLE_BPS_TOTAL].max = 0; /* should be corrected */
+    orig_cfg.buckets[THROTTLE_BPS_READ].max  = 1; /* should be corrected */
+    orig_cfg.buckets[THROTTLE_BPS_WRITE].max = 120;
+
+    orig_cfg.buckets[THROTTLE_OPS_TOTAL].max = 150;
+    orig_cfg.buckets[THROTTLE_OPS_READ].max  = 400;
+    orig_cfg.buckets[THROTTLE_OPS_WRITE].max = 500;
+
+    orig_cfg.buckets[THROTTLE_BPS_TOTAL].bucket = 45;
+    orig_cfg.buckets[THROTTLE_BPS_READ].bucket  = 65;
+    orig_cfg.buckets[THROTTLE_BPS_WRITE].bucket = 23;
+
+    orig_cfg.buckets[THROTTLE_OPS_TOTAL].bucket = 1;
+    orig_cfg.buckets[THROTTLE_OPS_READ].bucket  = 90;
+    orig_cfg.buckets[THROTTLE_OPS_WRITE].bucket = 75;
+
+    orig_cfg.unit_size = 333;
+    orig_cfg.op_size = 1;
+
+    throttle_init(&ts, vm_clock, read_timer_cb, write_timer_cb, &ts);
+    /* structure reset by throttle_init previous_leak should be null */
+    g_assert(!ts.previous_leak);
+    throttle_config(&ts, &orig_cfg);
+
+    /* has previous leak been initialized by throttle_config ? */
+    g_assert(ts.previous_leak);
+
+    /* get back the fixed configuration */
+    throttle_get_config(&ts, &final_cfg);
+    throttle_destroy(&ts);
+
+    g_assert(final_cfg.buckets[THROTTLE_BPS_TOTAL].ups == 153);
+    g_assert(final_cfg.buckets[THROTTLE_BPS_READ].ups  == 56);
+    g_assert(final_cfg.buckets[THROTTLE_BPS_WRITE].ups == 1);
+
+    g_assert(final_cfg.buckets[THROTTLE_OPS_TOTAL].ups == 150);
+    g_assert(final_cfg.buckets[THROTTLE_OPS_READ].ups  == 69);
+    g_assert(final_cfg.buckets[THROTTLE_OPS_WRITE].ups == 23);
+
+    g_assert(final_cfg.buckets[THROTTLE_BPS_TOTAL].max == 15.3); /* fixed */
+    g_assert(final_cfg.buckets[THROTTLE_BPS_READ].max  == 5.6);  /* fixed */
+    g_assert(final_cfg.buckets[THROTTLE_BPS_WRITE].max == 120);
+
+    g_assert(final_cfg.buckets[THROTTLE_OPS_TOTAL].max == 150);
+    g_assert(final_cfg.buckets[THROTTLE_OPS_READ].max  == 400);
+    g_assert(final_cfg.buckets[THROTTLE_OPS_WRITE].max == 500);
+
+    g_assert(final_cfg.unit_size == 333);
+    g_assert(final_cfg.op_size == 1);
+
+    /* check bucket have been cleared */
+    for (i = 0; i < BUCKETS_COUNT; i++) {
+        g_assert(!final_cfg.buckets[i].bucket);
+    }
+
+
+}
+
+/* functions to test is throttle is enabled by a config */
+static void set_cfg_value(bool is_max, int index, int value)
+{
+    if (is_max) {
+        cfg.buckets[index].max = value;
+    } else {
+        cfg.buckets[index].ups = value;
+    }
+}
+
+static void test_enabled(void)
+{
+    int i;
+
+    memset(&cfg, 0, sizeof(cfg));
+    g_assert(!throttle_enabled(&cfg));
+
+    for (i = 0; i < BUCKETS_COUNT; i++) {
+        memset(&cfg, 0, sizeof(cfg));
+        set_cfg_value(false, i, 150);
+        g_assert(throttle_enabled(&cfg));
+    }
+
+    for (i = 0; i < BUCKETS_COUNT; i++) {
+        memset(&cfg, 0, sizeof(cfg));
+        set_cfg_value(false, i, -150);
+        g_assert(!throttle_enabled(&cfg));
+    }
+}
+
+/* tests functions for throttle_conflicting */
+
+static void test_conflicts_for_one_set(bool is_max,
+                                       int total,
+                                       int read,
+                                       int write)
+{
+    memset(&cfg, 0, sizeof(cfg));
+    g_assert(!throttle_conflicting(&cfg));
+
+    set_cfg_value(is_max, total, 1);
+    set_cfg_value(is_max, read,  1);
+    g_assert(throttle_conflicting(&cfg));
+
+    memset(&cfg, 0, sizeof(cfg));
+    set_cfg_value(is_max, total, 1);
+    set_cfg_value(is_max, write, 1);
+    g_assert(throttle_conflicting(&cfg));
+
+    memset(&cfg, 0, sizeof(cfg));
+    set_cfg_value(is_max, total, 1);
+    set_cfg_value(is_max, read,  1);
+    set_cfg_value(is_max, write, 1);
+    g_assert(throttle_conflicting(&cfg));
+
+    memset(&cfg, 0, sizeof(cfg));
+    set_cfg_value(is_max, total, 1);
+    g_assert(!throttle_conflicting(&cfg));
+
+    memset(&cfg, 0, sizeof(cfg));
+    set_cfg_value(is_max, read,  1);
+    set_cfg_value(is_max, write, 1);
+    g_assert(!throttle_conflicting(&cfg));
+}
+
+static void test_conflicting_config(void)
+{
+    /* bps average conflicts */
+    test_conflicts_for_one_set(false,
+                               THROTTLE_BPS_TOTAL,
+                               THROTTLE_BPS_READ,
+                               THROTTLE_BPS_WRITE);
+
+    /* ops average conflicts */
+    test_conflicts_for_one_set(false,
+                               THROTTLE_OPS_TOTAL,
+                               THROTTLE_OPS_READ,
+                               THROTTLE_OPS_WRITE);
+
+    /* bps average conflicts */
+    test_conflicts_for_one_set(true,
+                               THROTTLE_BPS_TOTAL,
+                               THROTTLE_BPS_READ,
+                               THROTTLE_BPS_WRITE);
+    /* ops average conflicts */
+    test_conflicts_for_one_set(true,
+                               THROTTLE_OPS_TOTAL,
+                               THROTTLE_OPS_READ,
+                               THROTTLE_OPS_WRITE);
+}
+/* functions to test the throttle_is_valid function */
+static void test_is_valid_for_value(int value, bool should_be_valid)
+{
+    int is_max, index;
+    for (is_max = 0; is_max < 2; is_max++) {
+        for (index = 0; index < BUCKETS_COUNT; index++) {
+            memset(&cfg, 0, sizeof(cfg));
+            set_cfg_value(is_max, index, value);
+            g_assert(throttle_is_valid(&cfg) == should_be_valid);
+        }
+    }
+}
+
+static void test_is_valid(void)
+{
+    /* negative number are invalid */
+    test_is_valid_for_value(-1, false);
+    /* zero are valids */
+    test_is_valid_for_value(0, true);
+    /* positives numers are valids */
+    test_is_valid_for_value(1, true);
+}
+
+static void test_have_timer(void)
+{
+    /* zero the structure */
+    memset(&ts, 0, sizeof(ts));
+
+    /* no timer set shoudl return false */
+    g_assert(!throttle_have_timer(&ts));
+
+    /* init the structure */
+    throttle_init(&ts, vm_clock, read_timer_cb, write_timer_cb, &ts);
+
+    /* timer set by init should return true */
+    g_assert(throttle_have_timer(&ts));
+
+    throttle_destroy(&ts);
+}
+
+static bool do_test_accounting(bool is_ops, /* are we testing bps or ops */
+                int size,                   /* size of the operation to do */
+                double ups,                 /* io limit */
+                uint64_t unit_size,         /* the byte size of the unit */
+                uint64_t op_size,           /* ideal size of an io */
+                double total_result,
+                double read_result,
+                double write_result)
+{
+    BucketType to_test[2][3] = { { THROTTLE_BPS_TOTAL,
+                                   THROTTLE_BPS_READ,
+                                   THROTTLE_BPS_WRITE, },
+                                 { THROTTLE_OPS_TOTAL,
+                                   THROTTLE_OPS_READ,
+                                   THROTTLE_OPS_WRITE, } };
+    ThrottleConfig cfg;
+    BucketType index;
+    int i;
+
+    for (i = 0; i < 3; i++) {
+        BucketType index = to_test[is_ops][i];
+        cfg.buckets[index].ups = ups;
+    }
+
+    cfg.unit_size = unit_size;
+    cfg.op_size = op_size;
+
+    throttle_init(&ts, vm_clock, read_timer_cb, write_timer_cb, &ts);
+    throttle_config(&ts, &cfg);
+
+    /* account a read */
+    throttle_account(&ts, false, size);
+    /* account a write */
+    throttle_account(&ts, true, size);
+
+    /* check total result */
+    index = to_test[is_ops][0];
+    if (!double_cmp(ts.cfg.buckets[index].bucket, total_result)) {
+        return false;
+    }
+
+    /* check read result */
+    index = to_test[is_ops][1];
+    if (!double_cmp(ts.cfg.buckets[index].bucket, read_result)) {
+        return false;
+    }
+
+    /* check write result */
+    index = to_test[is_ops][2];
+    if (!double_cmp(ts.cfg.buckets[index].bucket, write_result)) {
+        return false;
+    }
+
+    throttle_destroy(&ts);
+
+    return true;
+}
+
+static void test_accounting(void)
+{
+    /* tests for bps */
+
+    /* op of size 1 */
+    g_assert(do_test_accounting(false,
+                                1,
+                                150,
+                                512,
+                                0,
+                                1024,
+                                512,
+                                512));
+
+    /* op of size 2 */
+    g_assert(do_test_accounting(false,
+                                2,
+                                150,
+                                512,
+                                0,
+                                2048,
+                                1024,
+                                1024));
+
+    /* op of size 2 and orthogonal parameter change */
+    g_assert(do_test_accounting(false,
+                                2,
+                                150,
+                                512,
+                                17,
+                                2048,
+                                1024,
+                                1024));
+
+
+    /* tests for ops */
+
+    /* op of size 1 */
+    g_assert(do_test_accounting(true,
+                                1,
+                                150,
+                                512,
+                                0,
+                                2,
+                                1,
+                                1));
+
+    /* op of size 2 */
+    g_assert(do_test_accounting(true,
+                                2,
+                                150,
+                                512,
+                                0,
+                                2,
+                                1,
+                                1));
+
+    /* jumbo op accounting fragmentation : size 64 with op size of 13 units */
+    g_assert(do_test_accounting(true,
+                                64,
+                                150,
+                                512,
+                                13,
+                                (64.0 * 2) / 13,
+                                (64.0 / 13),
+                                (64.0 / 13)));
+
+    /* same with orthogonal parameters changes */
+    g_assert(do_test_accounting(true,
+                                64,
+                                300,
+                                3433,
+                                13,
+                                (64.0 * 2) / 13,
+                                (64.0 / 13),
+                                (64.0 / 13)));
+}
+
+int main(int argc, char **argv)
+{
+    init_clocks();
+    do {} while (g_main_context_iteration(NULL, false));
+
+    /* tests in the same order as the header function declarations */
+    g_test_init(&argc, &argv, NULL);
+    g_test_add_func("/throttle/leak_bucket",        test_leak_bucket);
+    g_test_add_func("/throttle/compute_wait",       test_compute_wait);
+    g_test_add_func("/throttle/init",               test_init);
+    g_test_add_func("/throttle/destroy",            test_destroy);
+    g_test_add_func("/throttle/have_timer",         test_have_timer);
+    g_test_add_func("/throttle/config/enabled",     test_enabled);
+    g_test_add_func("/throttle/config/conflicting", test_conflicting_config);
+    g_test_add_func("/throttle/config/is_valid",    test_is_valid);
+    g_test_add_func("/throttle/config_functions",   test_config_functions);
+    g_test_add_func("/throttle/accounting",         test_accounting);
+    return g_test_run();
+}
+
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Qemu-devel] [PATCH V4 3/5] block: Enable the new throttling code in the block layer.
  2013-08-08 14:29 [Qemu-devel] [PATCH V4 0/5] Continuous Leaky Bucket Throttling Benoît Canet
  2013-08-08 14:29 ` [Qemu-devel] [PATCH V4 1/5] throttle: Add a new throttling API implementing continuous leaky bucket Benoît Canet
  2013-08-08 14:29 ` [Qemu-devel] [PATCH V4 2/5] throttle: Add units tests Benoît Canet
@ 2013-08-08 14:29 ` Benoît Canet
  2013-08-08 15:30   ` Paolo Bonzini
  2013-08-08 14:29 ` [Qemu-devel] [PATCH V4 4/5] block: Add support for throttling burst max in QMP and the command line Benoît Canet
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 10+ messages in thread
From: Benoît Canet @ 2013-08-08 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

tip: Do not ever use the cfg scheduler in the guest with this code.
     It gives incorrect throttling.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block.c                   |  351 ++++++++++++++-------------------------------
 block/qapi.c              |   21 ++-
 blockdev.c                |  100 +++++++------
 include/block/block.h     |    1 -
 include/block/block_int.h |   32 +----
 5 files changed, 173 insertions(+), 332 deletions(-)

diff --git a/block.c b/block.c
index 01b66d8..92f0aa1 100644
--- a/block.c
+++ b/block.c
@@ -86,13 +86,6 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque);
 static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
     int64_t sector_num, int nb_sectors);
 
-static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
-        bool is_write, double elapsed_time, uint64_t *wait);
-static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
-        double elapsed_time, uint64_t *wait);
-static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
-        bool is_write, int64_t *wait);
-
 static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
     QTAILQ_HEAD_INITIALIZER(bdrv_states);
 
@@ -123,70 +116,110 @@ int is_windows_drive(const char *filename)
 #endif
 
 /* throttling disk I/O limits */
-void bdrv_io_limits_disable(BlockDriverState *bs)
+void bdrv_set_io_limits(BlockDriverState *bs,
+                        ThrottleConfig *cfg)
 {
-    bs->io_limits_enabled = false;
+    int i;
+
+    throttle_config(&bs->throttle_state, cfg);
+
+    for (i = 0; i < 2; i++) {
+        qemu_co_enter_next(&bs->throttled_reqs[i]);
+    }
+}
+
+/* this function drain all the throttled IOs */
+static bool bdrv_drain_throttled(BlockDriverState *bs)
+{
+    bool drained = false;
+    bool enabled = bs->io_limits_enabled;
+    int i;
 
-    do {} while (qemu_co_enter_next(&bs->throttled_reqs));
+    bs->io_limits_enabled = false;
 
-    if (bs->block_timer) {
-        qemu_del_timer(bs->block_timer);
-        qemu_free_timer(bs->block_timer);
-        bs->block_timer = NULL;
+    for (i = 0; i < 2; i++) {
+        while (qemu_co_enter_next(&bs->throttled_reqs[i])) {
+            drained = true;
+        }
     }
 
-    bs->slice_start = 0;
-    bs->slice_end   = 0;
+    bs->io_limits_enabled = enabled;
+
+    return drained;
 }
 
-static void bdrv_block_timer(void *opaque)
+void bdrv_io_limits_disable(BlockDriverState *bs)
+{
+    bs->io_limits_enabled = false;
+
+    bdrv_drain_throttled(bs);
+
+    throttle_destroy(&bs->throttle_state);
+}
+
+static void bdrv_throttle_read_timer_cb(void *opaque)
 {
     BlockDriverState *bs = opaque;
+    qemu_co_enter_next(&bs->throttled_reqs[0]);
+}
 
-    qemu_co_enter_next(&bs->throttled_reqs);
+static void bdrv_throttle_write_timer_cb(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+    qemu_co_enter_next(&bs->throttled_reqs[1]);
 }
 
+/* should be called before bdrv_set_io_limits if a limit is set */
 void bdrv_io_limits_enable(BlockDriverState *bs)
 {
-    qemu_co_queue_init(&bs->throttled_reqs);
-    bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
+    throttle_init(&bs->throttle_state,
+                  vm_clock,
+                  bdrv_throttle_read_timer_cb,
+                  bdrv_throttle_write_timer_cb,
+                  bs);
+    qemu_co_queue_init(&bs->throttled_reqs[0]);
+    qemu_co_queue_init(&bs->throttled_reqs[1]);
     bs->io_limits_enabled = true;
 }
 
-bool bdrv_io_limits_enabled(BlockDriverState *bs)
+/* This function make an IO wait if needed
+ *
+ * @is_write:   is the IO a write
+ * @nb_sectors: the number of sectors of the IO
+ */
+static void bdrv_io_limits_intercept(BlockDriverState *bs,
+                                     bool is_write,
+                                     int nb_sectors)
 {
-    BlockIOLimit *io_limits = &bs->io_limits;
-    return io_limits->bps[BLOCK_IO_LIMIT_READ]
-         || io_limits->bps[BLOCK_IO_LIMIT_WRITE]
-         || io_limits->bps[BLOCK_IO_LIMIT_TOTAL]
-         || io_limits->iops[BLOCK_IO_LIMIT_READ]
-         || io_limits->iops[BLOCK_IO_LIMIT_WRITE]
-         || io_limits->iops[BLOCK_IO_LIMIT_TOTAL];
+    /* does this io must wait */
+    bool must_wait = !throttle_allowed(&bs->throttle_state, is_write);
+
+    /* if must wait or any request of this type throttled queue the IO */
+    if (must_wait ||
+        !qemu_co_queue_empty(&bs->throttled_reqs[is_write])) {
+        qemu_co_queue_wait(&bs->throttled_reqs[is_write]);
+    }
+
+    /* the IO will be executed do the accounting */
+    throttle_account(&bs->throttle_state, is_write, nb_sectors);
 }
 
-static void bdrv_io_limits_intercept(BlockDriverState *bs,
-                                     bool is_write, int nb_sectors)
+/* This function will schedule the next IO throttled IO of this type if needed
+ *
+ * @is_write: is the IO a write
+ */
+static void bdrv_io_limits_resched(BlockDriverState *bs, bool is_write)
 {
-    int64_t wait_time = -1;
 
-    if (!qemu_co_queue_empty(&bs->throttled_reqs)) {
-        qemu_co_queue_wait(&bs->throttled_reqs);
+    if (qemu_co_queue_empty(&bs->throttled_reqs[is_write])) {
+        return;
     }
 
-    /* In fact, we hope to keep each request's timing, in FIFO mode. The next
-     * throttled requests will not be dequeued until the current request is
-     * allowed to be serviced. So if the current request still exceeds the
-     * limits, it will be inserted to the head. All requests followed it will
-     * be still in throttled_reqs queue.
-     */
-
-    while (bdrv_exceed_io_limits(bs, nb_sectors, is_write, &wait_time)) {
-        qemu_mod_timer(bs->block_timer,
-                       wait_time + qemu_get_clock_ns(vm_clock));
-        qemu_co_queue_wait_insert_head(&bs->throttled_reqs);
+    if (!throttle_allowed(&bs->throttle_state, is_write)) {
+        return;
     }
 
-    qemu_co_queue_next(&bs->throttled_reqs);
+    qemu_co_queue_next(&bs->throttled_reqs[is_write]);
 }
 
 /* check if the path starts with "<protocol>:" */
@@ -1112,11 +1145,6 @@ int bdrv_open(BlockDriverState *bs, const char *filename, QDict *options,
         bdrv_dev_change_media_cb(bs, true);
     }
 
-    /* throttling disk I/O limits */
-    if (bs->io_limits_enabled) {
-        bdrv_io_limits_enable(bs);
-    }
-
     return 0;
 
 unlink_and_fail:
@@ -1452,16 +1480,15 @@ void bdrv_drain_all(void)
          * a busy wait.
          */
         QTAILQ_FOREACH(bs, &bdrv_states, list) {
-            while (qemu_co_enter_next(&bs->throttled_reqs)) {
-                busy = true;
-            }
+            busy = bdrv_drain_throttled(bs);
         }
     } while (busy);
 
     /* If requests are still pending there is a bug somewhere */
     QTAILQ_FOREACH(bs, &bdrv_states, list) {
         assert(QLIST_EMPTY(&bs->tracked_requests));
-        assert(qemu_co_queue_empty(&bs->throttled_reqs));
+        assert(qemu_co_queue_empty(&bs->throttled_reqs[0]));
+        assert(qemu_co_queue_empty(&bs->throttled_reqs[1]));
     }
 }
 
@@ -1497,13 +1524,12 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
 
     bs_dest->enable_write_cache = bs_src->enable_write_cache;
 
-    /* i/o timing parameters */
-    bs_dest->slice_start        = bs_src->slice_start;
-    bs_dest->slice_end          = bs_src->slice_end;
-    bs_dest->slice_submitted    = bs_src->slice_submitted;
-    bs_dest->io_limits          = bs_src->io_limits;
-    bs_dest->throttled_reqs     = bs_src->throttled_reqs;
-    bs_dest->block_timer        = bs_src->block_timer;
+    /* i/o throttled req */
+    memcpy(&bs_dest->throttle_state,
+           &bs_src->throttle_state,
+           sizeof(ThrottleState));
+    bs_dest->throttled_reqs[0]  = bs_src->throttled_reqs[0];
+    bs_dest->throttled_reqs[1]  = bs_src->throttled_reqs[1];
     bs_dest->io_limits_enabled  = bs_src->io_limits_enabled;
 
     /* r/w error */
@@ -1550,7 +1576,7 @@ void bdrv_swap(BlockDriverState *bs_new, BlockDriverState *bs_old)
     assert(bs_new->dev == NULL);
     assert(bs_new->in_use == 0);
     assert(bs_new->io_limits_enabled == false);
-    assert(bs_new->block_timer == NULL);
+    assert(!throttle_have_timer(&bs_new->throttle_state));
 
     tmp = *bs_new;
     *bs_new = *bs_old;
@@ -1569,7 +1595,7 @@ void bdrv_swap(BlockDriverState *bs_new, BlockDriverState *bs_old)
     assert(bs_new->job == NULL);
     assert(bs_new->in_use == 0);
     assert(bs_new->io_limits_enabled == false);
-    assert(bs_new->block_timer == NULL);
+    assert(!throttle_have_timer(&bs_new->throttle_state));
 
     bdrv_rebind(bs_new);
     bdrv_rebind(bs_old);
@@ -1860,8 +1886,15 @@ int bdrv_commit_all(void)
  *
  * This function should be called when a tracked request is completing.
  */
-static void tracked_request_end(BdrvTrackedRequest *req)
+static void tracked_request_end(BlockDriverState *bs,
+                                BdrvTrackedRequest *req,
+                                bool is_write)
 {
+    /* throttling disk I/O */
+    if (bs->io_limits_enabled) {
+        bdrv_io_limits_resched(bs, is_write);
+    }
+
     QLIST_REMOVE(req, list);
     qemu_co_queue_restart_all(&req->wait_queue);
 }
@@ -1874,6 +1907,11 @@ static void tracked_request_begin(BdrvTrackedRequest *req,
                                   int64_t sector_num,
                                   int nb_sectors, bool is_write)
 {
+    /* throttling disk I/O */
+    if (bs->io_limits_enabled) {
+        bdrv_io_limits_intercept(bs, is_write, nb_sectors);
+    }
+
     *req = (BdrvTrackedRequest){
         .bs = bs,
         .sector_num = sector_num,
@@ -2512,11 +2550,6 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
         return -EIO;
     }
 
-    /* throttling disk read I/O */
-    if (bs->io_limits_enabled) {
-        bdrv_io_limits_intercept(bs, false, nb_sectors);
-    }
-
     if (bs->copy_on_read) {
         flags |= BDRV_REQ_COPY_ON_READ;
     }
@@ -2547,7 +2580,7 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
     ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
 
 out:
-    tracked_request_end(&req);
+    tracked_request_end(bs, &req, false);
 
     if (flags & BDRV_REQ_COPY_ON_READ) {
         bs->copy_on_read_in_flight--;
@@ -2625,11 +2658,6 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
         return -EIO;
     }
 
-    /* throttling disk write I/O */
-    if (bs->io_limits_enabled) {
-        bdrv_io_limits_intercept(bs, true, nb_sectors);
-    }
-
     if (bs->copy_on_read_in_flight) {
         wait_for_overlapping_requests(bs, sector_num, nb_sectors);
     }
@@ -2658,7 +2686,7 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
         bs->wr_highest_sector = sector_num + nb_sectors - 1;
     }
 
-    tracked_request_end(&req);
+    tracked_request_end(bs, &req, true);
 
     return ret;
 }
@@ -2751,14 +2779,6 @@ void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr)
     *nb_sectors_ptr = length;
 }
 
-/* throttling disk io limits */
-void bdrv_set_io_limits(BlockDriverState *bs,
-                        BlockIOLimit *io_limits)
-{
-    bs->io_limits = *io_limits;
-    bs->io_limits_enabled = bdrv_io_limits_enabled(bs);
-}
-
 void bdrv_set_on_error(BlockDriverState *bs, BlockdevOnError on_read_error,
                        BlockdevOnError on_write_error)
 {
@@ -3568,169 +3588,6 @@ void bdrv_aio_cancel(BlockDriverAIOCB *acb)
     acb->aiocb_info->cancel(acb);
 }
 
-/* block I/O throttling */
-static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
-                 bool is_write, double elapsed_time, uint64_t *wait)
-{
-    uint64_t bps_limit = 0;
-    uint64_t extension;
-    double   bytes_limit, bytes_base, bytes_res;
-    double   slice_time, wait_time;
-
-    if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
-        bps_limit = bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL];
-    } else if (bs->io_limits.bps[is_write]) {
-        bps_limit = bs->io_limits.bps[is_write];
-    } else {
-        if (wait) {
-            *wait = 0;
-        }
-
-        return false;
-    }
-
-    slice_time = bs->slice_end - bs->slice_start;
-    slice_time /= (NANOSECONDS_PER_SECOND);
-    bytes_limit = bps_limit * slice_time;
-    bytes_base  = bs->slice_submitted.bytes[is_write];
-    if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
-        bytes_base += bs->slice_submitted.bytes[!is_write];
-    }
-
-    /* bytes_base: the bytes of data which have been read/written; and
-     *             it is obtained from the history statistic info.
-     * bytes_res: the remaining bytes of data which need to be read/written.
-     * (bytes_base + bytes_res) / bps_limit: used to calcuate
-     *             the total time for completing reading/writting all data.
-     */
-    bytes_res   = (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
-
-    if (bytes_base + bytes_res <= bytes_limit) {
-        if (wait) {
-            *wait = 0;
-        }
-
-        return false;
-    }
-
-    /* Calc approx time to dispatch */
-    wait_time = (bytes_base + bytes_res) / bps_limit - elapsed_time;
-
-    /* When the I/O rate at runtime exceeds the limits,
-     * bs->slice_end need to be extended in order that the current statistic
-     * info can be kept until the timer fire, so it is increased and tuned
-     * based on the result of experiment.
-     */
-    extension = wait_time * NANOSECONDS_PER_SECOND;
-    extension = DIV_ROUND_UP(extension, BLOCK_IO_SLICE_TIME) *
-                BLOCK_IO_SLICE_TIME;
-    bs->slice_end += extension;
-    if (wait) {
-        *wait = wait_time * NANOSECONDS_PER_SECOND;
-    }
-
-    return true;
-}
-
-static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
-                             double elapsed_time, uint64_t *wait)
-{
-    uint64_t iops_limit = 0;
-    double   ios_limit, ios_base;
-    double   slice_time, wait_time;
-
-    if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
-        iops_limit = bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL];
-    } else if (bs->io_limits.iops[is_write]) {
-        iops_limit = bs->io_limits.iops[is_write];
-    } else {
-        if (wait) {
-            *wait = 0;
-        }
-
-        return false;
-    }
-
-    slice_time = bs->slice_end - bs->slice_start;
-    slice_time /= (NANOSECONDS_PER_SECOND);
-    ios_limit  = iops_limit * slice_time;
-    ios_base   = bs->slice_submitted.ios[is_write];
-    if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
-        ios_base += bs->slice_submitted.ios[!is_write];
-    }
-
-    if (ios_base + 1 <= ios_limit) {
-        if (wait) {
-            *wait = 0;
-        }
-
-        return false;
-    }
-
-    /* Calc approx time to dispatch, in seconds */
-    wait_time = (ios_base + 1) / iops_limit;
-    if (wait_time > elapsed_time) {
-        wait_time = wait_time - elapsed_time;
-    } else {
-        wait_time = 0;
-    }
-
-    /* Exceeded current slice, extend it by another slice time */
-    bs->slice_end += BLOCK_IO_SLICE_TIME;
-    if (wait) {
-        *wait = wait_time * NANOSECONDS_PER_SECOND;
-    }
-
-    return true;
-}
-
-static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
-                           bool is_write, int64_t *wait)
-{
-    int64_t  now, max_wait;
-    uint64_t bps_wait = 0, iops_wait = 0;
-    double   elapsed_time;
-    int      bps_ret, iops_ret;
-
-    now = qemu_get_clock_ns(vm_clock);
-    if (now > bs->slice_end) {
-        bs->slice_start = now;
-        bs->slice_end   = now + BLOCK_IO_SLICE_TIME;
-        memset(&bs->slice_submitted, 0, sizeof(bs->slice_submitted));
-    }
-
-    elapsed_time  = now - bs->slice_start;
-    elapsed_time  /= (NANOSECONDS_PER_SECOND);
-
-    bps_ret  = bdrv_exceed_bps_limits(bs, nb_sectors,
-                                      is_write, elapsed_time, &bps_wait);
-    iops_ret = bdrv_exceed_iops_limits(bs, is_write,
-                                      elapsed_time, &iops_wait);
-    if (bps_ret || iops_ret) {
-        max_wait = bps_wait > iops_wait ? bps_wait : iops_wait;
-        if (wait) {
-            *wait = max_wait;
-        }
-
-        now = qemu_get_clock_ns(vm_clock);
-        if (bs->slice_end < now + max_wait) {
-            bs->slice_end = now + max_wait;
-        }
-
-        return true;
-    }
-
-    if (wait) {
-        *wait = 0;
-    }
-
-    bs->slice_submitted.bytes[is_write] += (int64_t)nb_sectors *
-                                           BDRV_SECTOR_SIZE;
-    bs->slice_submitted.ios[is_write]++;
-
-    return false;
-}
-
 /**************************************************************/
 /* async block device emulation */
 
diff --git a/block/qapi.c b/block/qapi.c
index a4bc411..45f806b 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -223,18 +223,15 @@ void bdrv_query_info(BlockDriverState *bs,
         info->inserted->backing_file_depth = bdrv_get_backing_file_depth(bs);
 
         if (bs->io_limits_enabled) {
-            info->inserted->bps =
-                           bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL];
-            info->inserted->bps_rd =
-                           bs->io_limits.bps[BLOCK_IO_LIMIT_READ];
-            info->inserted->bps_wr =
-                           bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE];
-            info->inserted->iops =
-                           bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL];
-            info->inserted->iops_rd =
-                           bs->io_limits.iops[BLOCK_IO_LIMIT_READ];
-            info->inserted->iops_wr =
-                           bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE];
+            ThrottleConfig cfg;
+            throttle_get_config(&bs->throttle_state, &cfg);
+            info->inserted->bps     = cfg.buckets[THROTTLE_BPS_TOTAL].ups;
+            info->inserted->bps_rd  = cfg.buckets[THROTTLE_BPS_READ].ups;
+            info->inserted->bps_wr  = cfg.buckets[THROTTLE_BPS_WRITE].ups;
+
+            info->inserted->iops    = cfg.buckets[THROTTLE_OPS_TOTAL].ups;
+            info->inserted->iops_rd = cfg.buckets[THROTTLE_OPS_READ].ups;
+            info->inserted->iops_wr = cfg.buckets[THROTTLE_OPS_WRITE].ups;
         }
 
         bs0 = bs;
diff --git a/blockdev.c b/blockdev.c
index 41b0a49..7559ea5 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -281,32 +281,16 @@ static int parse_block_error_action(const char *buf, bool is_read)
     }
 }
 
-static bool do_check_io_limits(BlockIOLimit *io_limits, Error **errp)
+static bool check_throttle_config(ThrottleConfig *cfg, Error **errp)
 {
-    bool bps_flag;
-    bool iops_flag;
-
-    assert(io_limits);
-
-    bps_flag  = (io_limits->bps[BLOCK_IO_LIMIT_TOTAL] != 0)
-                 && ((io_limits->bps[BLOCK_IO_LIMIT_READ] != 0)
-                 || (io_limits->bps[BLOCK_IO_LIMIT_WRITE] != 0));
-    iops_flag = (io_limits->iops[BLOCK_IO_LIMIT_TOTAL] != 0)
-                 && ((io_limits->iops[BLOCK_IO_LIMIT_READ] != 0)
-                 || (io_limits->iops[BLOCK_IO_LIMIT_WRITE] != 0));
-    if (bps_flag || iops_flag) {
-        error_setg(errp, "bps(iops) and bps_rd/bps_wr(iops_rd/iops_wr) "
+    if (throttle_conflicting(cfg)) {
+        error_setg(errp, "bps/iops/max total values and read/write values"
                          "cannot be used at the same time");
         return false;
     }
 
-    if (io_limits->bps[BLOCK_IO_LIMIT_TOTAL] < 0 ||
-        io_limits->bps[BLOCK_IO_LIMIT_WRITE] < 0 ||
-        io_limits->bps[BLOCK_IO_LIMIT_READ] < 0 ||
-        io_limits->iops[BLOCK_IO_LIMIT_TOTAL] < 0 ||
-        io_limits->iops[BLOCK_IO_LIMIT_WRITE] < 0 ||
-        io_limits->iops[BLOCK_IO_LIMIT_READ] < 0) {
-        error_setg(errp, "bps and iops values must be 0 or greater");
+    if (!throttle_is_valid(cfg)) {
+        error_setg(errp, "bps/iops/maxs values must be 0 or greater");
         return false;
     }
 
@@ -331,7 +315,7 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts,
     int on_read_error, on_write_error;
     const char *devaddr;
     DriveInfo *dinfo;
-    BlockIOLimit io_limits;
+    ThrottleConfig cfg;
     int snapshot = 0;
     bool copy_on_read;
     int ret;
@@ -489,20 +473,31 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts,
     }
 
     /* disk I/O throttling */
-    io_limits.bps[BLOCK_IO_LIMIT_TOTAL]  =
+    cfg.buckets[THROTTLE_BPS_TOTAL].ups =
         qemu_opt_get_number(opts, "throttling.bps-total", 0);
-    io_limits.bps[BLOCK_IO_LIMIT_READ]   =
+    cfg.buckets[THROTTLE_BPS_READ].ups  =
         qemu_opt_get_number(opts, "throttling.bps-read", 0);
-    io_limits.bps[BLOCK_IO_LIMIT_WRITE]  =
+    cfg.buckets[THROTTLE_BPS_WRITE].ups =
         qemu_opt_get_number(opts, "throttling.bps-write", 0);
-    io_limits.iops[BLOCK_IO_LIMIT_TOTAL] =
+    cfg.buckets[THROTTLE_OPS_TOTAL].ups =
         qemu_opt_get_number(opts, "throttling.iops-total", 0);
-    io_limits.iops[BLOCK_IO_LIMIT_READ]  =
+    cfg.buckets[THROTTLE_OPS_READ].ups =
         qemu_opt_get_number(opts, "throttling.iops-read", 0);
-    io_limits.iops[BLOCK_IO_LIMIT_WRITE] =
+    cfg.buckets[THROTTLE_OPS_WRITE].ups =
         qemu_opt_get_number(opts, "throttling.iops-write", 0);
 
-    if (!do_check_io_limits(&io_limits, &error)) {
+    cfg.buckets[THROTTLE_BPS_TOTAL].max = 0;
+    cfg.buckets[THROTTLE_BPS_READ].max  = 0;
+    cfg.buckets[THROTTLE_BPS_WRITE].max = 0;
+
+    cfg.buckets[THROTTLE_OPS_TOTAL].max = 0;
+    cfg.buckets[THROTTLE_OPS_READ].max  = 0;
+    cfg.buckets[THROTTLE_OPS_WRITE].max = 0;
+
+    cfg.unit_size = BDRV_SECTOR_SIZE;
+    cfg.op_size = 0;
+
+    if (!check_throttle_config(&cfg, &error)) {
         error_report("%s", error_get_pretty(error));
         error_free(error);
         return NULL;
@@ -629,7 +624,10 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts,
     bdrv_set_on_error(dinfo->bdrv, on_read_error, on_write_error);
 
     /* disk I/O throttling */
-    bdrv_set_io_limits(dinfo->bdrv, &io_limits);
+    if (throttle_enabled(&cfg)) {
+        bdrv_io_limits_enable(dinfo->bdrv);
+        bdrv_set_io_limits(dinfo->bdrv, &cfg);
+    }
 
     switch(type) {
     case IF_IDE:
@@ -1262,7 +1260,7 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
                                int64_t bps_wr, int64_t iops, int64_t iops_rd,
                                int64_t iops_wr, Error **errp)
 {
-    BlockIOLimit io_limits;
+    ThrottleConfig cfg;
     BlockDriverState *bs;
 
     bs = bdrv_find(device);
@@ -1271,27 +1269,37 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
         return;
     }
 
-    io_limits.bps[BLOCK_IO_LIMIT_TOTAL] = bps;
-    io_limits.bps[BLOCK_IO_LIMIT_READ]  = bps_rd;
-    io_limits.bps[BLOCK_IO_LIMIT_WRITE] = bps_wr;
-    io_limits.iops[BLOCK_IO_LIMIT_TOTAL]= iops;
-    io_limits.iops[BLOCK_IO_LIMIT_READ] = iops_rd;
-    io_limits.iops[BLOCK_IO_LIMIT_WRITE]= iops_wr;
+    cfg.buckets[THROTTLE_BPS_TOTAL].ups = bps;
+    cfg.buckets[THROTTLE_BPS_READ].ups  = bps_rd;
+    cfg.buckets[THROTTLE_BPS_WRITE].ups = bps_wr;
+
+    cfg.buckets[THROTTLE_OPS_TOTAL].ups = iops;
+    cfg.buckets[THROTTLE_OPS_READ].ups  = iops_rd;
+    cfg.buckets[THROTTLE_OPS_WRITE].ups = iops_wr;
+
+    cfg.buckets[THROTTLE_BPS_TOTAL].max = 0;
+    cfg.buckets[THROTTLE_BPS_READ].max  = 0;
+    cfg.buckets[THROTTLE_BPS_WRITE].max = 0;
+
+    cfg.buckets[THROTTLE_OPS_TOTAL].max = 0;
+    cfg.buckets[THROTTLE_OPS_READ].max  = 0;
+    cfg.buckets[THROTTLE_OPS_WRITE].max = 0;
 
-    if (!do_check_io_limits(&io_limits, errp)) {
+    cfg.unit_size = BDRV_SECTOR_SIZE;
+    cfg.op_size = 0;
+
+    if (!check_throttle_config(&cfg, errp)) {
         return;
     }
 
-    bs->io_limits = io_limits;
-
-    if (!bs->io_limits_enabled && bdrv_io_limits_enabled(bs)) {
+    if (!bs->io_limits_enabled && throttle_enabled(&cfg)) {
         bdrv_io_limits_enable(bs);
-    } else if (bs->io_limits_enabled && !bdrv_io_limits_enabled(bs)) {
+    } else if (bs->io_limits_enabled && !throttle_enabled(&cfg)) {
         bdrv_io_limits_disable(bs);
-    } else {
-        if (bs->block_timer) {
-            qemu_mod_timer(bs->block_timer, qemu_get_clock_ns(vm_clock));
-        }
+    }
+
+    if (bs->io_limits_enabled) {
+        bdrv_set_io_limits(bs, &cfg);
     }
 }
 
diff --git a/include/block/block.h b/include/block/block.h
index 742fce5..b16d579 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -107,7 +107,6 @@ void bdrv_info_stats(Monitor *mon, QObject **ret_data);
 /* disk I/O throttling */
 void bdrv_io_limits_enable(BlockDriverState *bs);
 void bdrv_io_limits_disable(BlockDriverState *bs);
-bool bdrv_io_limits_enabled(BlockDriverState *bs);
 
 void bdrv_init(void);
 void bdrv_init_with_whitelist(void);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index e45f2a0..a0b6bc1 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -34,18 +34,12 @@
 #include "monitor/monitor.h"
 #include "qemu/hbitmap.h"
 #include "block/snapshot.h"
+#include "qemu/throttle.h"
 
 #define BLOCK_FLAG_ENCRYPT          1
 #define BLOCK_FLAG_COMPAT6          4
 #define BLOCK_FLAG_LAZY_REFCOUNTS   8
 
-#define BLOCK_IO_LIMIT_READ     0
-#define BLOCK_IO_LIMIT_WRITE    1
-#define BLOCK_IO_LIMIT_TOTAL    2
-
-#define BLOCK_IO_SLICE_TIME     100000000
-#define NANOSECONDS_PER_SECOND  1000000000.0
-
 #define BLOCK_OPT_SIZE              "size"
 #define BLOCK_OPT_ENCRYPT           "encryption"
 #define BLOCK_OPT_COMPAT6           "compat6"
@@ -69,17 +63,6 @@ typedef struct BdrvTrackedRequest {
     CoQueue wait_queue; /* coroutines blocked on this request */
 } BdrvTrackedRequest;
 
-
-typedef struct BlockIOLimit {
-    int64_t bps[3];
-    int64_t iops[3];
-} BlockIOLimit;
-
-typedef struct BlockIOBaseValue {
-    uint64_t bytes[2];
-    uint64_t ios[2];
-} BlockIOBaseValue;
-
 struct BlockDriver {
     const char *format_name;
     int instance_size;
@@ -263,13 +246,9 @@ struct BlockDriverState {
     /* number of in-flight copy-on-read requests */
     unsigned int copy_on_read_in_flight;
 
-    /* the time for latest disk I/O */
-    int64_t slice_start;
-    int64_t slice_end;
-    BlockIOLimit io_limits;
-    BlockIOBaseValue slice_submitted;
-    CoQueue      throttled_reqs;
-    QEMUTimer    *block_timer;
+    /* I/O throttling */
+    ThrottleState throttle_state;
+    CoQueue      throttled_reqs[2];
     bool         io_limits_enabled;
 
     /* I/O stats (display with "info blockstats"). */
@@ -308,7 +287,8 @@ struct BlockDriverState {
 int get_tmp_filename(char *filename, int size);
 
 void bdrv_set_io_limits(BlockDriverState *bs,
-                        BlockIOLimit *io_limits);
+                        ThrottleConfig *cfg);
+
 
 /**
  * bdrv_add_before_write_notifier:
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Qemu-devel] [PATCH V4 4/5] block: Add support for throttling burst max in QMP and the command line.
  2013-08-08 14:29 [Qemu-devel] [PATCH V4 0/5] Continuous Leaky Bucket Throttling Benoît Canet
                   ` (2 preceding siblings ...)
  2013-08-08 14:29 ` [Qemu-devel] [PATCH V4 3/5] block: Enable the new throttling code in the block layer Benoît Canet
@ 2013-08-08 14:29 ` Benoît Canet
  2013-08-08 14:29 ` [Qemu-devel] [PATCH V4 5/5] block: Add iops_sector_count to do the iops accounting for a given io size Benoît Canet
  2013-08-09 12:05 ` [Qemu-devel] [PATCH V4 0/5] Continuous Leaky Bucket Throttling* Benoît Canet
  5 siblings, 0 replies; 10+ messages in thread
From: Benoît Canet @ 2013-08-08 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

The max parameter of the leaky bycket throttling algoritm can be used to
allow the guest to do bursts.
The max value is a pool of I/O that the guest can use without being throttled
at all. Throttling is triggered once this pool is empty.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qapi.c     |   26 ++++++++++++
 blockdev.c       |  119 ++++++++++++++++++++++++++++++++++++++++++++++--------
 hmp.c            |   32 +++++++++++++--
 qapi-schema.json |   34 +++++++++++++++-
 qemu-options.hx  |    4 +-
 qmp-commands.hx  |   30 ++++++++++++--
 6 files changed, 220 insertions(+), 25 deletions(-)

diff --git a/block/qapi.c b/block/qapi.c
index 45f806b..5ba10f4 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -232,6 +232,32 @@ void bdrv_query_info(BlockDriverState *bs,
             info->inserted->iops    = cfg.buckets[THROTTLE_OPS_TOTAL].ups;
             info->inserted->iops_rd = cfg.buckets[THROTTLE_OPS_READ].ups;
             info->inserted->iops_wr = cfg.buckets[THROTTLE_OPS_WRITE].ups;
+
+            info->inserted->has_bps_max     =
+                cfg.buckets[THROTTLE_BPS_TOTAL].max;
+            info->inserted->bps_max         =
+                cfg.buckets[THROTTLE_BPS_TOTAL].max;
+            info->inserted->has_bps_rd_max  =
+                cfg.buckets[THROTTLE_BPS_READ].max;
+            info->inserted->bps_rd_max      =
+                cfg.buckets[THROTTLE_BPS_READ].max;
+            info->inserted->has_bps_wr_max  =
+                cfg.buckets[THROTTLE_BPS_WRITE].max;
+            info->inserted->bps_wr_max      =
+                cfg.buckets[THROTTLE_BPS_WRITE].max;
+
+            info->inserted->has_iops_max    =
+                cfg.buckets[THROTTLE_OPS_TOTAL].max;
+            info->inserted->iops_max        =
+                cfg.buckets[THROTTLE_OPS_TOTAL].max;
+            info->inserted->has_iops_rd_max =
+                cfg.buckets[THROTTLE_OPS_READ].max;
+            info->inserted->iops_rd_max     =
+                cfg.buckets[THROTTLE_OPS_READ].max;
+            info->inserted->has_iops_wr_max =
+                cfg.buckets[THROTTLE_OPS_WRITE].max;
+            info->inserted->iops_wr_max     =
+                cfg.buckets[THROTTLE_OPS_WRITE].max;
         }
 
         bs0 = bs;
diff --git a/blockdev.c b/blockdev.c
index 7559ea5..22016a2 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -486,13 +486,18 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts,
     cfg.buckets[THROTTLE_OPS_WRITE].ups =
         qemu_opt_get_number(opts, "throttling.iops-write", 0);
 
-    cfg.buckets[THROTTLE_BPS_TOTAL].max = 0;
-    cfg.buckets[THROTTLE_BPS_READ].max  = 0;
-    cfg.buckets[THROTTLE_BPS_WRITE].max = 0;
-
-    cfg.buckets[THROTTLE_OPS_TOTAL].max = 0;
-    cfg.buckets[THROTTLE_OPS_READ].max  = 0;
-    cfg.buckets[THROTTLE_OPS_WRITE].max = 0;
+    cfg.buckets[THROTTLE_BPS_TOTAL].max =
+        qemu_opt_get_number(opts, "throttling.bps-total-max", 0);
+    cfg.buckets[THROTTLE_BPS_READ].max  =
+        qemu_opt_get_number(opts, "throttling.bps-read-max", 0);
+    cfg.buckets[THROTTLE_BPS_WRITE].max =
+        qemu_opt_get_number(opts, "throttling.bps-write-max", 0);
+    cfg.buckets[THROTTLE_OPS_TOTAL].max =
+        qemu_opt_get_number(opts, "throttling.iops-total-max", 0);
+    cfg.buckets[THROTTLE_OPS_READ].max =
+        qemu_opt_get_number(opts, "throttling.iops-read-max", 0);
+    cfg.buckets[THROTTLE_OPS_WRITE].max =
+        qemu_opt_get_number(opts, "throttling.iops-write-max", 0);
 
     cfg.unit_size = BDRV_SECTOR_SIZE;
     cfg.op_size = 0;
@@ -773,6 +778,14 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type)
     qemu_opt_rename(all_opts, "bps_rd", "throttling.bps-read");
     qemu_opt_rename(all_opts, "bps_wr", "throttling.bps-write");
 
+    qemu_opt_rename(all_opts, "iops_max", "throttling.iops-total-max");
+    qemu_opt_rename(all_opts, "iops_rd_max", "throttling.iops-read-max");
+    qemu_opt_rename(all_opts, "iops_wr_max", "throttling.iops-write-max");
+
+    qemu_opt_rename(all_opts, "bps_max", "throttling.bps-total-max");
+    qemu_opt_rename(all_opts, "bps_rd_max", "throttling.bps-read-max");
+    qemu_opt_rename(all_opts, "bps_wr_max", "throttling.bps-write-max");
+
     qemu_opt_rename(all_opts, "readonly", "read-only");
 
     value = qemu_opt_get(all_opts, "cache");
@@ -1257,8 +1270,22 @@ void qmp_change_blockdev(const char *device, const char *filename,
 
 /* throttling disk I/O limits */
 void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
-                               int64_t bps_wr, int64_t iops, int64_t iops_rd,
-                               int64_t iops_wr, Error **errp)
+                               int64_t bps_wr,
+                               int64_t iops,
+                               int64_t iops_rd,
+                               int64_t iops_wr,
+                               bool has_bps_max,
+                               int64_t bps_max,
+                               bool has_bps_rd_max,
+                               int64_t bps_rd_max,
+                               bool has_bps_wr_max,
+                               int64_t bps_wr_max,
+                               bool has_iops_max,
+                               int64_t iops_max,
+                               bool has_iops_rd_max,
+                               int64_t iops_rd_max,
+                               bool has_iops_wr_max,
+                               int64_t iops_wr_max, Error **errp)
 {
     ThrottleConfig cfg;
     BlockDriverState *bs;
@@ -1269,6 +1296,7 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
         return;
     }
 
+    memset(&cfg, 0, sizeof(cfg));
     cfg.buckets[THROTTLE_BPS_TOTAL].ups = bps;
     cfg.buckets[THROTTLE_BPS_READ].ups  = bps_rd;
     cfg.buckets[THROTTLE_BPS_WRITE].ups = bps_wr;
@@ -1277,13 +1305,24 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
     cfg.buckets[THROTTLE_OPS_READ].ups  = iops_rd;
     cfg.buckets[THROTTLE_OPS_WRITE].ups = iops_wr;
 
-    cfg.buckets[THROTTLE_BPS_TOTAL].max = 0;
-    cfg.buckets[THROTTLE_BPS_READ].max  = 0;
-    cfg.buckets[THROTTLE_BPS_WRITE].max = 0;
-
-    cfg.buckets[THROTTLE_OPS_TOTAL].max = 0;
-    cfg.buckets[THROTTLE_OPS_READ].max  = 0;
-    cfg.buckets[THROTTLE_OPS_WRITE].max = 0;
+    if (has_bps_max) {
+        cfg.buckets[THROTTLE_BPS_TOTAL].max = bps_max;
+    }
+    if (has_bps_rd_max) {
+        cfg.buckets[THROTTLE_BPS_READ].max = bps_rd_max;
+    }
+    if (has_bps_wr_max) {
+        cfg.buckets[THROTTLE_BPS_WRITE].max = bps_wr_max;
+    }
+    if (has_iops_max) {
+        cfg.buckets[THROTTLE_OPS_TOTAL].max = iops_max;
+    }
+    if (has_iops_rd_max) {
+        cfg.buckets[THROTTLE_OPS_READ].max = iops_rd_max;
+    }
+    if (has_iops_wr_max) {
+        cfg.buckets[THROTTLE_OPS_WRITE].max = iops_wr_max;
+    }
 
     cfg.unit_size = BDRV_SECTOR_SIZE;
     cfg.op_size = 0;
@@ -1988,6 +2027,30 @@ QemuOptsList qemu_common_drive_opts = {
             .type = QEMU_OPT_NUMBER,
             .help = "limit write bytes per second",
         },{
+            .name = "throttling.iops-total-max",
+            .type = QEMU_OPT_NUMBER,
+            .help = "I/O operations burst",
+        },{
+            .name = "throttling.iops-read-max",
+            .type = QEMU_OPT_NUMBER,
+            .help = "I/O operations read burst",
+        },{
+            .name = "throttling.iops-write-max",
+            .type = QEMU_OPT_NUMBER,
+            .help = "I/O operations write burst",
+        },{
+            .name = "throttling.bps-total-max",
+            .type = QEMU_OPT_NUMBER,
+            .help = "total bytes burst",
+        },{
+            .name = "throttling.bps-read-max",
+            .type = QEMU_OPT_NUMBER,
+            .help = "total bytes read burst",
+        },{
+            .name = "throttling.bps-write-max",
+            .type = QEMU_OPT_NUMBER,
+            .help = "total bytes write burst",
+        },{
             .name = "copy-on-read",
             .type = QEMU_OPT_BOOL,
             .help = "copy read data from backing file into image file",
@@ -2110,6 +2173,30 @@ QemuOptsList qemu_old_drive_opts = {
             .type = QEMU_OPT_NUMBER,
             .help = "limit write bytes per second",
         },{
+            .name = "iops_max",
+            .type = QEMU_OPT_NUMBER,
+            .help = "I/O operations burst",
+        },{
+            .name = "iops_rd_max",
+            .type = QEMU_OPT_NUMBER,
+            .help = "I/O operations read burst",
+        },{
+            .name = "iops_wr_max",
+            .type = QEMU_OPT_NUMBER,
+            .help = "I/O operations write burst",
+        },{
+            .name = "bps_max",
+            .type = QEMU_OPT_NUMBER,
+            .help = "total bytes burst",
+        },{
+            .name = "bps_rd_max",
+            .type = QEMU_OPT_NUMBER,
+            .help = "total bytes read burst",
+        },{
+            .name = "bps_wr_max",
+            .type = QEMU_OPT_NUMBER,
+            .help = "total bytes write burst",
+        },{
             .name = "copy-on-read",
             .type = QEMU_OPT_BOOL,
             .help = "copy read data from backing file into image file",
diff --git a/hmp.c b/hmp.c
index c45514b..e0c387c 100644
--- a/hmp.c
+++ b/hmp.c
@@ -344,14 +344,28 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
         {
             monitor_printf(mon, "    I/O throttling:   bps=%" PRId64
                             " bps_rd=%" PRId64  " bps_wr=%" PRId64
+                            " bps_max=%" PRId64
+                            " bps_rd_max=%" PRId64
+                            " bps_wr_max=%" PRId64
                             " iops=%" PRId64 " iops_rd=%" PRId64
-                            " iops_wr=%" PRId64 "\n",
+                            " iops_wr=%" PRId64
+                            " iops_max=%" PRId64
+                            " iops_rd_max=%" PRId64
+                            " iops_wr_max=%" PRId64 "\n",
                             info->value->inserted->bps,
                             info->value->inserted->bps_rd,
                             info->value->inserted->bps_wr,
+                            info->value->inserted->bps_max,
+                            info->value->inserted->bps_rd_max,
+                            info->value->inserted->bps_wr_max,
                             info->value->inserted->iops,
                             info->value->inserted->iops_rd,
-                            info->value->inserted->iops_wr);
+                            info->value->inserted->iops_wr,
+                            info->value->inserted->iops_max,
+                            info->value->inserted->iops_rd_max,
+                            info->value->inserted->iops_wr_max);
+        } else {
+            monitor_printf(mon, " [not inserted]");
         }
 
         if (verbose) {
@@ -1098,7 +1112,19 @@ void hmp_block_set_io_throttle(Monitor *mon, const QDict *qdict)
                               qdict_get_int(qdict, "bps_wr"),
                               qdict_get_int(qdict, "iops"),
                               qdict_get_int(qdict, "iops_rd"),
-                              qdict_get_int(qdict, "iops_wr"), &err);
+                              qdict_get_int(qdict, "iops_wr"),
+                              false, /* no burst max via QMP */
+                              0,
+                              false,
+                              0,
+                              false,
+                              0,
+                              false,
+                              0,
+                              false,
+                              0,
+                              false,
+                              0, &err);
     hmp_handle_error(mon, &err);
 }
 
diff --git a/qapi-schema.json b/qapi-schema.json
index a51f7d2..5e5461e 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -785,6 +785,18 @@
 #
 # @image: the info of image used (since: 1.6)
 #
+# @bps_max: #optional total max in bytes (Since 1.7)
+#
+# @bps_rd_max: #optional read max in bytes (Since 1.7)
+#
+# @bps_wr_max: #optional write max in bytes (Since 1.7)
+#
+# @iops_max: #optional total I/O operations max (Since 1.7)
+#
+# @iops_rd_max: #optional read I/O operations max (Since 1.7)
+#
+# @iops_wr_max: #optional write I/O operations max (Since 1.7)
+#
 # Since: 0.14.0
 #
 # Notes: This interface is only found in @BlockInfo.
@@ -795,7 +807,10 @@
             'encrypted': 'bool', 'encryption_key_missing': 'bool',
             'bps': 'int', 'bps_rd': 'int', 'bps_wr': 'int',
             'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int',
-            'image': 'ImageInfo' } }
+            'image': 'ImageInfo',
+            '*bps_max': 'int', '*bps_rd_max': 'int',
+            '*bps_wr_max': 'int', '*iops_max': 'int',
+            '*iops_rd_max': 'int', '*iops_wr_max': 'int' }}
 
 ##
 # @BlockDeviceIoStatus:
@@ -2174,6 +2189,18 @@
 #
 # @iops_wr: write I/O operations per second
 #
+# @bps_max: #optional total max in bytes (Since 1.7)
+#
+# @bps_rd_max: #optional read max in bytes (Since 1.7)
+#
+# @bps_wr_max: #optional write max in bytes (Since 1.7)
+#
+# @iops_max: #optional total I/O operations max (Since 1.7)
+#
+# @iops_rd_max: #optional read I/O operations max (Since 1.7)
+#
+# @iops_wr_max: #optional write I/O operations max (Since 1.7)
+#
 # Returns: Nothing on success
 #          If @device is not a valid block device, DeviceNotFound
 #
@@ -2181,7 +2208,10 @@
 ##
 { 'command': 'block_set_io_throttle',
   'data': { 'device': 'str', 'bps': 'int', 'bps_rd': 'int', 'bps_wr': 'int',
-            'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int' } }
+            'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int',
+            '*bps_max': 'int', '*bps_rd_max': 'int',
+            '*bps_wr_max': 'int', '*iops_max': 'int',
+            '*iops_rd_max': 'int', '*iops_wr_max': 'int' }}
 
 ##
 # @block-stream:
diff --git a/qemu-options.hx b/qemu-options.hx
index d15338e..3aa8f9e 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -409,7 +409,9 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
     "       [,cache=writethrough|writeback|none|directsync|unsafe][,format=f]\n"
     "       [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
     "       [,readonly=on|off][,copy-on-read=on|off]\n"
-    "       [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r][,iops_wr=w]]\n"
+    "       [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r]"
+    "        [,iops_wr=w][,bps_max=bt]|[[,bps_rd_max=rt][,bps_wr_max=wt]]]"
+    "       [[,iops_max=it]|[[,iops_rd_max=rt][,iops_wr_max=wt]]\n"
     "                use 'file' as a drive image\n", QEMU_ARCH_ALL)
 STEXI
 @item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 2e59b0d..1610831 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1389,7 +1389,7 @@ EQMP
 
     {
         .name       = "block_set_io_throttle",
-        .args_type  = "device:B,bps:l,bps_rd:l,bps_wr:l,iops:l,iops_rd:l,iops_wr:l",
+        .args_type  = "device:B,bps:l,bps_rd:l,bps_wr:l,iops:l,iops_rd:l,iops_wr:l,bps_max:l?,bps_rd_max:l?,bps_wr_max:l?,iops_max:l?,iops_rd_max:l?,iops_wr_max:l?",
         .mhandler.cmd_new = qmp_marshal_input_block_set_io_throttle,
     },
 
@@ -1404,10 +1404,16 @@ Arguments:
 - "device": device name (json-string)
 - "bps":  total throughput limit in bytes per second(json-int)
 - "bps_rd":  read throughput limit in bytes per second(json-int)
-- "bps_wr":  read throughput limit in bytes per second(json-int)
+- "bps_wr":  write throughput limit in bytes per second(json-int)
 - "iops":  total I/O operations per second(json-int)
 - "iops_rd":  read I/O operations per second(json-int)
 - "iops_wr":  write I/O operations per second(json-int)
+- "bps_max":  total max in bytes(json-int)
+- "bps_rd_max":  read max in bytes(json-int)
+- "bps_wr_max":  write max in bytes(json-int)
+- "iops_max":  total I/O operations max(json-int)
+- "iops_rd_max":  read I/O operations max(json-int)
+- "iops_wr_max":  write I/O operations max(json-int)
 
 Example:
 
@@ -1417,7 +1423,13 @@ Example:
                                                "bps_wr": "0",
                                                "iops": "0",
                                                "iops_rd": "0",
-                                               "iops_wr": "0" } }
+                                               "iops_wr": "0",
+                                               "bps_max": "8000000",
+                                               "bps_rd_max": "0",
+                                               "bps_wr_max": "0",
+                                               "iops_max": "0",
+                                               "iops_rd_max": "0",
+                                               "iops_wr_max": "0" } }
 <- { "return": {} }
 
 EQMP
@@ -1758,6 +1770,12 @@ Each json-object contain the following:
          - "iops": limit total I/O operations per second (json-int)
          - "iops_rd": limit read operations per second (json-int)
          - "iops_wr": limit write operations per second (json-int)
+         - "bps_max":  total max in bytes(json-int)
+         - "bps_rd_max":  read max in bytes(json-int)
+         - "bps_wr_max":  write max in bytes(json-int)
+         - "iops_max":  total I/O operations max(json-int)
+         - "iops_rd_max":  read I/O operations max(json-int)
+         - "iops_wr_max":  write I/O operations max(json-int)
          - "image": the detail of the image, it is a json-object containing
             the following:
              - "filename": image file name (json-string)
@@ -1827,6 +1845,12 @@ Example:
                "iops":1000000,
                "iops_rd":0,
                "iops_wr":0,
+               "bps_max": "8000000",
+               "bps_rd_max": "0",
+               "bps_wr_max": "0",
+               "iops_max": "0",
+               "iops_rd_max": "0",
+               "iops_wr_max": "0",
                "image":{
                   "filename":"disks/test.qcow2",
                   "format":"qcow2",
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Qemu-devel] [PATCH V4 5/5] block: Add iops_sector_count to do the iops accounting for a given io size.
  2013-08-08 14:29 [Qemu-devel] [PATCH V4 0/5] Continuous Leaky Bucket Throttling Benoît Canet
                   ` (3 preceding siblings ...)
  2013-08-08 14:29 ` [Qemu-devel] [PATCH V4 4/5] block: Add support for throttling burst max in QMP and the command line Benoît Canet
@ 2013-08-08 14:29 ` Benoît Canet
  2013-08-09 12:05 ` [Qemu-devel] [PATCH V4 0/5] Continuous Leaky Bucket Throttling* Benoît Canet
  5 siblings, 0 replies; 10+ messages in thread
From: Benoît Canet @ 2013-08-08 14:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha

This feature can be used in case where users are avoiding the iops limit by
doing jumbo I/Os hammering the storage backend.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block/qapi.c     |    3 +++
 blockdev.c       |   22 +++++++++++++++++++---
 hmp.c            |    8 ++++++--
 qapi-schema.json |   10 ++++++++--
 qemu-options.hx  |    2 +-
 qmp-commands.hx  |    8 ++++++--
 6 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/block/qapi.c b/block/qapi.c
index 5ba10f4..f98ff64 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -258,6 +258,9 @@ void bdrv_query_info(BlockDriverState *bs,
                 cfg.buckets[THROTTLE_OPS_WRITE].max;
             info->inserted->iops_wr_max     =
                 cfg.buckets[THROTTLE_OPS_WRITE].max;
+
+            info->inserted->has_iops_sector_count = cfg.op_size;
+            info->inserted->iops_sector_count = cfg.op_size;
         }
 
         bs0 = bs;
diff --git a/blockdev.c b/blockdev.c
index 22016a2..e2b0ee0 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -500,7 +500,7 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts,
         qemu_opt_get_number(opts, "throttling.iops-write-max", 0);
 
     cfg.unit_size = BDRV_SECTOR_SIZE;
-    cfg.op_size = 0;
+    cfg.op_size = qemu_opt_get_number(opts, "throttling.iops-sector-count", 0);
 
     if (!check_throttle_config(&cfg, &error)) {
         error_report("%s", error_get_pretty(error));
@@ -786,6 +786,9 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type)
     qemu_opt_rename(all_opts, "bps_rd_max", "throttling.bps-read-max");
     qemu_opt_rename(all_opts, "bps_wr_max", "throttling.bps-write-max");
 
+    qemu_opt_rename(all_opts,
+                    "iops_sector_count", "throttling.iops-sector-count");
+
     qemu_opt_rename(all_opts, "readonly", "read-only");
 
     value = qemu_opt_get(all_opts, "cache");
@@ -1285,7 +1288,9 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
                                bool has_iops_rd_max,
                                int64_t iops_rd_max,
                                bool has_iops_wr_max,
-                               int64_t iops_wr_max, Error **errp)
+                               int64_t iops_wr_max,
+                               bool has_iops_sector_count,
+                               int64_t iops_sector_count, Error **errp)
 {
     ThrottleConfig cfg;
     BlockDriverState *bs;
@@ -1325,7 +1330,10 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
     }
 
     cfg.unit_size = BDRV_SECTOR_SIZE;
-    cfg.op_size = 0;
+
+    if (has_iops_sector_count) {
+        cfg.op_size = iops_sector_count;
+    }
 
     if (!check_throttle_config(&cfg, errp)) {
         return;
@@ -2051,6 +2059,10 @@ QemuOptsList qemu_common_drive_opts = {
             .type = QEMU_OPT_NUMBER,
             .help = "total bytes write burst",
         },{
+            .name = "throttling.iops-sector-count",
+            .type = QEMU_OPT_NUMBER,
+            .help = "when limiting by iops max size of an I/O in sector",
+        },{
             .name = "copy-on-read",
             .type = QEMU_OPT_BOOL,
             .help = "copy read data from backing file into image file",
@@ -2197,6 +2209,10 @@ QemuOptsList qemu_old_drive_opts = {
             .type = QEMU_OPT_NUMBER,
             .help = "total bytes write burst",
         },{
+            .name = "iops_sector_count",
+            .type = QEMU_OPT_NUMBER,
+            .help = "when limiting by iops max size of an I/O in sector",
+        },{
             .name = "copy-on-read",
             .type = QEMU_OPT_BOOL,
             .help = "copy read data from backing file into image file",
diff --git a/hmp.c b/hmp.c
index e0c387c..01f7685 100644
--- a/hmp.c
+++ b/hmp.c
@@ -351,7 +351,8 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
                             " iops_wr=%" PRId64
                             " iops_max=%" PRId64
                             " iops_rd_max=%" PRId64
-                            " iops_wr_max=%" PRId64 "\n",
+                            " iops_wr_max=%" PRId64
+                            " iops_sector_count=%" PRId64 "\n",
                             info->value->inserted->bps,
                             info->value->inserted->bps_rd,
                             info->value->inserted->bps_wr,
@@ -363,7 +364,8 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
                             info->value->inserted->iops_wr,
                             info->value->inserted->iops_max,
                             info->value->inserted->iops_rd_max,
-                            info->value->inserted->iops_wr_max);
+                            info->value->inserted->iops_wr_max,
+                            info->value->inserted->iops_sector_count);
         } else {
             monitor_printf(mon, " [not inserted]");
         }
@@ -1124,6 +1126,8 @@ void hmp_block_set_io_throttle(Monitor *mon, const QDict *qdict)
                               false,
                               0,
                               false,
+                              0,
+                              false, /* No default I/O size */
                               0, &err);
     hmp_handle_error(mon, &err);
 }
diff --git a/qapi-schema.json b/qapi-schema.json
index 5e5461e..c7b1d5e 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -797,6 +797,8 @@
 #
 # @iops_wr_max: #optional write I/O operations max (Since 1.7)
 #
+# @iops_sector_count: #optional an I/O size in sector (Since 1.6)
+#
 # Since: 0.14.0
 #
 # Notes: This interface is only found in @BlockInfo.
@@ -810,7 +812,8 @@
             'image': 'ImageInfo',
             '*bps_max': 'int', '*bps_rd_max': 'int',
             '*bps_wr_max': 'int', '*iops_max': 'int',
-            '*iops_rd_max': 'int', '*iops_wr_max': 'int' }}
+            '*iops_rd_max': 'int', '*iops_wr_max': 'int',
+            '*iops_sector_count': 'int' } }
 
 ##
 # @BlockDeviceIoStatus:
@@ -2201,6 +2204,8 @@
 #
 # @iops_wr_max: #optional write I/O operations max (Since 1.7)
 #
+# @iops_sector_count: #optional an I/O size in sector (Since 1.6)
+#
 # Returns: Nothing on success
 #          If @device is not a valid block device, DeviceNotFound
 #
@@ -2211,7 +2216,8 @@
             'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int',
             '*bps_max': 'int', '*bps_rd_max': 'int',
             '*bps_wr_max': 'int', '*iops_max': 'int',
-            '*iops_rd_max': 'int', '*iops_wr_max': 'int' }}
+            '*iops_rd_max': 'int', '*iops_wr_max': 'int',
+            '*iops_sector_count': 'int' }}
 
 ##
 # @block-stream:
diff --git a/qemu-options.hx b/qemu-options.hx
index 3aa8f9e..c539dd0 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -411,7 +411,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
     "       [,readonly=on|off][,copy-on-read=on|off]\n"
     "       [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r]"
     "        [,iops_wr=w][,bps_max=bt]|[[,bps_rd_max=rt][,bps_wr_max=wt]]]"
-    "       [[,iops_max=it]|[[,iops_rd_max=rt][,iops_wr_max=wt]]\n"
+    "       [[,iops_max=it]|[[,iops_rd_max=rt][,iops_wr_max=wt][,iops_sector_count=cnt]]\n"
     "                use 'file' as a drive image\n", QEMU_ARCH_ALL)
 STEXI
 @item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 1610831..bcd255f 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1389,7 +1389,7 @@ EQMP
 
     {
         .name       = "block_set_io_throttle",
-        .args_type  = "device:B,bps:l,bps_rd:l,bps_wr:l,iops:l,iops_rd:l,iops_wr:l,bps_max:l?,bps_rd_max:l?,bps_wr_max:l?,iops_max:l?,iops_rd_max:l?,iops_wr_max:l?",
+        .args_type  = "device:B,bps:l,bps_rd:l,bps_wr:l,iops:l,iops_rd:l,iops_wr:l,bps_max:l?,bps_rd_max:l?,bps_wr_max:l?,iops_max:l?,iops_rd_max:l?,iops_wr_max:l?,iops_sector_count:l?",
         .mhandler.cmd_new = qmp_marshal_input_block_set_io_throttle,
     },
 
@@ -1414,6 +1414,7 @@ Arguments:
 - "iops_max":  total I/O operations max(json-int)
 - "iops_rd_max":  read I/O operations max(json-int)
 - "iops_wr_max":  write I/O operations max(json-int)
+- "iops_sector_count":  I/O sector count when limiting(json-int)
 
 Example:
 
@@ -1429,7 +1430,8 @@ Example:
                                                "bps_wr_max": "0",
                                                "iops_max": "0",
                                                "iops_rd_max": "0",
-                                               "iops_wr_max": "0" } }
+                                               "iops_wr_max": "0",
+                                               "iops_sector_count": "0" } }
 <- { "return": {} }
 
 EQMP
@@ -1776,6 +1778,7 @@ Each json-object contain the following:
          - "iops_max":  total I/O operations max(json-int)
          - "iops_rd_max":  read I/O operations max(json-int)
          - "iops_wr_max":  write I/O operations max(json-int)
+         - "iops_sector_count":  I/O sector count when limiting(json-int)
          - "image": the detail of the image, it is a json-object containing
             the following:
              - "filename": image file name (json-string)
@@ -1851,6 +1854,7 @@ Example:
                "iops_max": "0",
                "iops_rd_max": "0",
                "iops_wr_max": "0",
+               "iops_sector_count": "0",
                "image":{
                   "filename":"disks/test.qcow2",
                   "format":"qcow2",
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH V4 3/5] block: Enable the new throttling code in the block layer.
  2013-08-08 14:29 ` [Qemu-devel] [PATCH V4 3/5] block: Enable the new throttling code in the block layer Benoît Canet
@ 2013-08-08 15:30   ` Paolo Bonzini
  2013-08-08 15:43     ` [Qemu-devel] [PATCH V4 3/5] block: Enable the new throttling code in the block layer.o Benoît Canet
  2013-08-08 15:56     ` [Qemu-devel] [PATCH V4 3/5] block: Enable the new throttling code in the block layer Benoît Canet
  0 siblings, 2 replies; 10+ messages in thread
From: Paolo Bonzini @ 2013-08-08 15:30 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, qemu-devel, stefanha

Il 08/08/2013 16:29, Benoît Canet ha scritto:
> tip: Do not ever use the cfg scheduler in the guest with this code.
>      It gives incorrect throttling.

This is not really accurate; the cfq scheduler reorders reads and writes
to have longer bursts, and these sometimes exceed the rate you set.  I
understood this is mostly for separate rd/wr settings, or did you
reproduce it with "total" rates too?

Also, it would be better to have a workaround for this.  Perhaps we
could simply make the default value of max nonzero?  In the old
throttling code the slice time is 0.1s, so perhaps we could see what
happens with max=0.1*avg.

Paolo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH V4 3/5] block: Enable the new throttling code in the block layer.o
  2013-08-08 15:30   ` Paolo Bonzini
@ 2013-08-08 15:43     ` Benoît Canet
  2013-08-08 15:56     ` [Qemu-devel] [PATCH V4 3/5] block: Enable the new throttling code in the block layer Benoît Canet
  1 sibling, 0 replies; 10+ messages in thread
From: Benoît Canet @ 2013-08-08 15:43 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, qemu-devel, stefanha

> This is not really accurate; the cfq scheduler reorders reads and writes
> to have longer bursts, and these sometimes exceed the rate you set.  I
> understood this is mostly for separate rd/wr settings, or did you
> reproduce it with "total" rates too?

I'll drop the comment.

It's only with read/write settings and not with total.

> 
> Also, it would be better to have a workaround for this.  Perhaps we
> could simply make the default value of max nonzero?  In the old
> throttling code the slice time is 0.1s, so perhaps we could see what
> happens with max=0.1*avg.

I already set max to 0.1*avg if needed in the code.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH V4 3/5] block: Enable the new throttling code in the block layer.
  2013-08-08 15:30   ` Paolo Bonzini
  2013-08-08 15:43     ` [Qemu-devel] [PATCH V4 3/5] block: Enable the new throttling code in the block layer.o Benoît Canet
@ 2013-08-08 15:56     ` Benoît Canet
  1 sibling, 0 replies; 10+ messages in thread
From: Benoît Canet @ 2013-08-08 15:56 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, qemu-devel, stefanha

> Also, it would be better to have a workaround for this.  Perhaps we
> could simply make the default value of max nonzero?  In the old
> throttling code the slice time is 0.1s, so perhaps we could see what
> happens with max=0.1*avg.

This gives correct results with cfg with some little auto compensated oscillations.

I will definitelly drop the bogus comment.

Best regards

Benoît

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH V4 0/5] Continuous Leaky Bucket Throttling*
  2013-08-08 14:29 [Qemu-devel] [PATCH V4 0/5] Continuous Leaky Bucket Throttling Benoît Canet
                   ` (4 preceding siblings ...)
  2013-08-08 14:29 ` [Qemu-devel] [PATCH V4 5/5] block: Add iops_sector_count to do the iops accounting for a given io size Benoît Canet
@ 2013-08-09 12:05 ` Benoît Canet
  5 siblings, 0 replies; 10+ messages in thread
From: Benoît Canet @ 2013-08-09 12:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, stefanha

> It fail with the following error message at exit and I don't know why yet.
> qemu-system-x86_64: block.c:1489: bdrv_drain_all: Assertion
> `((&bs->tracked_requests)->lh_first == ((void *)0))' failed.

I solved this issue: bdrv_drain_all was bogus.

Best regards

Benoît

> 
>  block.c                   |  351 ++++++++++----------------------
>  block/qapi.c              |   50 +++--
>  blockdev.c                |  207 ++++++++++++++-----
>  hmp.c                     |   36 +++-
>  include/block/block.h     |    1 -
>  include/block/block_int.h |   32 +--
>  include/qemu/throttle.h   |  105 ++++++++++
>  qapi-schema.json          |   40 +++-
>  qemu-options.hx           |    4 +-
>  qmp-commands.hx           |   34 +++-
>  tests/Makefile            |    2 +
>  tests/test-throttle.c     |  494 +++++++++++++++++++++++++++++++++++++++++++++
>  util/Makefile.objs        |    1 +
>  util/throttle.c           |  391 +++++++++++++++++++++++++++++++++++
>  14 files changed, 1405 insertions(+), 343 deletions(-)
>  create mode 100644 include/qemu/throttle.h
>  create mode 100644 tests/test-throttle.c
>  create mode 100644 util/throttle.c
> 
> -- 
> 1.7.10.4
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2013-08-09 12:04 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-08-08 14:29 [Qemu-devel] [PATCH V4 0/5] Continuous Leaky Bucket Throttling Benoît Canet
2013-08-08 14:29 ` [Qemu-devel] [PATCH V4 1/5] throttle: Add a new throttling API implementing continuous leaky bucket Benoît Canet
2013-08-08 14:29 ` [Qemu-devel] [PATCH V4 2/5] throttle: Add units tests Benoît Canet
2013-08-08 14:29 ` [Qemu-devel] [PATCH V4 3/5] block: Enable the new throttling code in the block layer Benoît Canet
2013-08-08 15:30   ` Paolo Bonzini
2013-08-08 15:43     ` [Qemu-devel] [PATCH V4 3/5] block: Enable the new throttling code in the block layer.o Benoît Canet
2013-08-08 15:56     ` [Qemu-devel] [PATCH V4 3/5] block: Enable the new throttling code in the block layer Benoît Canet
2013-08-08 14:29 ` [Qemu-devel] [PATCH V4 4/5] block: Add support for throttling burst max in QMP and the command line Benoît Canet
2013-08-08 14:29 ` [Qemu-devel] [PATCH V4 5/5] block: Add iops_sector_count to do the iops accounting for a given io size Benoît Canet
2013-08-09 12:05 ` [Qemu-devel] [PATCH V4 0/5] Continuous Leaky Bucket Throttling* Benoît Canet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).