* [Qemu-devel] [PATCH V5 1/5] throttle: Add a new throttling API implementing continuous leaky bucket.
2013-08-12 16:53 [Qemu-devel] [PATCH V5 0/5] Continuous Leaky Bucket Throttling Benoît Canet
@ 2013-08-12 16:53 ` Benoît Canet
2013-08-14 8:52 ` Fam Zheng
2013-08-16 11:45 ` Stefan Hajnoczi
2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 2/5] throttle: Add units tests Benoît Canet
` (4 subsequent siblings)
5 siblings, 2 replies; 17+ messages in thread
From: Benoît Canet @ 2013-08-12 16:53 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Implement the continuous leaky bucket algorithm devised on IRC as a separate
module.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
include/qemu/throttle.h | 105 +++++++++++++
util/Makefile.objs | 1 +
util/throttle.c | 391 +++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 497 insertions(+)
create mode 100644 include/qemu/throttle.h
create mode 100644 util/throttle.c
diff --git a/include/qemu/throttle.h b/include/qemu/throttle.h
new file mode 100644
index 0000000..e03bc3e
--- /dev/null
+++ b/include/qemu/throttle.h
@@ -0,0 +1,105 @@
+/*
+ * QEMU throttling infrastructure
+ *
+ * Copyright (C) Nodalink, SARL. 2013
+ *
+ * Author:
+ * Benoît Canet <benoit.canet@irqsave.net>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 or
+ * (at your option) version 3 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef THROTTLING_H
+#define THROTTLING_H
+
+#include <stdint.h>
+#include "qemu-common.h"
+#include "qemu/timer.h"
+
+#define NANOSECONDS_PER_SECOND 1000000000.0
+
+#define BUCKETS_COUNT 6
+
+typedef enum {
+ THROTTLE_BPS_TOTAL = 0,
+ THROTTLE_BPS_READ = 1,
+ THROTTLE_BPS_WRITE = 2,
+ THROTTLE_OPS_TOTAL = 3,
+ THROTTLE_OPS_READ = 4,
+ THROTTLE_OPS_WRITE = 5,
+} BucketType;
+
+typedef struct LeakyBucket {
+ double ups; /* units per second */
+ double max; /* leaky bucket max in units */
+ double bucket; /* bucket in units */
+} LeakyBucket;
+
+/* The following structure is used to configure a ThrottleState.
+ * It contains a bit of state: the bucket field of the LeakyBucket structure.
+ * However, keeping it here keeps the code clean, and the bucket field is
+ * reset to zero at the right time.
+ */
+typedef struct ThrottleConfig {
+ LeakyBucket buckets[6]; /* leaky buckets */
+ uint64_t unit_size; /* size of a unit in bytes */
+ uint64_t op_size; /* size of an operation in units */
+} ThrottleConfig;
+
+typedef struct ThrottleState {
+ ThrottleConfig cfg; /* configuration */
+ int64_t previous_leak; /* timestamp of the last leak done */
+ QEMUTimer * timers[2]; /* timers used to do the throttling */
+ QEMUClock *clock; /* the clock used */
+} ThrottleState;
+
+/* operations on single leaky buckets */
+void throttle_leak_bucket(LeakyBucket *bkt, int64_t delta);
+
+int64_t throttle_compute_wait(LeakyBucket *bkt);
+
+/* expose timer computation function for unit tests */
+bool throttle_compute_timer(ThrottleState *ts,
+ bool is_write,
+ int64_t now,
+ int64_t *next_timer);
+
+/* init/destroy cycle */
+void throttle_init(ThrottleState *ts,
+ QEMUClock *clock,
+ void (read_timer)(void *),
+ void (write_timer)(void *),
+ void *timer_opaque);
+
+void throttle_destroy(ThrottleState *ts);
+
+bool throttle_have_timer(ThrottleState *ts);
+
+/* configuration */
+bool throttle_enabled(ThrottleConfig *cfg);
+
+bool throttle_conflicting(ThrottleConfig *cfg);
+
+bool throttle_is_valid(ThrottleConfig *cfg);
+
+void throttle_config(ThrottleState *ts, ThrottleConfig *cfg);
+
+void throttle_get_config(ThrottleState *ts, ThrottleConfig *cfg);
+
+/* usage */
+bool throttle_allowed(ThrottleState *ts, bool is_write);
+
+void throttle_account(ThrottleState *ts, bool is_write, uint64_t size);
+
+#endif
diff --git a/util/Makefile.objs b/util/Makefile.objs
index dc72ab0..2bb13a2 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -11,3 +11,4 @@ util-obj-y += iov.o aes.o qemu-config.o qemu-sockets.o uri.o notify.o
util-obj-y += qemu-option.o qemu-progress.o
util-obj-y += hexdump.o
util-obj-y += crc32c.o
+util-obj-y += throttle.o
diff --git a/util/throttle.c b/util/throttle.c
new file mode 100644
index 0000000..2f25d44
--- /dev/null
+++ b/util/throttle.c
@@ -0,0 +1,391 @@
+/*
+ * QEMU throttling infrastructure
+ *
+ * Copyright (C) Nodalink, SARL. 2013
+ *
+ * Author:
+ * Benoît Canet <benoit.canet@irqsave.net>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 or
+ * (at your option) version 3 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/throttle.h"
+#include "qemu/timer.h"
+
+/* This function makes a bucket leak
+ *
+ * @bkt: the bucket to leak
+ * @delta: the time delta
+ */
+void throttle_leak_bucket(LeakyBucket *bkt, int64_t delta)
+{
+ double leak;
+
+ /* compute how much to leak */
+ leak = (bkt->ups * (double) delta) / NANOSECONDS_PER_SECOND;
+
+ /* make the bucket leak */
+ bkt->bucket = MAX(bkt->bucket - leak, 0);
+}
+
+/* Calculate the time delta since the last leak and make proportional leaks
+ *
+ * @now: the current timestamp in ns
+ */
+static void throttle_do_leak(ThrottleState *ts, int64_t now)
+{
+ /* compute the time elapsed since the last leak */
+ int64_t delta = now - ts->previous_leak;
+ int i;
+
+ ts->previous_leak = now;
+
+ if (delta <= 0) {
+ return;
+ }
+
+ /* make each bucket leak */
+ for (i = 0; i < BUCKETS_COUNT; i++) {
+ throttle_leak_bucket(&ts->cfg.buckets[i], delta);
+ }
+}
+
+/* do the real job of computing the time to wait
+ *
+ * @limit: the throttling limit
+ * @extra: the number of operations to delay
+ * @ret: the time to wait in ns
+ */
+static int64_t throttle_do_compute_wait(double limit, double extra)
+{
+ double wait = extra * NANOSECONDS_PER_SECOND;
+ wait /= limit;
+ return wait;
+}
+
+/* This function computes the wait time in ns that a leaky bucket should trigger
+ *
+ * @bkt: the leaky bucket we operate on
+ * @ret: the resulting wait time in ns or 0 if the operation can go through
+ */
+int64_t throttle_compute_wait(LeakyBucket *bkt)
+{
+ double extra; /* the number of extra units blocking the io */
+
+ if (!bkt->ups) {
+ return 0;
+ }
+
+ extra = bkt->bucket - bkt->max;
+
+ if (extra <= 0) {
+ return 0;
+ }
+
+ return throttle_do_compute_wait(bkt->ups, extra);
+}
+
+/* This function computes the time that this IO must wait
+ *
+ * @is_write: true if the current IO is a write, false if it's a read
+ * @ret: time to wait
+ */
+static int64_t throttle_compute_wait_for(ThrottleState *ts,
+ bool is_write,
+ int64_t now)
+{
+ BucketType to_check[2][4] = { {THROTTLE_BPS_TOTAL,
+ THROTTLE_OPS_TOTAL,
+ THROTTLE_BPS_READ,
+ THROTTLE_OPS_READ},
+ {THROTTLE_BPS_TOTAL,
+ THROTTLE_OPS_TOTAL,
+ THROTTLE_BPS_WRITE,
+ THROTTLE_OPS_WRITE}, };
+ int64_t wait, max_wait = 0;
+ int i;
+
+ for (i = 0; i < 4; i++) {
+ BucketType index = to_check[is_write][i];
+ wait = throttle_compute_wait(&ts->cfg.buckets[index]);
+ if (wait > max_wait) {
+ max_wait = wait;
+ }
+ }
+
+ return max_wait;
+}
+
+/* compute the timer for this type of operation
+ *
+ * @is_write: the type of operation
+ * @now: the current clock timerstamp
+ * @next_timer: the resulting timer
+ * @ret: true if a timer must be set
+ */
+bool throttle_compute_timer(ThrottleState *ts,
+ bool is_write,
+ int64_t now,
+ int64_t *next_timer)
+{
+ int64_t wait;
+
+ /* leak proportionally to the time elapsed */
+ throttle_do_leak(ts, now);
+
+ /* compute the wait time if any */
+ wait = throttle_compute_wait_for(ts, is_write, now);
+
+ /* if the code must wait compute when the next timer should fire */
+ if (wait) {
+ *next_timer = now + wait;
+ return true;
+ }
+
+ /* else no need to wait at all */
+ *next_timer = now;
+ return false;
+}
+
+/* To be called first on the ThrottleState */
+void throttle_init(ThrottleState *ts,
+ QEMUClock *clock,
+ void (read_timer)(void *),
+ void (write_timer)(void *),
+ void *timer_opaque)
+{
+ memset(ts, 0, sizeof(ThrottleState));
+
+ ts->clock = clock;
+ ts->timers[0] = qemu_new_timer_ns(ts->clock, read_timer, timer_opaque);
+ ts->timers[1] = qemu_new_timer_ns(ts->clock, write_timer, timer_opaque);
+}
+
+/* destroy a timer */
+static void throttle_timer_destroy(QEMUTimer **timer)
+{
+ assert(*timer != NULL);
+
+ if (qemu_timer_pending(*timer)) {
+ qemu_del_timer(*timer);
+ }
+
+ qemu_free_timer(*timer);
+ *timer = NULL;
+}
+
+/* To be called last on the ThrottleState */
+void throttle_destroy(ThrottleState *ts)
+{
+ int i;
+
+ for (i = 0; i < 2; i++) {
+ throttle_timer_destroy(&ts->timers[i]);
+ }
+}
+
+/* is any throttling timer configured */
+bool throttle_have_timer(ThrottleState *ts)
+{
+ if (ts->timers[0]) {
+ return true;
+ }
+
+ return false;
+}
+
+/* Must any throttling be done
+ *
+ * @cfg: the throttling configuration to inspect
+ * @ret: true if throttling must be done else false
+ */
+bool throttle_enabled(ThrottleConfig *cfg)
+{
+ int i;
+
+ for (i = 0; i < BUCKETS_COUNT; i++) {
+ if (cfg->buckets[i].ups > 0) {
+ return true;
+ }
+ }
+
+ return false;
+}
+
+/* return true if any two throttling parameters conflict
+ *
+ * @cfg: the throttling configuration to inspect
+ * @ret: true if any conflict detected else false
+ */
+bool throttle_conflicting(ThrottleConfig *cfg)
+{
+ bool bps_flag, ops_flag;
+ bool bps_max_flag, ops_max_flag;
+
+ bps_flag = cfg->buckets[THROTTLE_BPS_TOTAL].ups &&
+ (cfg->buckets[THROTTLE_BPS_READ].ups ||
+ cfg->buckets[THROTTLE_BPS_WRITE].ups);
+
+ ops_flag = cfg->buckets[THROTTLE_OPS_TOTAL].ups &&
+ (cfg->buckets[THROTTLE_OPS_READ].ups ||
+ cfg->buckets[THROTTLE_OPS_WRITE].ups);
+
+ bps_max_flag = cfg->buckets[THROTTLE_BPS_TOTAL].max &&
+ (cfg->buckets[THROTTLE_BPS_READ].max ||
+ cfg->buckets[THROTTLE_BPS_WRITE].max);
+
+ ops_max_flag = cfg->buckets[THROTTLE_OPS_TOTAL].max &&
+ (cfg->buckets[THROTTLE_OPS_READ].max ||
+ cfg->buckets[THROTTLE_OPS_WRITE].max);
+
+ return bps_flag || ops_flag || bps_max_flag || ops_max_flag;
+}
+
+/* check if a throttling configuration is valid
+ * @cfg: the throttling configuration to inspect
+ * @ret: true if valid else false
+ */
+bool throttle_is_valid(ThrottleConfig *cfg)
+{
+ bool invalid = false;
+ int i;
+
+ for (i = 0; i < BUCKETS_COUNT; i++) {
+ if (cfg->buckets[i].ups < 0) {
+ invalid = true;
+ }
+ }
+
+ for (i = 0; i < BUCKETS_COUNT; i++) {
+ if (cfg->buckets[i].max < 0) {
+ invalid = true;
+ }
+ }
+
+ return !invalid;
+}
+
+/* fix bucket parameters */
+static void throttle_fix_bucket(LeakyBucket *bkt)
+{
+ double min = bkt->ups / 10;
+ /* zero bucket level */
+ bkt->bucket = 0;
+
+ /* take care of not using cpu and also improve throttling precision */
+ if (bkt->ups &&
+ bkt->max < min) {
+ bkt->max = min;
+ }
+}
+
+/* take care of canceling a timer */
+static void throttle_cancel_timer(QEMUTimer *timer)
+{
+ assert(timer != NULL);
+ if (!qemu_timer_pending(timer)) {
+ return;
+ }
+
+ qemu_del_timer(timer);
+}
+
+/* Used to configure the throttle
+ *
+ * @ts: the throttle state we are working on
+ * @cfg: the config to set
+ */
+void throttle_config(ThrottleState *ts, ThrottleConfig *cfg)
+{
+ int i;
+
+ ts->cfg = *cfg;
+
+ for (i = 0; i < BUCKETS_COUNT; i++) {
+ throttle_fix_bucket(&ts->cfg.buckets[i]);
+ }
+
+ ts->previous_leak = qemu_get_clock_ns(ts->clock);
+
+ for (i = 0; i < 2; i++) {
+ throttle_cancel_timer(ts->timers[i]);
+ }
+}
+
+/* used to get config
+ *
+ * @ts: the throttle state we are working on
+ * @cfg: where to write the config
+ */
+void throttle_get_config(ThrottleState *ts, ThrottleConfig *cfg)
+{
+ *cfg = ts->cfg;
+}
+
+
+/* compute if an operation must be allowed and set a timer if not
+ *
+ * NOTE: this function is not unit tested due to its use of qemu_mod_timer
+ *
+ * @is_write: the type of operation (read/write)
+ * @ret: true if the operation is allowed to flow, else it must wait
+ */
+bool throttle_allowed(ThrottleState *ts, bool is_write)
+{
+ int64_t now = qemu_get_clock_ns(ts->clock);
+ int64_t next_timer;
+ bool must_wait;
+
+ must_wait = throttle_compute_timer(ts,
+ is_write,
+ now,
+ &next_timer);
+
+ /* if the request is throttled arm timer */
+ if (must_wait) {
+ qemu_mod_timer(ts->timers[is_write], next_timer);
+ }
+
+ return !must_wait;
+}
+
+/* do the accounting for this operation
+ *
+ * @is_write: the type of operation (read/write)
+ * @size: the size of the operation
+ */
+void throttle_account(ThrottleState *ts, bool is_write, uint64_t size)
+{
+ double bytes_size;
+
+ /* if cfg.op_size is not defined we will account exactly 1 operation */
+ double units = 1.0;
+ if (ts->cfg.op_size) {
+ units = (double) size / ts->cfg.op_size;
+ }
+
+ bytes_size = size * ts->cfg.unit_size;
+
+ ts->cfg.buckets[THROTTLE_BPS_TOTAL].bucket += bytes_size;
+ ts->cfg.buckets[THROTTLE_OPS_TOTAL].bucket += units;
+
+ if (is_write) {
+ ts->cfg.buckets[THROTTLE_BPS_WRITE].bucket += bytes_size;
+ ts->cfg.buckets[THROTTLE_OPS_WRITE].bucket += units;
+ } else {
+ ts->cfg.buckets[THROTTLE_BPS_READ].bucket += bytes_size;
+ ts->cfg.buckets[THROTTLE_OPS_READ].bucket += units;
+ }
+}
+
--
1.7.10.4
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [Qemu-devel] [PATCH V5 1/5] throttle: Add a new throttling API implementing continuous leaky bucket.
2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 1/5] throttle: Add a new throttling API implementing continuous leaky bucket Benoît Canet
@ 2013-08-14 8:52 ` Fam Zheng
2013-08-16 11:45 ` Stefan Hajnoczi
1 sibling, 0 replies; 17+ messages in thread
From: Fam Zheng @ 2013-08-14 8:52 UTC (permalink / raw)
To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha
On Mon, 08/12 18:53, Benoît Canet wrote:
> [...]
> +static int64_t throttle_compute_wait_for(ThrottleState *ts,
> + bool is_write,
> + int64_t now)
Parameter "now" not used.
> [...]
> +/* compute the timer for this type of operation
> + *
> + * @is_write: the type of operation
> + * @now: the current clock timerstamp
s/timerstamp/timestamp/
> + * @next_timer: the resulting timer
A bit confusing with the name and description. This is actually a computed
timestamp for next triggering of the timer, right?
> [...]
> +/* To be called first on the ThrottleState */
> +void throttle_init(ThrottleState *ts,
> + QEMUClock *clock,
> + void (read_timer)(void *),
There's a type name for this:
QEMUTimerCB * read_timer_cb,
> + void (write_timer)(void *),
QEMUTimerCB * write_timer_cb,
> + void *timer_opaque)
> [...]
> +void throttle_get_config(ThrottleState *ts, ThrottleConfig *cfg)
> +{
> + *cfg = ts->cfg;
Any reason to copy the structure, instead of using the reference?
> [...]
> +bool throttle_allowed(ThrottleState *ts, bool is_write)
The function name sounds like an immutable check operation, but it actually
reschedules a timer. This is not good style.
* Re: [Qemu-devel] [PATCH V5 1/5] throttle: Add a new throttling API implementing continuous leaky bucket.
2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 1/5] throttle: Add a new throttling API implementing continuous leaky bucket Benoît Canet
2013-08-14 8:52 ` Fam Zheng
@ 2013-08-16 11:45 ` Stefan Hajnoczi
2013-08-19 11:27 ` Paolo Bonzini
1 sibling, 1 reply; 17+ messages in thread
From: Stefan Hajnoczi @ 2013-08-16 11:45 UTC (permalink / raw)
To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha
On Mon, Aug 12, 2013 at 06:53:12PM +0200, Benoît Canet wrote:
> +#ifndef THROTTLING_H
> +#define THROTTLING_H
THROTTLE_H
> +
> +#include <stdint.h>
> +#include "qemu-common.h"
> +#include "qemu/timer.h"
> +
> +#define NANOSECONDS_PER_SECOND 1000000000.0
> +
> +#define BUCKETS_COUNT 6
> +
> +typedef enum {
> + THROTTLE_BPS_TOTAL = 0,
> + THROTTLE_BPS_READ = 1,
> + THROTTLE_BPS_WRITE = 2,
> + THROTTLE_OPS_TOTAL = 3,
> + THROTTLE_OPS_READ = 4,
> + THROTTLE_OPS_WRITE = 5,
> +} BucketType;
> +
> +typedef struct LeakyBucket {
> + double ups; /* units per second */
> + double max; /* leaky bucket max in units */
> + double bucket; /* bucket in units */
These comments aren't very clear to me :). So I guess bps or iops would
be in ups. Max would be the total budget or maximum burst. Bucket
might be the current level.
> +} LeakyBucket;
> +
> +/* The following structure is used to configure a ThrottleState
> + * It contains a bit of state: the bucket field of the LeakyBucket structure.
> + * However it allows to keep the code clean and the bucket field is reset to
> + * zero at the right time.
> + */
> +typedef struct ThrottleConfig {
> + LeakyBucket buckets[6]; /* leaky buckets */
s/6/THROTTLE_TYPE_MAX/
> + uint64_t unit_size; /* size of an unit in bytes */
> + uint64_t op_size; /* size of an operation in units */
It's not clear yet why we need both unit_size *and* op_size. I thought
you would have a single granularity field for accounting big requests as
multiple iops.
> +/* This function make a bucket leak
> + *
> + * @bkt: the bucket to make leak
> + * @delta: the time delta
delta is in nanoseconds. Probably best to call it delta_ns.
> +/* destroy a timer */
> +static void throttle_timer_destroy(QEMUTimer **timer)
> +{
> + assert(*timer != NULL);
> +
> + if (qemu_timer_pending(*timer)) {
> + qemu_del_timer(*timer);
> + }
You can always call qemu_del_timer(), the timer doesn't need to be pending.
> +/* fix bucket parameters */
> +static void throttle_fix_bucket(LeakyBucket *bkt)
> +{
> + double min = bkt->ups / 10;
> + /* zero bucket level */
> + bkt->bucket = 0;
> +
> + /* take care of not using cpu and also improve throttling precision */
> + if (bkt->ups &&
> + bkt->max < min) {
> + bkt->max = min;
> + }
> +}
This function seems like magic. What is really going on here? Why
divide by 10 and when does this case happen?
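One plausible reading (an assumption on my part, not confirmed by the patch): max is clamped to at least 100 ms worth of work at the sustained rate, so the throttling timer never has to fire more than roughly ten times per second; a smaller burst allowance would force very frequent wakeups for little precision gain. A standalone sketch of that reading:

```c
#include <assert.h>

typedef struct LeakyBucket {
    double ups;    /* units per second: the sustained rate */
    double max;    /* burst capacity, in units */
    double bucket; /* current level, in units */
} LeakyBucket;

/* Clamp max so it covers at least 1/10 s of work at the sustained rate:
 * the timer then needs to fire at most ~10 times per second. */
static void throttle_fix_bucket(LeakyBucket *bkt)
{
    double min = bkt->ups / 10;

    bkt->bucket = 0; /* start from an empty bucket */
    if (bkt->ups && bkt->max < min) {
        bkt->max = min;
    }
}
```

This matches test_config_functions in patch 2/5, where a bucket with ups == 153 and max == 0 comes back with max == 15.3.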
> +
> +/* take care of canceling a timer */
> +static void throttle_cancel_timer(QEMUTimer *timer)
> +{
> + assert(timer != NULL);
> + if (!qemu_timer_pending(timer)) {
> + return;
> + }
No need to check pending first.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Qemu-devel] [PATCH V5 1/5] throttle: Add a new throttling API implementing continuous leaky bucket.
2013-08-16 11:45 ` Stefan Hajnoczi
@ 2013-08-19 11:27 ` Paolo Bonzini
0 siblings, 0 replies; 17+ messages in thread
From: Paolo Bonzini @ 2013-08-19 11:27 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kwolf, Benoît Canet, stefanha, qemu-devel
On 16/08/2013 13:45, Stefan Hajnoczi wrote:
>> > +#define BUCKETS_COUNT 6
>> > +
>> > +typedef enum {
>> > + THROTTLE_BPS_TOTAL = 0,
>> > + THROTTLE_BPS_READ = 1,
>> > + THROTTLE_BPS_WRITE = 2,
>> > + THROTTLE_OPS_TOTAL = 3,
>> > + THROTTLE_OPS_READ = 4,
>> > + THROTTLE_OPS_WRITE = 5,
>> > +} BucketType;
Please remove the "= N" from the enums, and add BUCKETS_COUNT here.
>> > +typedef struct LeakyBucket {
>> > + double ups; /* units per second */
>> > + double max; /* leaky bucket max in units */
>> > + double bucket; /* bucket in units */
> These comments aren't very clear to me :). So I guess bps or iops would
> be in ups. Max would be the total budget or maximum burst. Bucket
> might be the current level.
I also suggest replacing "ups" with "avg", since it's the average
throughput that the leaky bucket allows after the initial burst has
emptied the bucket.
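Under that naming, the struct could read as follows (a sketch of the suggestion, not committed code; the "level" rename for "bucket" is my own addition for symmetry):

```c
#include <assert.h>

typedef struct LeakyBucket {
    double avg;   /* average goodput, in units per second */
    double max;   /* burst allowance: bucket capacity, in units */
    double level; /* current bucket level, in units */
} LeakyBucket;
```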
>> + uint64_t unit_size; /* size of an unit in bytes */
>> + uint64_t op_size; /* size of an operation in units */
>
> It's not clear yet why we need both unit_size *and* op_size. I thought
> you would have a single granularity field for accounting big requests as
> multiple iops.
IIUC the ops buckets account operations in op_size / unit_size units,
while the bps buckets account operations in 1 / unit_size units, or
something like that. But it needs clarification indeed.
Paolo
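A standalone sketch of that interpretation, reconstructed from the expected values in the unit tests of patch 2/5 (the bucket indices and structures mirror the patch; the accounting arithmetic is an assumption):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BUCKETS_COUNT 6

typedef enum {
    THROTTLE_BPS_TOTAL,
    THROTTLE_BPS_READ,
    THROTTLE_BPS_WRITE,
    THROTTLE_OPS_TOTAL,
    THROTTLE_OPS_READ,
    THROTTLE_OPS_WRITE,
} BucketType;

typedef struct LeakyBucket {
    double ups;    /* units per second */
    double max;    /* burst capacity, in units */
    double bucket; /* current level, in units */
} LeakyBucket;

typedef struct ThrottleConfig {
    LeakyBucket buckets[BUCKETS_COUNT];
    uint64_t unit_size; /* size of a unit in bytes */
    uint64_t op_size;   /* ideal size of an operation, in units */
} ThrottleConfig;

/* Charge one I/O of `size` units to the relevant buckets: bps buckets
 * are charged in bytes (size * unit_size), ops buckets are charged one
 * op, or size / op_size ops when a jumbo request spans several
 * ideal-sized operations. */
static void account(ThrottleConfig *cfg, bool is_write, int64_t size)
{
    double bytes = (double)size * cfg->unit_size;
    double units = 1.0;

    if (cfg->op_size && size > (int64_t)cfg->op_size) {
        units = (double)size / cfg->op_size;
    }

    cfg->buckets[THROTTLE_BPS_TOTAL].bucket += bytes;
    cfg->buckets[is_write ? THROTTLE_BPS_WRITE : THROTTLE_BPS_READ].bucket += bytes;
    cfg->buckets[THROTTLE_OPS_TOTAL].bucket += units;
    cfg->buckets[is_write ? THROTTLE_OPS_WRITE : THROTTLE_OPS_READ].bucket += units;
}
```

This reproduces the expectations of test_accounting: a read of 64 units with unit_size 512 and op_size 13 adds 64 * 512 bytes to the bps buckets and 64.0 / 13 ops to the ops buckets.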
^ permalink raw reply [flat|nested] 17+ messages in thread
* [Qemu-devel] [PATCH V5 2/5] throttle: Add units tests
2013-08-12 16:53 [Qemu-devel] [PATCH V5 0/5] Continuous Leaky Bucket Throttling Benoît Canet
2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 1/5] throttle: Add a new throttling API implementing continuous leaky bucket Benoît Canet
@ 2013-08-12 16:53 ` Benoît Canet
2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 3/5] block: Enable the new throttling code in the block layer Benoît Canet
` (3 subsequent siblings)
5 siblings, 0 replies; 17+ messages in thread
From: Benoît Canet @ 2013-08-12 16:53 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
tests/Makefile | 2 +
tests/test-throttle.c | 494 +++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 496 insertions(+)
create mode 100644 tests/test-throttle.c
diff --git a/tests/Makefile b/tests/Makefile
index d044908..fb1e06a 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -29,6 +29,7 @@ check-unit-y += tests/test-visitor-serialization$(EXESUF)
check-unit-y += tests/test-iov$(EXESUF)
gcov-files-test-iov-y = util/iov.c
check-unit-y += tests/test-aio$(EXESUF)
+check-unit-y += tests/test-throttle$(EXESUF)
gcov-files-test-aio-$(CONFIG_WIN32) = aio-win32.c
gcov-files-test-aio-$(CONFIG_POSIX) = aio-posix.c
check-unit-y += tests/test-thread-pool$(EXESUF)
@@ -116,6 +117,7 @@ tests/check-qfloat$(EXESUF): tests/check-qfloat.o libqemuutil.a
tests/check-qjson$(EXESUF): tests/check-qjson.o libqemuutil.a libqemustub.a
tests/test-coroutine$(EXESUF): tests/test-coroutine.o $(block-obj-y) libqemuutil.a libqemustub.a
tests/test-aio$(EXESUF): tests/test-aio.o $(block-obj-y) libqemuutil.a libqemustub.a
+tests/test-throttle$(EXESUF): tests/test-throttle.o $(block-obj-y) libqemuutil.a libqemustub.a
tests/test-thread-pool$(EXESUF): tests/test-thread-pool.o $(block-obj-y) libqemuutil.a libqemustub.a
tests/test-iov$(EXESUF): tests/test-iov.o libqemuutil.a
tests/test-hbitmap$(EXESUF): tests/test-hbitmap.o libqemuutil.a libqemustub.a
diff --git a/tests/test-throttle.c b/tests/test-throttle.c
new file mode 100644
index 0000000..c687f5f
--- /dev/null
+++ b/tests/test-throttle.c
@@ -0,0 +1,494 @@
+/*
+ * Throttle infrastructure tests
+ *
+ * Copyright Nodalink, SARL. 2013
+ *
+ * Authors:
+ * Benoît Canet <benoit.canet@irqsave.net>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ */
+
+#include <glib.h>
+#include <math.h>
+#include "qemu/throttle.h"
+
+LeakyBucket bkt;
+ThrottleConfig cfg;
+ThrottleState ts;
+
+/* useful helper function */
+static bool double_cmp(double x, double y)
+{
+ return fabs(x - y) < 1e-6;
+}
+
+/* tests for single bucket operations */
+static void test_leak_bucket(void)
+{
+ /* set initial value */
+ bkt.ups = 150;
+ bkt.max = 15;
+ bkt.bucket = 1.5;
+
+ /* leak one op's worth of time */
+ throttle_leak_bucket(&bkt, NANOSECONDS_PER_SECOND / 150);
+ g_assert(bkt.ups == 150);
+ g_assert(bkt.max == 15);
+ g_assert(double_cmp(bkt.bucket, 0.5));
+
+ /* leak again emptying the bucket */
+ throttle_leak_bucket(&bkt, NANOSECONDS_PER_SECOND / 150);
+ g_assert(bkt.ups == 150);
+ g_assert(bkt.max == 15);
+ g_assert(double_cmp(bkt.bucket, 0));
+
+ /* check that the bucket level won't go lower */
+ throttle_leak_bucket(&bkt, NANOSECONDS_PER_SECOND / 150);
+ g_assert(bkt.ups == 150);
+ g_assert(bkt.max == 15);
+ g_assert(double_cmp(bkt.bucket, 0));
+}
+
+static void test_compute_wait(void)
+{
+ int64_t wait;
+ int64_t result;
+
+ /* no operation limit set */
+ bkt.ups = 0;
+ bkt.max = 15;
+ bkt.bucket = 1.5;
+ wait = throttle_compute_wait(&bkt);
+ g_assert(!wait);
+
+ /* zero delta */
+ bkt.ups = 150;
+ bkt.max = 15;
+ bkt.bucket = 15;
+ wait = throttle_compute_wait(&bkt);
+ g_assert(!wait);
+
+ /* below zero delta */
+ bkt.ups = 150;
+ bkt.max = 15;
+ bkt.bucket = 9;
+ wait = throttle_compute_wait(&bkt);
+ g_assert(!wait);
+
+ /* half an operation above max */
+ bkt.ups = 150;
+ bkt.max = 15;
+ bkt.bucket = 15.5;
+ wait = throttle_compute_wait(&bkt);
+ /* time required to do half an operation */
+ result = (int64_t) NANOSECONDS_PER_SECOND / 150 / 2;
+ g_assert(wait == result);
+}
+
+/* functions to test ThrottleState initialization/destroy methods */
+static void read_timer_cb(void *opaque)
+{
+}
+
+static void write_timer_cb(void *opaque)
+{
+}
+
+static void test_init(void)
+{
+ int i;
+
+ /* fill the structure with crap */
+ memset(&ts, 1, sizeof(ts));
+
+ /* init the structure */
+ throttle_init(&ts, vm_clock, read_timer_cb, write_timer_cb, &ts);
+
+ /* check initialized fields */
+ g_assert(ts.clock == vm_clock);
+ g_assert(ts.timers[0]);
+ g_assert(ts.timers[1]);
+
+ /* check other fields were cleared */
+ g_assert(!ts.previous_leak);
+ g_assert(!ts.cfg.op_size);
+ g_assert(!ts.cfg.unit_size);
+ for (i = 0; i < BUCKETS_COUNT; i++) {
+ g_assert(!ts.cfg.buckets[i].ups);
+ g_assert(!ts.cfg.buckets[i].max);
+ g_assert(!ts.cfg.buckets[i].bucket);
+ }
+
+ throttle_destroy(&ts);
+}
+
+static void test_destroy(void)
+{
+ int i;
+ throttle_init(&ts, vm_clock, read_timer_cb, write_timer_cb, &ts);
+ throttle_destroy(&ts);
+ for (i = 0; i < 2; i++) {
+ g_assert(!ts.timers[i]);
+ }
+}
+
+/* function to test throttle_config and throttle_get_config */
+static void test_config_functions(void)
+{
+ int i;
+ ThrottleConfig orig_cfg, final_cfg;
+
+ orig_cfg.buckets[THROTTLE_BPS_TOTAL].ups = 153;
+ orig_cfg.buckets[THROTTLE_BPS_READ].ups = 56;
+ orig_cfg.buckets[THROTTLE_BPS_WRITE].ups = 1;
+
+ orig_cfg.buckets[THROTTLE_OPS_TOTAL].ups = 150;
+ orig_cfg.buckets[THROTTLE_OPS_READ].ups = 69;
+ orig_cfg.buckets[THROTTLE_OPS_WRITE].ups = 23;
+
+ orig_cfg.buckets[THROTTLE_BPS_TOTAL].max = 0; /* should be corrected */
+ orig_cfg.buckets[THROTTLE_BPS_READ].max = 1; /* should be corrected */
+ orig_cfg.buckets[THROTTLE_BPS_WRITE].max = 120;
+
+ orig_cfg.buckets[THROTTLE_OPS_TOTAL].max = 150;
+ orig_cfg.buckets[THROTTLE_OPS_READ].max = 400;
+ orig_cfg.buckets[THROTTLE_OPS_WRITE].max = 500;
+
+ orig_cfg.buckets[THROTTLE_BPS_TOTAL].bucket = 45;
+ orig_cfg.buckets[THROTTLE_BPS_READ].bucket = 65;
+ orig_cfg.buckets[THROTTLE_BPS_WRITE].bucket = 23;
+
+ orig_cfg.buckets[THROTTLE_OPS_TOTAL].bucket = 1;
+ orig_cfg.buckets[THROTTLE_OPS_READ].bucket = 90;
+ orig_cfg.buckets[THROTTLE_OPS_WRITE].bucket = 75;
+
+ orig_cfg.unit_size = 333;
+ orig_cfg.op_size = 1;
+
+ throttle_init(&ts, vm_clock, read_timer_cb, write_timer_cb, &ts);
+ /* structure was reset by throttle_init: previous_leak should be zero */
+ g_assert(!ts.previous_leak);
+ throttle_config(&ts, &orig_cfg);
+
+ /* has previous_leak been initialized by throttle_config? */
+ g_assert(ts.previous_leak);
+
+ /* get back the fixed configuration */
+ throttle_get_config(&ts, &final_cfg);
+ throttle_destroy(&ts);
+
+ g_assert(final_cfg.buckets[THROTTLE_BPS_TOTAL].ups == 153);
+ g_assert(final_cfg.buckets[THROTTLE_BPS_READ].ups == 56);
+ g_assert(final_cfg.buckets[THROTTLE_BPS_WRITE].ups == 1);
+
+ g_assert(final_cfg.buckets[THROTTLE_OPS_TOTAL].ups == 150);
+ g_assert(final_cfg.buckets[THROTTLE_OPS_READ].ups == 69);
+ g_assert(final_cfg.buckets[THROTTLE_OPS_WRITE].ups == 23);
+
+ g_assert(final_cfg.buckets[THROTTLE_BPS_TOTAL].max == 15.3); /* fixed */
+ g_assert(final_cfg.buckets[THROTTLE_BPS_READ].max == 5.6); /* fixed */
+ g_assert(final_cfg.buckets[THROTTLE_BPS_WRITE].max == 120);
+
+ g_assert(final_cfg.buckets[THROTTLE_OPS_TOTAL].max == 150);
+ g_assert(final_cfg.buckets[THROTTLE_OPS_READ].max == 400);
+ g_assert(final_cfg.buckets[THROTTLE_OPS_WRITE].max == 500);
+
+ g_assert(final_cfg.unit_size == 333);
+ g_assert(final_cfg.op_size == 1);
+
+ /* check the buckets have been cleared */
+ for (i = 0; i < BUCKETS_COUNT; i++) {
+ g_assert(!final_cfg.buckets[i].bucket);
+ }
+
+
+}
+
+/* functions to test whether throttling is enabled by a config */
+static void set_cfg_value(bool is_max, int index, int value)
+{
+ if (is_max) {
+ cfg.buckets[index].max = value;
+ } else {
+ cfg.buckets[index].ups = value;
+ }
+}
+
+static void test_enabled(void)
+{
+ int i;
+
+ memset(&cfg, 0, sizeof(cfg));
+ g_assert(!throttle_enabled(&cfg));
+
+ for (i = 0; i < BUCKETS_COUNT; i++) {
+ memset(&cfg, 0, sizeof(cfg));
+ set_cfg_value(false, i, 150);
+ g_assert(throttle_enabled(&cfg));
+ }
+
+ for (i = 0; i < BUCKETS_COUNT; i++) {
+ memset(&cfg, 0, sizeof(cfg));
+ set_cfg_value(false, i, -150);
+ g_assert(!throttle_enabled(&cfg));
+ }
+}
+
+/* test functions for throttle_conflicting */
+
+static void test_conflicts_for_one_set(bool is_max,
+ int total,
+ int read,
+ int write)
+{
+ memset(&cfg, 0, sizeof(cfg));
+ g_assert(!throttle_conflicting(&cfg));
+
+ set_cfg_value(is_max, total, 1);
+ set_cfg_value(is_max, read, 1);
+ g_assert(throttle_conflicting(&cfg));
+
+ memset(&cfg, 0, sizeof(cfg));
+ set_cfg_value(is_max, total, 1);
+ set_cfg_value(is_max, write, 1);
+ g_assert(throttle_conflicting(&cfg));
+
+ memset(&cfg, 0, sizeof(cfg));
+ set_cfg_value(is_max, total, 1);
+ set_cfg_value(is_max, read, 1);
+ set_cfg_value(is_max, write, 1);
+ g_assert(throttle_conflicting(&cfg));
+
+ memset(&cfg, 0, sizeof(cfg));
+ set_cfg_value(is_max, total, 1);
+ g_assert(!throttle_conflicting(&cfg));
+
+ memset(&cfg, 0, sizeof(cfg));
+ set_cfg_value(is_max, read, 1);
+ set_cfg_value(is_max, write, 1);
+ g_assert(!throttle_conflicting(&cfg));
+}
+
+static void test_conflicting_config(void)
+{
+ /* bps average conflicts */
+ test_conflicts_for_one_set(false,
+ THROTTLE_BPS_TOTAL,
+ THROTTLE_BPS_READ,
+ THROTTLE_BPS_WRITE);
+
+ /* ops average conflicts */
+ test_conflicts_for_one_set(false,
+ THROTTLE_OPS_TOTAL,
+ THROTTLE_OPS_READ,
+ THROTTLE_OPS_WRITE);
+
+ /* bps max conflicts */
+ test_conflicts_for_one_set(true,
+ THROTTLE_BPS_TOTAL,
+ THROTTLE_BPS_READ,
+ THROTTLE_BPS_WRITE);
+ /* ops max conflicts */
+ test_conflicts_for_one_set(true,
+ THROTTLE_OPS_TOTAL,
+ THROTTLE_OPS_READ,
+ THROTTLE_OPS_WRITE);
+}
+/* functions to test the throttle_is_valid function */
+static void test_is_valid_for_value(int value, bool should_be_valid)
+{
+ int is_max, index;
+ for (is_max = 0; is_max < 2; is_max++) {
+ for (index = 0; index < BUCKETS_COUNT; index++) {
+ memset(&cfg, 0, sizeof(cfg));
+ set_cfg_value(is_max, index, value);
+ g_assert(throttle_is_valid(&cfg) == should_be_valid);
+ }
+ }
+}
+
+static void test_is_valid(void)
+{
+ /* negative numbers are invalid */
+ test_is_valid_for_value(-1, false);
+ /* zero is valid */
+ test_is_valid_for_value(0, true);
+ /* positive numbers are valid */
+ test_is_valid_for_value(1, true);
+}
+
+static void test_have_timer(void)
+{
+ /* zero the structure */
+ memset(&ts, 0, sizeof(ts));
+
+ /* no timer set should return false */
+ g_assert(!throttle_have_timer(&ts));
+
+ /* init the structure */
+ throttle_init(&ts, vm_clock, read_timer_cb, write_timer_cb, &ts);
+
+ /* timer set by init should return true */
+ g_assert(throttle_have_timer(&ts));
+
+ throttle_destroy(&ts);
+}
+
+static bool do_test_accounting(bool is_ops, /* are we testing bps or ops */
+ int size, /* size of the operation to do */
+ double ups, /* io limit */
+ uint64_t unit_size, /* the byte size of the unit */
+ uint64_t op_size, /* ideal size of an io */
+ double total_result,
+ double read_result,
+ double write_result)
+{
+ BucketType to_test[2][3] = { { THROTTLE_BPS_TOTAL,
+ THROTTLE_BPS_READ,
+ THROTTLE_BPS_WRITE, },
+ { THROTTLE_OPS_TOTAL,
+ THROTTLE_OPS_READ,
+ THROTTLE_OPS_WRITE, } };
+ ThrottleConfig cfg;
+ BucketType index;
+ int i;
+
+ for (i = 0; i < 3; i++) {
+ index = to_test[is_ops][i];
+ cfg.buckets[index].ups = ups;
+ }
+
+ cfg.unit_size = unit_size;
+ cfg.op_size = op_size;
+
+ throttle_init(&ts, vm_clock, read_timer_cb, write_timer_cb, &ts);
+ throttle_config(&ts, &cfg);
+
+ /* account a read */
+ throttle_account(&ts, false, size);
+ /* account a write */
+ throttle_account(&ts, true, size);
+
+ /* check total result */
+ index = to_test[is_ops][0];
+ if (!double_cmp(ts.cfg.buckets[index].bucket, total_result)) {
+ return false;
+ }
+
+ /* check read result */
+ index = to_test[is_ops][1];
+ if (!double_cmp(ts.cfg.buckets[index].bucket, read_result)) {
+ return false;
+ }
+
+ /* check write result */
+ index = to_test[is_ops][2];
+ if (!double_cmp(ts.cfg.buckets[index].bucket, write_result)) {
+ return false;
+ }
+
+ throttle_destroy(&ts);
+
+ return true;
+}
+
+static void test_accounting(void)
+{
+ /* tests for bps */
+
+ /* op of size 1 */
+ g_assert(do_test_accounting(false,
+ 1,
+ 150,
+ 512,
+ 0,
+ 1024,
+ 512,
+ 512));
+
+ /* op of size 2 */
+ g_assert(do_test_accounting(false,
+ 2,
+ 150,
+ 512,
+ 0,
+ 2048,
+ 1024,
+ 1024));
+
+ /* op of size 2 and orthogonal parameter change */
+ g_assert(do_test_accounting(false,
+ 2,
+ 150,
+ 512,
+ 17,
+ 2048,
+ 1024,
+ 1024));
+
+
+ /* tests for ops */
+
+ /* op of size 1 */
+ g_assert(do_test_accounting(true,
+ 1,
+ 150,
+ 512,
+ 0,
+ 2,
+ 1,
+ 1));
+
+ /* op of size 2 */
+ g_assert(do_test_accounting(true,
+ 2,
+ 150,
+ 512,
+ 0,
+ 2,
+ 1,
+ 1));
+
+ /* jumbo op accounting fragmentation: size 64 with op size of 13 units */
+ g_assert(do_test_accounting(true,
+ 64,
+ 150,
+ 512,
+ 13,
+ (64.0 * 2) / 13,
+ (64.0 / 13),
+ (64.0 / 13)));
+
+ /* same with orthogonal parameters changes */
+ g_assert(do_test_accounting(true,
+ 64,
+ 300,
+ 3433,
+ 13,
+ (64.0 * 2) / 13,
+ (64.0 / 13),
+ (64.0 / 13)));
+}
+
+int main(int argc, char **argv)
+{
+ init_clocks();
+ do {} while (g_main_context_iteration(NULL, false));
+
+ /* tests in the same order as the header function declarations */
+ g_test_init(&argc, &argv, NULL);
+ g_test_add_func("/throttle/leak_bucket", test_leak_bucket);
+ g_test_add_func("/throttle/compute_wait", test_compute_wait);
+ g_test_add_func("/throttle/init", test_init);
+ g_test_add_func("/throttle/destroy", test_destroy);
+ g_test_add_func("/throttle/have_timer", test_have_timer);
+ g_test_add_func("/throttle/config/enabled", test_enabled);
+ g_test_add_func("/throttle/config/conflicting", test_conflicting_config);
+ g_test_add_func("/throttle/config/is_valid", test_is_valid);
+ g_test_add_func("/throttle/config_functions", test_config_functions);
+ g_test_add_func("/throttle/accounting", test_accounting);
+ return g_test_run();
+}
+
--
1.7.10.4
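As an aside, the test_compute_wait expectations above pin down the wait computation fairly tightly. A standalone sketch consistent with them (an assumption reconstructed from the tests, not the patch's actual code):

```c
#include <assert.h>
#include <stdint.h>

#define NANOSECONDS_PER_SECOND 1000000000.0

typedef struct LeakyBucket {
    double ups;    /* units per second: the sustained rate */
    double max;    /* burst capacity, in units */
    double bucket; /* current level, in units */
} LeakyBucket;

/* Return how long (in ns) the caller must wait for the bucket to leak
 * back down to its burst capacity; 0 means no throttling is needed. */
static int64_t throttle_compute_wait(LeakyBucket *bkt)
{
    double extra; /* the number of extra units blocking the I/O */

    if (!bkt->ups) {
        return 0; /* no limit configured for this bucket */
    }

    extra = bkt->bucket - bkt->max;
    if (extra <= 0) {
        return 0; /* still within the allowed burst */
    }

    return (int64_t)(extra / bkt->ups * NANOSECONDS_PER_SECOND);
}
```

With ups == 150, max == 15 and bucket == 15.5, the extra half unit gives a wait of NANOSECONDS_PER_SECOND / 150 / 2, exactly as the "half an operation above max" case asserts.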
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [Qemu-devel] [PATCH V5 3/5] block: Enable the new throttling code in the block layer.
2013-08-12 16:53 [Qemu-devel] [PATCH V5 0/5] Continuous Leaky Bucket Throttling Benoît Canet
2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 1/5] throttle: Add a new throttling API implementing continuous leaky bucket Benoît Canet
2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 2/5] throttle: Add units tests Benoît Canet
@ 2013-08-12 16:53 ` Benoît Canet
2013-08-14 9:31 ` Fam Zheng
2013-08-16 12:02 ` Stefan Hajnoczi
2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 4/5] block: Add support for throttling burst max in QMP and the command line Benoît Canet
` (2 subsequent siblings)
5 siblings, 2 replies; 17+ messages in thread
From: Benoît Canet @ 2013-08-12 16:53 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block.c | 349 ++++++++++++++-------------------------------
block/qapi.c | 21 ++-
blockdev.c | 100 +++++++------
include/block/block.h | 1 -
include/block/block_int.h | 32 +----
5 files changed, 173 insertions(+), 330 deletions(-)
diff --git a/block.c b/block.c
index 01b66d8..d49bc98 100644
--- a/block.c
+++ b/block.c
@@ -86,13 +86,6 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque);
static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
int64_t sector_num, int nb_sectors);
-static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
- bool is_write, double elapsed_time, uint64_t *wait);
-static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
- double elapsed_time, uint64_t *wait);
-static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
- bool is_write, int64_t *wait);
-
static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
QTAILQ_HEAD_INITIALIZER(bdrv_states);
@@ -123,70 +116,110 @@ int is_windows_drive(const char *filename)
#endif
/* throttling disk I/O limits */
-void bdrv_io_limits_disable(BlockDriverState *bs)
+void bdrv_set_io_limits(BlockDriverState *bs,
+ ThrottleConfig *cfg)
{
- bs->io_limits_enabled = false;
+ int i;
+
+ throttle_config(&bs->throttle_state, cfg);
+
+ for (i = 0; i < 2; i++) {
+ qemu_co_enter_next(&bs->throttled_reqs[i]);
+ }
+}
+
+/* this function drains all the throttled I/Os */
+static bool bdrv_drain_throttled(BlockDriverState *bs)
+{
+ bool drained = false;
+ bool enabled = bs->io_limits_enabled;
+ int i;
- do {} while (qemu_co_enter_next(&bs->throttled_reqs));
+ bs->io_limits_enabled = false;
- if (bs->block_timer) {
- qemu_del_timer(bs->block_timer);
- qemu_free_timer(bs->block_timer);
- bs->block_timer = NULL;
+ for (i = 0; i < 2; i++) {
+ while (qemu_co_enter_next(&bs->throttled_reqs[i])) {
+ drained = true;
+ }
}
- bs->slice_start = 0;
- bs->slice_end = 0;
+ bs->io_limits_enabled = enabled;
+
+ return drained;
}
-static void bdrv_block_timer(void *opaque)
+void bdrv_io_limits_disable(BlockDriverState *bs)
+{
+ bs->io_limits_enabled = false;
+
+ bdrv_drain_throttled(bs);
+
+ throttle_destroy(&bs->throttle_state);
+}
+
+static void bdrv_throttle_read_timer_cb(void *opaque)
{
BlockDriverState *bs = opaque;
+ qemu_co_enter_next(&bs->throttled_reqs[0]);
+}
- qemu_co_enter_next(&bs->throttled_reqs);
+static void bdrv_throttle_write_timer_cb(void *opaque)
+{
+ BlockDriverState *bs = opaque;
+ qemu_co_enter_next(&bs->throttled_reqs[1]);
}
+/* should be called before bdrv_set_io_limits if a limit is set */
void bdrv_io_limits_enable(BlockDriverState *bs)
{
- qemu_co_queue_init(&bs->throttled_reqs);
- bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
+ throttle_init(&bs->throttle_state,
+ vm_clock,
+ bdrv_throttle_read_timer_cb,
+ bdrv_throttle_write_timer_cb,
+ bs);
+ qemu_co_queue_init(&bs->throttled_reqs[0]);
+ qemu_co_queue_init(&bs->throttled_reqs[1]);
bs->io_limits_enabled = true;
}
-bool bdrv_io_limits_enabled(BlockDriverState *bs)
+/* This function makes an I/O wait if needed
+ *
+ * @is_write: true if the I/O is a write
+ * @nb_sectors: the number of sectors of the I/O
+ */
+static void bdrv_io_limits_intercept(BlockDriverState *bs,
+ bool is_write,
+ int nb_sectors)
{
- BlockIOLimit *io_limits = &bs->io_limits;
- return io_limits->bps[BLOCK_IO_LIMIT_READ]
- || io_limits->bps[BLOCK_IO_LIMIT_WRITE]
- || io_limits->bps[BLOCK_IO_LIMIT_TOTAL]
- || io_limits->iops[BLOCK_IO_LIMIT_READ]
- || io_limits->iops[BLOCK_IO_LIMIT_WRITE]
- || io_limits->iops[BLOCK_IO_LIMIT_TOTAL];
+ /* must this I/O wait? */
+ bool must_wait = !throttle_allowed(&bs->throttle_state, is_write);
+
+ /* if it must wait, or requests of this type are already queued, queue it */
+ if (must_wait ||
+ !qemu_co_queue_empty(&bs->throttled_reqs[is_write])) {
+ qemu_co_queue_wait(&bs->throttled_reqs[is_write]);
+ }
+
+ /* the I/O will be executed; do the accounting */
+ throttle_account(&bs->throttle_state, is_write, nb_sectors);
}
-static void bdrv_io_limits_intercept(BlockDriverState *bs,
- bool is_write, int nb_sectors)
+/* Schedule the next throttled I/O of this type if needed
+ *
+ * @is_write: true if the I/O is a write
+ */
+static void bdrv_io_limits_resched(BlockDriverState *bs, bool is_write)
{
- int64_t wait_time = -1;
- if (!qemu_co_queue_empty(&bs->throttled_reqs)) {
- qemu_co_queue_wait(&bs->throttled_reqs);
+ if (qemu_co_queue_empty(&bs->throttled_reqs[is_write])) {
+ return;
}
- /* In fact, we hope to keep each request's timing, in FIFO mode. The next
- * throttled requests will not be dequeued until the current request is
- * allowed to be serviced. So if the current request still exceeds the
- * limits, it will be inserted to the head. All requests followed it will
- * be still in throttled_reqs queue.
- */
-
- while (bdrv_exceed_io_limits(bs, nb_sectors, is_write, &wait_time)) {
- qemu_mod_timer(bs->block_timer,
- wait_time + qemu_get_clock_ns(vm_clock));
- qemu_co_queue_wait_insert_head(&bs->throttled_reqs);
+ if (!throttle_allowed(&bs->throttle_state, is_write)) {
+ return;
}
- qemu_co_queue_next(&bs->throttled_reqs);
+ qemu_co_queue_next(&bs->throttled_reqs[is_write]);
}
/* check if the path starts with "<protocol>:" */
@@ -1112,11 +1145,6 @@ int bdrv_open(BlockDriverState *bs, const char *filename, QDict *options,
bdrv_dev_change_media_cb(bs, true);
}
- /* throttling disk I/O limits */
- if (bs->io_limits_enabled) {
- bdrv_io_limits_enable(bs);
- }
-
return 0;
unlink_and_fail:
@@ -1452,7 +1480,7 @@ void bdrv_drain_all(void)
* a busy wait.
*/
QTAILQ_FOREACH(bs, &bdrv_states, list) {
- while (qemu_co_enter_next(&bs->throttled_reqs)) {
+ if (bdrv_drain_throttled(bs)) {
busy = true;
}
}
@@ -1461,7 +1489,8 @@ void bdrv_drain_all(void)
/* If requests are still pending there is a bug somewhere */
QTAILQ_FOREACH(bs, &bdrv_states, list) {
assert(QLIST_EMPTY(&bs->tracked_requests));
- assert(qemu_co_queue_empty(&bs->throttled_reqs));
+ assert(qemu_co_queue_empty(&bs->throttled_reqs[0]));
+ assert(qemu_co_queue_empty(&bs->throttled_reqs[1]));
}
}
@@ -1497,13 +1526,12 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
bs_dest->enable_write_cache = bs_src->enable_write_cache;
- /* i/o timing parameters */
- bs_dest->slice_start = bs_src->slice_start;
- bs_dest->slice_end = bs_src->slice_end;
- bs_dest->slice_submitted = bs_src->slice_submitted;
- bs_dest->io_limits = bs_src->io_limits;
- bs_dest->throttled_reqs = bs_src->throttled_reqs;
- bs_dest->block_timer = bs_src->block_timer;
+ /* i/o throttling state */
+ memcpy(&bs_dest->throttle_state,
+ &bs_src->throttle_state,
+ sizeof(ThrottleState));
+ bs_dest->throttled_reqs[0] = bs_src->throttled_reqs[0];
+ bs_dest->throttled_reqs[1] = bs_src->throttled_reqs[1];
bs_dest->io_limits_enabled = bs_src->io_limits_enabled;
/* r/w error */
@@ -1550,7 +1578,7 @@ void bdrv_swap(BlockDriverState *bs_new, BlockDriverState *bs_old)
assert(bs_new->dev == NULL);
assert(bs_new->in_use == 0);
assert(bs_new->io_limits_enabled == false);
- assert(bs_new->block_timer == NULL);
+ assert(!throttle_have_timer(&bs_new->throttle_state));
tmp = *bs_new;
*bs_new = *bs_old;
@@ -1569,7 +1597,7 @@ void bdrv_swap(BlockDriverState *bs_new, BlockDriverState *bs_old)
assert(bs_new->job == NULL);
assert(bs_new->in_use == 0);
assert(bs_new->io_limits_enabled == false);
- assert(bs_new->block_timer == NULL);
+ assert(!throttle_have_timer(&bs_new->throttle_state));
bdrv_rebind(bs_new);
bdrv_rebind(bs_old);
@@ -1860,8 +1888,15 @@ int bdrv_commit_all(void)
*
* This function should be called when a tracked request is completing.
*/
-static void tracked_request_end(BdrvTrackedRequest *req)
+static void tracked_request_end(BlockDriverState *bs,
+ BdrvTrackedRequest *req,
+ bool is_write)
{
+ /* throttling disk I/O */
+ if (bs->io_limits_enabled) {
+ bdrv_io_limits_resched(bs, is_write);
+ }
+
QLIST_REMOVE(req, list);
qemu_co_queue_restart_all(&req->wait_queue);
}
@@ -1874,6 +1909,11 @@ static void tracked_request_begin(BdrvTrackedRequest *req,
int64_t sector_num,
int nb_sectors, bool is_write)
{
+ /* throttling disk I/O */
+ if (bs->io_limits_enabled) {
+ bdrv_io_limits_intercept(bs, is_write, nb_sectors);
+ }
+
*req = (BdrvTrackedRequest){
.bs = bs,
.sector_num = sector_num,
@@ -2512,11 +2552,6 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
return -EIO;
}
- /* throttling disk read I/O */
- if (bs->io_limits_enabled) {
- bdrv_io_limits_intercept(bs, false, nb_sectors);
- }
-
if (bs->copy_on_read) {
flags |= BDRV_REQ_COPY_ON_READ;
}
@@ -2547,7 +2582,7 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
out:
- tracked_request_end(&req);
+ tracked_request_end(bs, &req, false);
if (flags & BDRV_REQ_COPY_ON_READ) {
bs->copy_on_read_in_flight--;
@@ -2625,11 +2660,6 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
return -EIO;
}
- /* throttling disk write I/O */
- if (bs->io_limits_enabled) {
- bdrv_io_limits_intercept(bs, true, nb_sectors);
- }
-
if (bs->copy_on_read_in_flight) {
wait_for_overlapping_requests(bs, sector_num, nb_sectors);
}
@@ -2658,7 +2688,7 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
bs->wr_highest_sector = sector_num + nb_sectors - 1;
}
- tracked_request_end(&req);
+ tracked_request_end(bs, &req, true);
return ret;
}
@@ -2751,14 +2781,6 @@ void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr)
*nb_sectors_ptr = length;
}
-/* throttling disk io limits */
-void bdrv_set_io_limits(BlockDriverState *bs,
- BlockIOLimit *io_limits)
-{
- bs->io_limits = *io_limits;
- bs->io_limits_enabled = bdrv_io_limits_enabled(bs);
-}
-
void bdrv_set_on_error(BlockDriverState *bs, BlockdevOnError on_read_error,
BlockdevOnError on_write_error)
{
@@ -3568,169 +3590,6 @@ void bdrv_aio_cancel(BlockDriverAIOCB *acb)
acb->aiocb_info->cancel(acb);
}
-/* block I/O throttling */
-static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
- bool is_write, double elapsed_time, uint64_t *wait)
-{
- uint64_t bps_limit = 0;
- uint64_t extension;
- double bytes_limit, bytes_base, bytes_res;
- double slice_time, wait_time;
-
- if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
- bps_limit = bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL];
- } else if (bs->io_limits.bps[is_write]) {
- bps_limit = bs->io_limits.bps[is_write];
- } else {
- if (wait) {
- *wait = 0;
- }
-
- return false;
- }
-
- slice_time = bs->slice_end - bs->slice_start;
- slice_time /= (NANOSECONDS_PER_SECOND);
- bytes_limit = bps_limit * slice_time;
- bytes_base = bs->slice_submitted.bytes[is_write];
- if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
- bytes_base += bs->slice_submitted.bytes[!is_write];
- }
-
- /* bytes_base: the bytes of data which have been read/written; and
- * it is obtained from the history statistic info.
- * bytes_res: the remaining bytes of data which need to be read/written.
- * (bytes_base + bytes_res) / bps_limit: used to calcuate
- * the total time for completing reading/writting all data.
- */
- bytes_res = (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
-
- if (bytes_base + bytes_res <= bytes_limit) {
- if (wait) {
- *wait = 0;
- }
-
- return false;
- }
-
- /* Calc approx time to dispatch */
- wait_time = (bytes_base + bytes_res) / bps_limit - elapsed_time;
-
- /* When the I/O rate at runtime exceeds the limits,
- * bs->slice_end need to be extended in order that the current statistic
- * info can be kept until the timer fire, so it is increased and tuned
- * based on the result of experiment.
- */
- extension = wait_time * NANOSECONDS_PER_SECOND;
- extension = DIV_ROUND_UP(extension, BLOCK_IO_SLICE_TIME) *
- BLOCK_IO_SLICE_TIME;
- bs->slice_end += extension;
- if (wait) {
- *wait = wait_time * NANOSECONDS_PER_SECOND;
- }
-
- return true;
-}
-
-static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
- double elapsed_time, uint64_t *wait)
-{
- uint64_t iops_limit = 0;
- double ios_limit, ios_base;
- double slice_time, wait_time;
-
- if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
- iops_limit = bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL];
- } else if (bs->io_limits.iops[is_write]) {
- iops_limit = bs->io_limits.iops[is_write];
- } else {
- if (wait) {
- *wait = 0;
- }
-
- return false;
- }
-
- slice_time = bs->slice_end - bs->slice_start;
- slice_time /= (NANOSECONDS_PER_SECOND);
- ios_limit = iops_limit * slice_time;
- ios_base = bs->slice_submitted.ios[is_write];
- if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
- ios_base += bs->slice_submitted.ios[!is_write];
- }
-
- if (ios_base + 1 <= ios_limit) {
- if (wait) {
- *wait = 0;
- }
-
- return false;
- }
-
- /* Calc approx time to dispatch, in seconds */
- wait_time = (ios_base + 1) / iops_limit;
- if (wait_time > elapsed_time) {
- wait_time = wait_time - elapsed_time;
- } else {
- wait_time = 0;
- }
-
- /* Exceeded current slice, extend it by another slice time */
- bs->slice_end += BLOCK_IO_SLICE_TIME;
- if (wait) {
- *wait = wait_time * NANOSECONDS_PER_SECOND;
- }
-
- return true;
-}
-
-static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
- bool is_write, int64_t *wait)
-{
- int64_t now, max_wait;
- uint64_t bps_wait = 0, iops_wait = 0;
- double elapsed_time;
- int bps_ret, iops_ret;
-
- now = qemu_get_clock_ns(vm_clock);
- if (now > bs->slice_end) {
- bs->slice_start = now;
- bs->slice_end = now + BLOCK_IO_SLICE_TIME;
- memset(&bs->slice_submitted, 0, sizeof(bs->slice_submitted));
- }
-
- elapsed_time = now - bs->slice_start;
- elapsed_time /= (NANOSECONDS_PER_SECOND);
-
- bps_ret = bdrv_exceed_bps_limits(bs, nb_sectors,
- is_write, elapsed_time, &bps_wait);
- iops_ret = bdrv_exceed_iops_limits(bs, is_write,
- elapsed_time, &iops_wait);
- if (bps_ret || iops_ret) {
- max_wait = bps_wait > iops_wait ? bps_wait : iops_wait;
- if (wait) {
- *wait = max_wait;
- }
-
- now = qemu_get_clock_ns(vm_clock);
- if (bs->slice_end < now + max_wait) {
- bs->slice_end = now + max_wait;
- }
-
- return true;
- }
-
- if (wait) {
- *wait = 0;
- }
-
- bs->slice_submitted.bytes[is_write] += (int64_t)nb_sectors *
- BDRV_SECTOR_SIZE;
- bs->slice_submitted.ios[is_write]++;
-
- return false;
-}
-
/**************************************************************/
/* async block device emulation */
diff --git a/block/qapi.c b/block/qapi.c
index a4bc411..45f806b 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -223,18 +223,15 @@ void bdrv_query_info(BlockDriverState *bs,
info->inserted->backing_file_depth = bdrv_get_backing_file_depth(bs);
if (bs->io_limits_enabled) {
- info->inserted->bps =
- bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL];
- info->inserted->bps_rd =
- bs->io_limits.bps[BLOCK_IO_LIMIT_READ];
- info->inserted->bps_wr =
- bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE];
- info->inserted->iops =
- bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL];
- info->inserted->iops_rd =
- bs->io_limits.iops[BLOCK_IO_LIMIT_READ];
- info->inserted->iops_wr =
- bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE];
+ ThrottleConfig cfg;
+ throttle_get_config(&bs->throttle_state, &cfg);
+ info->inserted->bps = cfg.buckets[THROTTLE_BPS_TOTAL].ups;
+ info->inserted->bps_rd = cfg.buckets[THROTTLE_BPS_READ].ups;
+ info->inserted->bps_wr = cfg.buckets[THROTTLE_BPS_WRITE].ups;
+
+ info->inserted->iops = cfg.buckets[THROTTLE_OPS_TOTAL].ups;
+ info->inserted->iops_rd = cfg.buckets[THROTTLE_OPS_READ].ups;
+ info->inserted->iops_wr = cfg.buckets[THROTTLE_OPS_WRITE].ups;
}
bs0 = bs;
diff --git a/blockdev.c b/blockdev.c
index 41b0a49..7559ea5 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -281,32 +281,16 @@ static int parse_block_error_action(const char *buf, bool is_read)
}
}
-static bool do_check_io_limits(BlockIOLimit *io_limits, Error **errp)
+static bool check_throttle_config(ThrottleConfig *cfg, Error **errp)
{
- bool bps_flag;
- bool iops_flag;
-
- assert(io_limits);
-
- bps_flag = (io_limits->bps[BLOCK_IO_LIMIT_TOTAL] != 0)
- && ((io_limits->bps[BLOCK_IO_LIMIT_READ] != 0)
- || (io_limits->bps[BLOCK_IO_LIMIT_WRITE] != 0));
- iops_flag = (io_limits->iops[BLOCK_IO_LIMIT_TOTAL] != 0)
- && ((io_limits->iops[BLOCK_IO_LIMIT_READ] != 0)
- || (io_limits->iops[BLOCK_IO_LIMIT_WRITE] != 0));
- if (bps_flag || iops_flag) {
- error_setg(errp, "bps(iops) and bps_rd/bps_wr(iops_rd/iops_wr) "
+ if (throttle_conflicting(cfg)) {
+ error_setg(errp, "bps/iops/max total values and read/write values"
"cannot be used at the same time");
return false;
}
- if (io_limits->bps[BLOCK_IO_LIMIT_TOTAL] < 0 ||
- io_limits->bps[BLOCK_IO_LIMIT_WRITE] < 0 ||
- io_limits->bps[BLOCK_IO_LIMIT_READ] < 0 ||
- io_limits->iops[BLOCK_IO_LIMIT_TOTAL] < 0 ||
- io_limits->iops[BLOCK_IO_LIMIT_WRITE] < 0 ||
- io_limits->iops[BLOCK_IO_LIMIT_READ] < 0) {
- error_setg(errp, "bps and iops values must be 0 or greater");
+ if (!throttle_is_valid(cfg)) {
+ error_setg(errp, "bps/iops/maxs values must be 0 or greater");
return false;
}
@@ -331,7 +315,7 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts,
int on_read_error, on_write_error;
const char *devaddr;
DriveInfo *dinfo;
- BlockIOLimit io_limits;
+ ThrottleConfig cfg;
int snapshot = 0;
bool copy_on_read;
int ret;
@@ -489,20 +473,31 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts,
}
/* disk I/O throttling */
- io_limits.bps[BLOCK_IO_LIMIT_TOTAL] =
+ cfg.buckets[THROTTLE_BPS_TOTAL].ups =
qemu_opt_get_number(opts, "throttling.bps-total", 0);
- io_limits.bps[BLOCK_IO_LIMIT_READ] =
+ cfg.buckets[THROTTLE_BPS_READ].ups =
qemu_opt_get_number(opts, "throttling.bps-read", 0);
- io_limits.bps[BLOCK_IO_LIMIT_WRITE] =
+ cfg.buckets[THROTTLE_BPS_WRITE].ups =
qemu_opt_get_number(opts, "throttling.bps-write", 0);
- io_limits.iops[BLOCK_IO_LIMIT_TOTAL] =
+ cfg.buckets[THROTTLE_OPS_TOTAL].ups =
qemu_opt_get_number(opts, "throttling.iops-total", 0);
- io_limits.iops[BLOCK_IO_LIMIT_READ] =
+ cfg.buckets[THROTTLE_OPS_READ].ups =
qemu_opt_get_number(opts, "throttling.iops-read", 0);
- io_limits.iops[BLOCK_IO_LIMIT_WRITE] =
+ cfg.buckets[THROTTLE_OPS_WRITE].ups =
qemu_opt_get_number(opts, "throttling.iops-write", 0);
- if (!do_check_io_limits(&io_limits, &error)) {
+ cfg.buckets[THROTTLE_BPS_TOTAL].max = 0;
+ cfg.buckets[THROTTLE_BPS_READ].max = 0;
+ cfg.buckets[THROTTLE_BPS_WRITE].max = 0;
+
+ cfg.buckets[THROTTLE_OPS_TOTAL].max = 0;
+ cfg.buckets[THROTTLE_OPS_READ].max = 0;
+ cfg.buckets[THROTTLE_OPS_WRITE].max = 0;
+
+ cfg.unit_size = BDRV_SECTOR_SIZE;
+ cfg.op_size = 0;
+
+ if (!check_throttle_config(&cfg, &error)) {
error_report("%s", error_get_pretty(error));
error_free(error);
return NULL;
@@ -629,7 +624,10 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts,
bdrv_set_on_error(dinfo->bdrv, on_read_error, on_write_error);
/* disk I/O throttling */
- bdrv_set_io_limits(dinfo->bdrv, &io_limits);
+ if (throttle_enabled(&cfg)) {
+ bdrv_io_limits_enable(dinfo->bdrv);
+ bdrv_set_io_limits(dinfo->bdrv, &cfg);
+ }
switch(type) {
case IF_IDE:
@@ -1262,7 +1260,7 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
int64_t bps_wr, int64_t iops, int64_t iops_rd,
int64_t iops_wr, Error **errp)
{
- BlockIOLimit io_limits;
+ ThrottleConfig cfg;
BlockDriverState *bs;
bs = bdrv_find(device);
@@ -1271,27 +1269,37 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
return;
}
- io_limits.bps[BLOCK_IO_LIMIT_TOTAL] = bps;
- io_limits.bps[BLOCK_IO_LIMIT_READ] = bps_rd;
- io_limits.bps[BLOCK_IO_LIMIT_WRITE] = bps_wr;
- io_limits.iops[BLOCK_IO_LIMIT_TOTAL]= iops;
- io_limits.iops[BLOCK_IO_LIMIT_READ] = iops_rd;
- io_limits.iops[BLOCK_IO_LIMIT_WRITE]= iops_wr;
+ cfg.buckets[THROTTLE_BPS_TOTAL].ups = bps;
+ cfg.buckets[THROTTLE_BPS_READ].ups = bps_rd;
+ cfg.buckets[THROTTLE_BPS_WRITE].ups = bps_wr;
+
+ cfg.buckets[THROTTLE_OPS_TOTAL].ups = iops;
+ cfg.buckets[THROTTLE_OPS_READ].ups = iops_rd;
+ cfg.buckets[THROTTLE_OPS_WRITE].ups = iops_wr;
+
+ cfg.buckets[THROTTLE_BPS_TOTAL].max = 0;
+ cfg.buckets[THROTTLE_BPS_READ].max = 0;
+ cfg.buckets[THROTTLE_BPS_WRITE].max = 0;
+
+ cfg.buckets[THROTTLE_OPS_TOTAL].max = 0;
+ cfg.buckets[THROTTLE_OPS_READ].max = 0;
+ cfg.buckets[THROTTLE_OPS_WRITE].max = 0;
- if (!do_check_io_limits(&io_limits, errp)) {
+ cfg.unit_size = BDRV_SECTOR_SIZE;
+ cfg.op_size = 0;
+
+ if (!check_throttle_config(&cfg, errp)) {
return;
}
- bs->io_limits = io_limits;
-
- if (!bs->io_limits_enabled && bdrv_io_limits_enabled(bs)) {
+ if (!bs->io_limits_enabled && throttle_enabled(&cfg)) {
bdrv_io_limits_enable(bs);
- } else if (bs->io_limits_enabled && !bdrv_io_limits_enabled(bs)) {
+ } else if (bs->io_limits_enabled && !throttle_enabled(&cfg)) {
bdrv_io_limits_disable(bs);
- } else {
- if (bs->block_timer) {
- qemu_mod_timer(bs->block_timer, qemu_get_clock_ns(vm_clock));
- }
+ }
+
+ if (bs->io_limits_enabled) {
+ bdrv_set_io_limits(bs, &cfg);
}
}
diff --git a/include/block/block.h b/include/block/block.h
index 742fce5..b16d579 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -107,7 +107,6 @@ void bdrv_info_stats(Monitor *mon, QObject **ret_data);
/* disk I/O throttling */
void bdrv_io_limits_enable(BlockDriverState *bs);
void bdrv_io_limits_disable(BlockDriverState *bs);
-bool bdrv_io_limits_enabled(BlockDriverState *bs);
void bdrv_init(void);
void bdrv_init_with_whitelist(void);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index e45f2a0..a0b6bc1 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -34,18 +34,12 @@
#include "monitor/monitor.h"
#include "qemu/hbitmap.h"
#include "block/snapshot.h"
+#include "qemu/throttle.h"
#define BLOCK_FLAG_ENCRYPT 1
#define BLOCK_FLAG_COMPAT6 4
#define BLOCK_FLAG_LAZY_REFCOUNTS 8
-#define BLOCK_IO_LIMIT_READ 0
-#define BLOCK_IO_LIMIT_WRITE 1
-#define BLOCK_IO_LIMIT_TOTAL 2
-
-#define BLOCK_IO_SLICE_TIME 100000000
-#define NANOSECONDS_PER_SECOND 1000000000.0
-
#define BLOCK_OPT_SIZE "size"
#define BLOCK_OPT_ENCRYPT "encryption"
#define BLOCK_OPT_COMPAT6 "compat6"
@@ -69,17 +63,6 @@ typedef struct BdrvTrackedRequest {
CoQueue wait_queue; /* coroutines blocked on this request */
} BdrvTrackedRequest;
-
-typedef struct BlockIOLimit {
- int64_t bps[3];
- int64_t iops[3];
-} BlockIOLimit;
-
-typedef struct BlockIOBaseValue {
- uint64_t bytes[2];
- uint64_t ios[2];
-} BlockIOBaseValue;
-
struct BlockDriver {
const char *format_name;
int instance_size;
@@ -263,13 +246,9 @@ struct BlockDriverState {
/* number of in-flight copy-on-read requests */
unsigned int copy_on_read_in_flight;
- /* the time for latest disk I/O */
- int64_t slice_start;
- int64_t slice_end;
- BlockIOLimit io_limits;
- BlockIOBaseValue slice_submitted;
- CoQueue throttled_reqs;
- QEMUTimer *block_timer;
+ /* I/O throttling */
+ ThrottleState throttle_state;
+ CoQueue throttled_reqs[2];
bool io_limits_enabled;
/* I/O stats (display with "info blockstats"). */
@@ -308,7 +287,8 @@ struct BlockDriverState {
int get_tmp_filename(char *filename, int size);
void bdrv_set_io_limits(BlockDriverState *bs,
- BlockIOLimit *io_limits);
+ ThrottleConfig *cfg);
+
/**
* bdrv_add_before_write_notifier:
--
1.7.10.4
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [Qemu-devel] [PATCH V5 3/5] block: Enable the new throttling code in the block layer.
2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 3/5] block: Enable the new throttling code in the block layer Benoît Canet
@ 2013-08-14 9:31 ` Fam Zheng
2013-08-14 9:50 ` Fam Zheng
2013-08-16 12:02 ` Stefan Hajnoczi
1 sibling, 1 reply; 17+ messages in thread
From: Fam Zheng @ 2013-08-14 9:31 UTC (permalink / raw)
To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha
On Mon, 08/12 18:53, Benoît Canet wrote:
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
> block.c | 349 ++++++++++++++-------------------------------
> block/qapi.c | 21 ++-
> blockdev.c | 100 +++++++------
> include/block/block.h | 1 -
> include/block/block_int.h | 32 +----
> 5 files changed, 173 insertions(+), 330 deletions(-)
>
> diff --git a/block.c b/block.c
> index 01b66d8..d49bc98 100644
> --- a/block.c
> +++ b/block.c
> @@ -86,13 +86,6 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque);
> static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
> int64_t sector_num, int nb_sectors);
>
> -static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
> - bool is_write, double elapsed_time, uint64_t *wait);
> -static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
> - double elapsed_time, uint64_t *wait);
> -static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
> - bool is_write, int64_t *wait);
> -
> static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
> QTAILQ_HEAD_INITIALIZER(bdrv_states);
>
> @@ -123,70 +116,110 @@ int is_windows_drive(const char *filename)
> #endif
>
> /* throttling disk I/O limits */
> -void bdrv_io_limits_disable(BlockDriverState *bs)
> +void bdrv_set_io_limits(BlockDriverState *bs,
> + ThrottleConfig *cfg)
> {
> - bs->io_limits_enabled = false;
> + int i;
> +
> + throttle_config(&bs->throttle_state, cfg);
> +
> + for (i = 0; i < 2; i++) {
> + qemu_co_enter_next(&bs->throttled_reqs[i]);
> + }
> +}
> +
> +/* this function drain all the throttled IOs */
> +static bool bdrv_drain_throttled(BlockDriverState *bs)
> +{
> + bool drained = false;
> + bool enabled = bs->io_limits_enabled;
> + int i;
>
> - do {} while (qemu_co_enter_next(&bs->throttled_reqs));
> + bs->io_limits_enabled = false;
>
> - if (bs->block_timer) {
> - qemu_del_timer(bs->block_timer);
> - qemu_free_timer(bs->block_timer);
> - bs->block_timer = NULL;
> + for (i = 0; i < 2; i++) {
> + while (qemu_co_enter_next(&bs->throttled_reqs[i])) {
> + drained = true;
OK, this should be right. Throttling is disabled here, so nothing can be
inserted into throttled_reqs[0] while you drain throttled_reqs[1].
> + }
> }
>
> - bs->slice_start = 0;
> - bs->slice_end = 0;
> + bs->io_limits_enabled = enabled;
> +
> + return drained;
> }
>
> -static void bdrv_block_timer(void *opaque)
> +void bdrv_io_limits_disable(BlockDriverState *bs)
> +{
> + bs->io_limits_enabled = false;
> +
> + bdrv_drain_throttled(bs);
> +
> + throttle_destroy(&bs->throttle_state);
> +}
> +
> +static void bdrv_throttle_read_timer_cb(void *opaque)
> {
> BlockDriverState *bs = opaque;
> + qemu_co_enter_next(&bs->throttled_reqs[0]);
> +}
>
> - qemu_co_enter_next(&bs->throttled_reqs);
> +static void bdrv_throttle_write_timer_cb(void *opaque)
> +{
> + BlockDriverState *bs = opaque;
> + qemu_co_enter_next(&bs->throttled_reqs[1]);
> }
>
> +/* should be called before bdrv_set_io_limits if a limit is set */
> void bdrv_io_limits_enable(BlockDriverState *bs)
> {
> - qemu_co_queue_init(&bs->throttled_reqs);
> - bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
> + throttle_init(&bs->throttle_state,
> + vm_clock,
> + bdrv_throttle_read_timer_cb,
> + bdrv_throttle_write_timer_cb,
> + bs);
> + qemu_co_queue_init(&bs->throttled_reqs[0]);
> + qemu_co_queue_init(&bs->throttled_reqs[1]);
> bs->io_limits_enabled = true;
> }
>
> -bool bdrv_io_limits_enabled(BlockDriverState *bs)
> +/* This function make an IO wait if needed
s/make/makes/
> + *
> + * @is_write: is the IO a write
> + * @nb_sectors: the number of sectors of the IO
> + */
> +static void bdrv_io_limits_intercept(BlockDriverState *bs,
> + bool is_write,
> + int nb_sectors)
> {
> - BlockIOLimit *io_limits = &bs->io_limits;
> - return io_limits->bps[BLOCK_IO_LIMIT_READ]
> - || io_limits->bps[BLOCK_IO_LIMIT_WRITE]
> - || io_limits->bps[BLOCK_IO_LIMIT_TOTAL]
> - || io_limits->iops[BLOCK_IO_LIMIT_READ]
> - || io_limits->iops[BLOCK_IO_LIMIT_WRITE]
> - || io_limits->iops[BLOCK_IO_LIMIT_TOTAL];
> + /* does this io must wait */
> + bool must_wait = !throttle_allowed(&bs->throttle_state, is_write);
> +
> + /* if must wait or any request of this type throttled queue the IO */
> + if (must_wait ||
> + !qemu_co_queue_empty(&bs->throttled_reqs[is_write])) {
> + qemu_co_queue_wait(&bs->throttled_reqs[is_write]);
> + }
> +
> + /* the IO will be executed do the accounting */
s/executed/executed,/
> + throttle_account(&bs->throttle_state, is_write, nb_sectors);
> }
>
> -static void bdrv_io_limits_intercept(BlockDriverState *bs,
> - bool is_write, int nb_sectors)
> +/* This function will schedule the next IO throttled IO of this type if needed
> + *
> + * @is_write: is the IO a write
> + */
> +static void bdrv_io_limits_resched(BlockDriverState *bs, bool is_write)
> {
> - int64_t wait_time = -1;
>
> - if (!qemu_co_queue_empty(&bs->throttled_reqs)) {
> - qemu_co_queue_wait(&bs->throttled_reqs);
> + if (qemu_co_queue_empty(&bs->throttled_reqs[is_write])) {
> + return;
> }
>
> - /* In fact, we hope to keep each request's timing, in FIFO mode. The next
> - * throttled requests will not be dequeued until the current request is
> - * allowed to be serviced. So if the current request still exceeds the
> - * limits, it will be inserted to the head. All requests followed it will
> - * be still in throttled_reqs queue.
> - */
> -
> - while (bdrv_exceed_io_limits(bs, nb_sectors, is_write, &wait_time)) {
> - qemu_mod_timer(bs->block_timer,
> - wait_time + qemu_get_clock_ns(vm_clock));
> - qemu_co_queue_wait_insert_head(&bs->throttled_reqs);
> + if (!throttle_allowed(&bs->throttle_state, is_write)) {
> + return;
> }
>
> - qemu_co_queue_next(&bs->throttled_reqs);
> + qemu_co_queue_next(&bs->throttled_reqs[is_write]);
> }
>
> /* check if the path starts with "<protocol>:" */
> @@ -1112,11 +1145,6 @@ int bdrv_open(BlockDriverState *bs, const char *filename, QDict *options,
> bdrv_dev_change_media_cb(bs, true);
> }
>
> - /* throttling disk I/O limits */
> - if (bs->io_limits_enabled) {
> - bdrv_io_limits_enable(bs);
> - }
> -
> return 0;
>
> unlink_and_fail:
> @@ -1452,7 +1480,7 @@ void bdrv_drain_all(void)
> * a busy wait.
> */
> QTAILQ_FOREACH(bs, &bdrv_states, list) {
> - while (qemu_co_enter_next(&bs->throttled_reqs)) {
> + if (bdrv_drain_throttled(bs)) {
> busy = true;
> }
> }
> @@ -1461,7 +1489,8 @@ void bdrv_drain_all(void)
> /* If requests are still pending there is a bug somewhere */
> QTAILQ_FOREACH(bs, &bdrv_states, list) {
> assert(QLIST_EMPTY(&bs->tracked_requests));
> - assert(qemu_co_queue_empty(&bs->throttled_reqs));
> + assert(qemu_co_queue_empty(&bs->throttled_reqs[0]));
> + assert(qemu_co_queue_empty(&bs->throttled_reqs[1]));
> }
> }
>
> @@ -1497,13 +1526,12 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
>
> bs_dest->enable_write_cache = bs_src->enable_write_cache;
>
> - /* i/o timing parameters */
> - bs_dest->slice_start = bs_src->slice_start;
> - bs_dest->slice_end = bs_src->slice_end;
> - bs_dest->slice_submitted = bs_src->slice_submitted;
> - bs_dest->io_limits = bs_src->io_limits;
> - bs_dest->throttled_reqs = bs_src->throttled_reqs;
> - bs_dest->block_timer = bs_src->block_timer;
> + /* i/o throttled req */
> + memcpy(&bs_dest->throttle_state,
> + &bs_src->throttle_state,
> + sizeof(ThrottleState));
> + bs_dest->throttled_reqs[0] = bs_src->throttled_reqs[0];
> + bs_dest->throttled_reqs[1] = bs_src->throttled_reqs[1];
> bs_dest->io_limits_enabled = bs_src->io_limits_enabled;
>
> /* r/w error */
> @@ -1550,7 +1578,7 @@ void bdrv_swap(BlockDriverState *bs_new, BlockDriverState *bs_old)
> assert(bs_new->dev == NULL);
> assert(bs_new->in_use == 0);
> assert(bs_new->io_limits_enabled == false);
> - assert(bs_new->block_timer == NULL);
> + assert(!throttle_have_timer(&bs_new->throttle_state));
>
> tmp = *bs_new;
> *bs_new = *bs_old;
> @@ -1569,7 +1597,7 @@ void bdrv_swap(BlockDriverState *bs_new, BlockDriverState *bs_old)
> assert(bs_new->job == NULL);
> assert(bs_new->in_use == 0);
> assert(bs_new->io_limits_enabled == false);
> - assert(bs_new->block_timer == NULL);
> + assert(!throttle_have_timer(&bs_new->throttle_state));
>
> bdrv_rebind(bs_new);
> bdrv_rebind(bs_old);
> @@ -1860,8 +1888,15 @@ int bdrv_commit_all(void)
> *
> * This function should be called when a tracked request is completing.
> */
> -static void tracked_request_end(BdrvTrackedRequest *req)
> +static void tracked_request_end(BlockDriverState *bs,
> + BdrvTrackedRequest *req,
> + bool is_write)
> {
> + /* throttling disk I/O */
> + if (bs->io_limits_enabled) {
> + bdrv_io_limits_resched(bs, is_write);
> + }
> +
> QLIST_REMOVE(req, list);
> qemu_co_queue_restart_all(&req->wait_queue);
> }
> @@ -1874,6 +1909,11 @@ static void tracked_request_begin(BdrvTrackedRequest *req,
> int64_t sector_num,
> int nb_sectors, bool is_write)
> {
> + /* throttling disk I/O */
> + if (bs->io_limits_enabled) {
> + bdrv_io_limits_intercept(bs, is_write, nb_sectors);
> + }
> +
> *req = (BdrvTrackedRequest){
> .bs = bs,
> .sector_num = sector_num,
> @@ -2512,11 +2552,6 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
> return -EIO;
> }
>
> - /* throttling disk read I/O */
> - if (bs->io_limits_enabled) {
> - bdrv_io_limits_intercept(bs, false, nb_sectors);
> - }
> -
> if (bs->copy_on_read) {
> flags |= BDRV_REQ_COPY_ON_READ;
> }
> @@ -2547,7 +2582,7 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
> ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
>
> out:
> - tracked_request_end(&req);
> + tracked_request_end(bs, &req, false);
>
> if (flags & BDRV_REQ_COPY_ON_READ) {
> bs->copy_on_read_in_flight--;
> @@ -2625,11 +2660,6 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
> return -EIO;
> }
>
> - /* throttling disk write I/O */
> - if (bs->io_limits_enabled) {
> - bdrv_io_limits_intercept(bs, true, nb_sectors);
> - }
> -
> if (bs->copy_on_read_in_flight) {
> wait_for_overlapping_requests(bs, sector_num, nb_sectors);
> }
> @@ -2658,7 +2688,7 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
> bs->wr_highest_sector = sector_num + nb_sectors - 1;
> }
>
> - tracked_request_end(&req);
> + tracked_request_end(bs, &req, true);
>
> return ret;
> }
> @@ -2751,14 +2781,6 @@ void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr)
> *nb_sectors_ptr = length;
> }
>
> -/* throttling disk io limits */
> -void bdrv_set_io_limits(BlockDriverState *bs,
> - BlockIOLimit *io_limits)
> -{
> - bs->io_limits = *io_limits;
> - bs->io_limits_enabled = bdrv_io_limits_enabled(bs);
> -}
> -
> void bdrv_set_on_error(BlockDriverState *bs, BlockdevOnError on_read_error,
> BlockdevOnError on_write_error)
> {
> @@ -3568,169 +3590,6 @@ void bdrv_aio_cancel(BlockDriverAIOCB *acb)
> acb->aiocb_info->cancel(acb);
> }
>
> -/* block I/O throttling */
> -static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
> - bool is_write, double elapsed_time, uint64_t *wait)
> -{
> - uint64_t bps_limit = 0;
> - uint64_t extension;
> - double bytes_limit, bytes_base, bytes_res;
> - double slice_time, wait_time;
> -
> - if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
> - bps_limit = bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL];
> - } else if (bs->io_limits.bps[is_write]) {
> - bps_limit = bs->io_limits.bps[is_write];
> - } else {
> - if (wait) {
> - *wait = 0;
> - }
> -
> - return false;
> - }
> -
> - slice_time = bs->slice_end - bs->slice_start;
> - slice_time /= (NANOSECONDS_PER_SECOND);
> - bytes_limit = bps_limit * slice_time;
> - bytes_base = bs->slice_submitted.bytes[is_write];
> - if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
> - bytes_base += bs->slice_submitted.bytes[!is_write];
> - }
> -
> - /* bytes_base: the bytes of data which have been read/written; and
> - * it is obtained from the history statistic info.
> - * bytes_res: the remaining bytes of data which need to be read/written.
> - * (bytes_base + bytes_res) / bps_limit: used to calcuate
> - * the total time for completing reading/writting all data.
> - */
> - bytes_res = (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
> -
> - if (bytes_base + bytes_res <= bytes_limit) {
> - if (wait) {
> - *wait = 0;
> - }
> -
> - return false;
> - }
> -
> - /* Calc approx time to dispatch */
> - wait_time = (bytes_base + bytes_res) / bps_limit - elapsed_time;
> -
> - /* When the I/O rate at runtime exceeds the limits,
> - * bs->slice_end need to be extended in order that the current statistic
> - * info can be kept until the timer fire, so it is increased and tuned
> - * based on the result of experiment.
> - */
> - extension = wait_time * NANOSECONDS_PER_SECOND;
> - extension = DIV_ROUND_UP(extension, BLOCK_IO_SLICE_TIME) *
> - BLOCK_IO_SLICE_TIME;
> - bs->slice_end += extension;
> - if (wait) {
> - *wait = wait_time * NANOSECONDS_PER_SECOND;
> - }
> -
> - return true;
> -}
> -
> -static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
> - double elapsed_time, uint64_t *wait)
> -{
> - uint64_t iops_limit = 0;
> - double ios_limit, ios_base;
> - double slice_time, wait_time;
> -
> - if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
> - iops_limit = bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL];
> - } else if (bs->io_limits.iops[is_write]) {
> - iops_limit = bs->io_limits.iops[is_write];
> - } else {
> - if (wait) {
> - *wait = 0;
> - }
> -
> - return false;
> - }
> -
> - slice_time = bs->slice_end - bs->slice_start;
> - slice_time /= (NANOSECONDS_PER_SECOND);
> - ios_limit = iops_limit * slice_time;
> - ios_base = bs->slice_submitted.ios[is_write];
> - if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
> - ios_base += bs->slice_submitted.ios[!is_write];
> - }
> -
> - if (ios_base + 1 <= ios_limit) {
> - if (wait) {
> - *wait = 0;
> - }
> -
> - return false;
> - }
> -
> - /* Calc approx time to dispatch, in seconds */
> - wait_time = (ios_base + 1) / iops_limit;
> - if (wait_time > elapsed_time) {
> - wait_time = wait_time - elapsed_time;
> - } else {
> - wait_time = 0;
> - }
> -
> - /* Exceeded current slice, extend it by another slice time */
> - bs->slice_end += BLOCK_IO_SLICE_TIME;
> - if (wait) {
> - *wait = wait_time * NANOSECONDS_PER_SECOND;
> - }
> -
> - return true;
> -}
> -
> -static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
> - bool is_write, int64_t *wait)
> -{
> - int64_t now, max_wait;
> - uint64_t bps_wait = 0, iops_wait = 0;
> - double elapsed_time;
> - int bps_ret, iops_ret;
> -
> - now = qemu_get_clock_ns(vm_clock);
> - if (now > bs->slice_end) {
> - bs->slice_start = now;
> - bs->slice_end = now + BLOCK_IO_SLICE_TIME;
> - memset(&bs->slice_submitted, 0, sizeof(bs->slice_submitted));
> - }
> -
> - elapsed_time = now - bs->slice_start;
> - elapsed_time /= (NANOSECONDS_PER_SECOND);
> -
> - bps_ret = bdrv_exceed_bps_limits(bs, nb_sectors,
> - is_write, elapsed_time, &bps_wait);
> - iops_ret = bdrv_exceed_iops_limits(bs, is_write,
> - elapsed_time, &iops_wait);
> - if (bps_ret || iops_ret) {
> - max_wait = bps_wait > iops_wait ? bps_wait : iops_wait;
> - if (wait) {
> - *wait = max_wait;
> - }
> -
> - now = qemu_get_clock_ns(vm_clock);
> - if (bs->slice_end < now + max_wait) {
> - bs->slice_end = now + max_wait;
> - }
> -
> - return true;
> - }
> -
> - if (wait) {
> - *wait = 0;
> - }
> -
> - bs->slice_submitted.bytes[is_write] += (int64_t)nb_sectors *
> - BDRV_SECTOR_SIZE;
> - bs->slice_submitted.ios[is_write]++;
> -
> - return false;
> -}
> -
> /**************************************************************/
> /* async block device emulation */
>
> diff --git a/block/qapi.c b/block/qapi.c
> index a4bc411..45f806b 100644
> --- a/block/qapi.c
> +++ b/block/qapi.c
> @@ -223,18 +223,15 @@ void bdrv_query_info(BlockDriverState *bs,
> info->inserted->backing_file_depth = bdrv_get_backing_file_depth(bs);
>
> if (bs->io_limits_enabled) {
> - info->inserted->bps =
> - bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL];
> - info->inserted->bps_rd =
> - bs->io_limits.bps[BLOCK_IO_LIMIT_READ];
> - info->inserted->bps_wr =
> - bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE];
> - info->inserted->iops =
> - bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL];
> - info->inserted->iops_rd =
> - bs->io_limits.iops[BLOCK_IO_LIMIT_READ];
> - info->inserted->iops_wr =
> - bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE];
> + ThrottleConfig cfg;
> + throttle_get_config(&bs->throttle_state, &cfg);
> + info->inserted->bps = cfg.buckets[THROTTLE_BPS_TOTAL].ups;
> + info->inserted->bps_rd = cfg.buckets[THROTTLE_BPS_READ].ups;
> + info->inserted->bps_wr = cfg.buckets[THROTTLE_BPS_WRITE].ups;
> +
> + info->inserted->iops = cfg.buckets[THROTTLE_OPS_TOTAL].ups;
> + info->inserted->iops_rd = cfg.buckets[THROTTLE_OPS_READ].ups;
> + info->inserted->iops_wr = cfg.buckets[THROTTLE_OPS_WRITE].ups;
> }
>
> bs0 = bs;
> diff --git a/blockdev.c b/blockdev.c
> index 41b0a49..7559ea5 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -281,32 +281,16 @@ static int parse_block_error_action(const char *buf, bool is_read)
> }
> }
>
> -static bool do_check_io_limits(BlockIOLimit *io_limits, Error **errp)
> +static bool check_throttle_config(ThrottleConfig *cfg, Error **errp)
> {
> - bool bps_flag;
> - bool iops_flag;
> -
> - assert(io_limits);
> -
> - bps_flag = (io_limits->bps[BLOCK_IO_LIMIT_TOTAL] != 0)
> - && ((io_limits->bps[BLOCK_IO_LIMIT_READ] != 0)
> - || (io_limits->bps[BLOCK_IO_LIMIT_WRITE] != 0));
> - iops_flag = (io_limits->iops[BLOCK_IO_LIMIT_TOTAL] != 0)
> - && ((io_limits->iops[BLOCK_IO_LIMIT_READ] != 0)
> - || (io_limits->iops[BLOCK_IO_LIMIT_WRITE] != 0));
> - if (bps_flag || iops_flag) {
> - error_setg(errp, "bps(iops) and bps_rd/bps_wr(iops_rd/iops_wr) "
> + if (throttle_conflicting(cfg)) {
> + error_setg(errp, "bps/iops/max total values and read/write values"
> "cannot be used at the same time");
> return false;
> }
>
> - if (io_limits->bps[BLOCK_IO_LIMIT_TOTAL] < 0 ||
> - io_limits->bps[BLOCK_IO_LIMIT_WRITE] < 0 ||
> - io_limits->bps[BLOCK_IO_LIMIT_READ] < 0 ||
> - io_limits->iops[BLOCK_IO_LIMIT_TOTAL] < 0 ||
> - io_limits->iops[BLOCK_IO_LIMIT_WRITE] < 0 ||
> - io_limits->iops[BLOCK_IO_LIMIT_READ] < 0) {
> - error_setg(errp, "bps and iops values must be 0 or greater");
> + if (!throttle_is_valid(cfg)) {
> + error_setg(errp, "bps/iops/maxs values must be 0 or greater");
> return false;
> }
>
> @@ -331,7 +315,7 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts,
> int on_read_error, on_write_error;
> const char *devaddr;
> DriveInfo *dinfo;
> - BlockIOLimit io_limits;
> + ThrottleConfig cfg;
> int snapshot = 0;
> bool copy_on_read;
> int ret;
> @@ -489,20 +473,31 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts,
> }
>
> /* disk I/O throttling */
> - io_limits.bps[BLOCK_IO_LIMIT_TOTAL] =
> + cfg.buckets[THROTTLE_BPS_TOTAL].ups =
> qemu_opt_get_number(opts, "throttling.bps-total", 0);
> - io_limits.bps[BLOCK_IO_LIMIT_READ] =
> + cfg.buckets[THROTTLE_BPS_READ].ups =
> qemu_opt_get_number(opts, "throttling.bps-read", 0);
> - io_limits.bps[BLOCK_IO_LIMIT_WRITE] =
> + cfg.buckets[THROTTLE_BPS_WRITE].ups =
> qemu_opt_get_number(opts, "throttling.bps-write", 0);
> - io_limits.iops[BLOCK_IO_LIMIT_TOTAL] =
> + cfg.buckets[THROTTLE_OPS_TOTAL].ups =
> qemu_opt_get_number(opts, "throttling.iops-total", 0);
> - io_limits.iops[BLOCK_IO_LIMIT_READ] =
> + cfg.buckets[THROTTLE_OPS_READ].ups =
> qemu_opt_get_number(opts, "throttling.iops-read", 0);
> - io_limits.iops[BLOCK_IO_LIMIT_WRITE] =
> + cfg.buckets[THROTTLE_OPS_WRITE].ups =
> qemu_opt_get_number(opts, "throttling.iops-write", 0);
>
> - if (!do_check_io_limits(&io_limits, &error)) {
> + cfg.buckets[THROTTLE_BPS_TOTAL].max = 0;
> + cfg.buckets[THROTTLE_BPS_READ].max = 0;
> + cfg.buckets[THROTTLE_BPS_WRITE].max = 0;
> +
> + cfg.buckets[THROTTLE_OPS_TOTAL].max = 0;
> + cfg.buckets[THROTTLE_OPS_READ].max = 0;
> + cfg.buckets[THROTTLE_OPS_WRITE].max = 0;
> +
> + cfg.unit_size = BDRV_SECTOR_SIZE;
> + cfg.op_size = 0;
> +
> + if (!check_throttle_config(&cfg, &error)) {
> error_report("%s", error_get_pretty(error));
> error_free(error);
> return NULL;
> @@ -629,7 +624,10 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts,
> bdrv_set_on_error(dinfo->bdrv, on_read_error, on_write_error);
>
> /* disk I/O throttling */
> - bdrv_set_io_limits(dinfo->bdrv, &io_limits);
> + if (throttle_enabled(&cfg)) {
> + bdrv_io_limits_enable(dinfo->bdrv);
> + bdrv_set_io_limits(dinfo->bdrv, &cfg);
> + }
>
> switch(type) {
> case IF_IDE:
> @@ -1262,7 +1260,7 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
> int64_t bps_wr, int64_t iops, int64_t iops_rd,
> int64_t iops_wr, Error **errp)
> {
> - BlockIOLimit io_limits;
> + ThrottleConfig cfg;
> BlockDriverState *bs;
>
> bs = bdrv_find(device);
> @@ -1271,27 +1269,37 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
> return;
> }
>
> - io_limits.bps[BLOCK_IO_LIMIT_TOTAL] = bps;
> - io_limits.bps[BLOCK_IO_LIMIT_READ] = bps_rd;
> - io_limits.bps[BLOCK_IO_LIMIT_WRITE] = bps_wr;
> - io_limits.iops[BLOCK_IO_LIMIT_TOTAL]= iops;
> - io_limits.iops[BLOCK_IO_LIMIT_READ] = iops_rd;
> - io_limits.iops[BLOCK_IO_LIMIT_WRITE]= iops_wr;
> + cfg.buckets[THROTTLE_BPS_TOTAL].ups = bps;
> + cfg.buckets[THROTTLE_BPS_READ].ups = bps_rd;
> + cfg.buckets[THROTTLE_BPS_WRITE].ups = bps_wr;
> +
> + cfg.buckets[THROTTLE_OPS_TOTAL].ups = iops;
> + cfg.buckets[THROTTLE_OPS_READ].ups = iops_rd;
> + cfg.buckets[THROTTLE_OPS_WRITE].ups = iops_wr;
> +
> + cfg.buckets[THROTTLE_BPS_TOTAL].max = 0;
> + cfg.buckets[THROTTLE_BPS_READ].max = 0;
> + cfg.buckets[THROTTLE_BPS_WRITE].max = 0;
> +
> + cfg.buckets[THROTTLE_OPS_TOTAL].max = 0;
> + cfg.buckets[THROTTLE_OPS_READ].max = 0;
> + cfg.buckets[THROTTLE_OPS_WRITE].max = 0;
>
> - if (!do_check_io_limits(&io_limits, errp)) {
> + cfg.unit_size = BDRV_SECTOR_SIZE;
Does this mean the user now sets the bps limit in sectors? I think we should
stay consistent with the existing parameter unit.
> + cfg.op_size = 0;
Why not set op_size to 1?
> +
> + if (!check_throttle_config(&cfg, errp)) {
> return;
> }
>
> - bs->io_limits = io_limits;
> -
> - if (!bs->io_limits_enabled && bdrv_io_limits_enabled(bs)) {
> + if (!bs->io_limits_enabled && throttle_enabled(&cfg)) {
> bdrv_io_limits_enable(bs);
> - } else if (bs->io_limits_enabled && !bdrv_io_limits_enabled(bs)) {
> + } else if (bs->io_limits_enabled && !throttle_enabled(&cfg)) {
> bdrv_io_limits_disable(bs);
> - } else {
> - if (bs->block_timer) {
> - qemu_mod_timer(bs->block_timer, qemu_get_clock_ns(vm_clock));
> - }
> + }
> +
> + if (bs->io_limits_enabled) {
> + bdrv_set_io_limits(bs, &cfg);
> }
> }
>
> diff --git a/include/block/block.h b/include/block/block.h
> index 742fce5..b16d579 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -107,7 +107,6 @@ void bdrv_info_stats(Monitor *mon, QObject **ret_data);
> /* disk I/O throttling */
> void bdrv_io_limits_enable(BlockDriverState *bs);
> void bdrv_io_limits_disable(BlockDriverState *bs);
> -bool bdrv_io_limits_enabled(BlockDriverState *bs);
>
> void bdrv_init(void);
> void bdrv_init_with_whitelist(void);
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index e45f2a0..a0b6bc1 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -34,18 +34,12 @@
> #include "monitor/monitor.h"
> #include "qemu/hbitmap.h"
> #include "block/snapshot.h"
> +#include "qemu/throttle.h"
>
> #define BLOCK_FLAG_ENCRYPT 1
> #define BLOCK_FLAG_COMPAT6 4
> #define BLOCK_FLAG_LAZY_REFCOUNTS 8
>
> -#define BLOCK_IO_LIMIT_READ 0
> -#define BLOCK_IO_LIMIT_WRITE 1
> -#define BLOCK_IO_LIMIT_TOTAL 2
> -
> -#define BLOCK_IO_SLICE_TIME 100000000
> -#define NANOSECONDS_PER_SECOND 1000000000.0
> -
> #define BLOCK_OPT_SIZE "size"
> #define BLOCK_OPT_ENCRYPT "encryption"
> #define BLOCK_OPT_COMPAT6 "compat6"
> @@ -69,17 +63,6 @@ typedef struct BdrvTrackedRequest {
> CoQueue wait_queue; /* coroutines blocked on this request */
> } BdrvTrackedRequest;
>
> -
> -typedef struct BlockIOLimit {
> - int64_t bps[3];
> - int64_t iops[3];
> -} BlockIOLimit;
> -
> -typedef struct BlockIOBaseValue {
> - uint64_t bytes[2];
> - uint64_t ios[2];
> -} BlockIOBaseValue;
> -
> struct BlockDriver {
> const char *format_name;
> int instance_size;
> @@ -263,13 +246,9 @@ struct BlockDriverState {
> /* number of in-flight copy-on-read requests */
> unsigned int copy_on_read_in_flight;
>
> - /* the time for latest disk I/O */
> - int64_t slice_start;
> - int64_t slice_end;
> - BlockIOLimit io_limits;
> - BlockIOBaseValue slice_submitted;
> - CoQueue throttled_reqs;
> - QEMUTimer *block_timer;
> + /* I/O throttling */
> + ThrottleState throttle_state;
> + CoQueue throttled_reqs[2];
> bool io_limits_enabled;
>
> /* I/O stats (display with "info blockstats"). */
> @@ -308,7 +287,8 @@ struct BlockDriverState {
> int get_tmp_filename(char *filename, int size);
>
> void bdrv_set_io_limits(BlockDriverState *bs,
> - BlockIOLimit *io_limits);
> + ThrottleConfig *cfg);
> +
>
> /**
> * bdrv_add_before_write_notifier:
> --
> 1.7.10.4
>
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Qemu-devel] [PATCH V5 3/5] block: Enable the new throttling code in the block layer.
2013-08-14 9:31 ` Fam Zheng
@ 2013-08-14 9:50 ` Fam Zheng
0 siblings, 0 replies; 17+ messages in thread
From: Fam Zheng @ 2013-08-14 9:50 UTC (permalink / raw)
To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha
On Wed, 08/14 17:31, Fam Zheng wrote:
> On Mon, 08/12 18:53, Benoît Canet wrote:
> > Signed-off-by: Benoit Canet <benoit@irqsave.net>
> > @@ -1262,7 +1260,7 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
> > int64_t bps_wr, int64_t iops, int64_t iops_rd,
> > int64_t iops_wr, Error **errp)
> > {
> > - BlockIOLimit io_limits;
> > + ThrottleConfig cfg;
> > BlockDriverState *bs;
> >
> > bs = bdrv_find(device);
> > @@ -1271,27 +1269,37 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
> > return;
> > }
> >
> > - io_limits.bps[BLOCK_IO_LIMIT_TOTAL] = bps;
> > - io_limits.bps[BLOCK_IO_LIMIT_READ] = bps_rd;
> > - io_limits.bps[BLOCK_IO_LIMIT_WRITE] = bps_wr;
> > - io_limits.iops[BLOCK_IO_LIMIT_TOTAL]= iops;
> > - io_limits.iops[BLOCK_IO_LIMIT_READ] = iops_rd;
> > - io_limits.iops[BLOCK_IO_LIMIT_WRITE]= iops_wr;
> > + cfg.buckets[THROTTLE_BPS_TOTAL].ups = bps;
> > + cfg.buckets[THROTTLE_BPS_READ].ups = bps_rd;
> > + cfg.buckets[THROTTLE_BPS_WRITE].ups = bps_wr;
> > +
> > + cfg.buckets[THROTTLE_OPS_TOTAL].ups = iops;
> > + cfg.buckets[THROTTLE_OPS_READ].ups = iops_rd;
> > + cfg.buckets[THROTTLE_OPS_WRITE].ups = iops_wr;
> > +
> > + cfg.buckets[THROTTLE_BPS_TOTAL].max = 0;
> > + cfg.buckets[THROTTLE_BPS_READ].max = 0;
> > + cfg.buckets[THROTTLE_BPS_WRITE].max = 0;
> > +
> > + cfg.buckets[THROTTLE_OPS_TOTAL].max = 0;
> > + cfg.buckets[THROTTLE_OPS_READ].max = 0;
> > + cfg.buckets[THROTTLE_OPS_WRITE].max = 0;
> >
> > - if (!do_check_io_limits(&io_limits, errp)) {
> > + cfg.unit_size = BDRV_SECTOR_SIZE;
>
> Does this mean the user now sets the bps limit in sectors? I think we should be
> consistent with the existing parameter unit.
>
> > + cfg.op_size = 0;
>
> Why not set op_size to 1?
>
Never mind. You set it to 0 here, and that is the condition for unit=1.
Fam
* Re: [Qemu-devel] [PATCH V5 3/5] block: Enable the new throttling code in the block layer.
2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 3/5] block: Enable the new throttling code in the block layer Benoît Canet
2013-08-14 9:31 ` Fam Zheng
@ 2013-08-16 12:02 ` Stefan Hajnoczi
1 sibling, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2013-08-16 12:02 UTC (permalink / raw)
To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha
On Mon, Aug 12, 2013 at 06:53:14PM +0200, Benoît Canet wrote:
> +/* this function drain all the throttled IOs */
> +static bool bdrv_drain_throttled(BlockDriverState *bs)
> +{
> + bool drained = false;
> + bool enabled = bs->io_limits_enabled;
> + int i;
>
> - do {} while (qemu_co_enter_next(&bs->throttled_reqs));
> + bs->io_limits_enabled = false;
>
> - if (bs->block_timer) {
> - qemu_del_timer(bs->block_timer);
> - qemu_free_timer(bs->block_timer);
> - bs->block_timer = NULL;
> + for (i = 0; i < 2; i++) {
> + while (qemu_co_enter_next(&bs->throttled_reqs[i])) {
> + drained = true;
> + }
> }
This function submits throttled requests but it doesn't drain them -
they might still be executing when this function returns!
> +/* should be called before bdrv_set_io_limits if a limit is set */
> void bdrv_io_limits_enable(BlockDriverState *bs)
> {
> - qemu_co_queue_init(&bs->throttled_reqs);
> - bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
Make sure we never double-initialize ->throttle_state:
assert(!bs->io_limits_enabled);
> @@ -2512,11 +2552,6 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
> return -EIO;
> }
>
> - /* throttling disk read I/O */
> - if (bs->io_limits_enabled) {
> - bdrv_io_limits_intercept(bs, false, nb_sectors);
> - }
> -
Why move bdrv_io_limits_intercept() into tracked_request_begin()? IMO
tracked_request_begin() should only create the tracked request; it
shouldn't do unrelated stuff like I/O throttling and yielding. If it
does, then it must be declared coroutine_fn.
* [Qemu-devel] [PATCH V5 4/5] block: Add support for throttling burst max in QMP and the command line.
2013-08-12 16:53 [Qemu-devel] [PATCH V5 0/5] Continuous Leaky Bucket Throttling Benoît Canet
` (2 preceding siblings ...)
2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 3/5] block: Enable the new throttling code in the block layer Benoît Canet
@ 2013-08-12 16:53 ` Benoît Canet
2013-08-16 12:07 ` Stefan Hajnoczi
2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 5/5] block: Add iops_sector_count to do the iops accounting for a given io size Benoît Canet
2013-08-16 12:14 ` [Qemu-devel] [PATCH V5 0/5] Continuous Leaky Bucket Throttling Stefan Hajnoczi
5 siblings, 1 reply; 17+ messages in thread
From: Benoît Canet @ 2013-08-12 16:53 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
The max parameter of the leaky bycket throttling algoritm can be used to
allow the guest to do bursts.
The max value is a pool of I/O that the guest can use without being throttled
at all. Throttling is triggered once this pool is empty.
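The burst behaviour described above can be sketched as a tiny standalone model. This is only an illustration of the leaky-bucket idea, not the code from util/throttle.c; the ToyBucket type and the function name are invented for the example:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative continuous leaky bucket: "level" drains at "ups" units per
 * second and each I/O adds its cost.  While level stays at or under "max"
 * the guest is not throttled, so "max" acts as a burst pool. */
typedef struct {
    double ups;    /* leak rate: units (bytes or ops) per second */
    double max;    /* burst pool size */
    double level;  /* current bucket content */
} ToyBucket;

/* Drain the bucket for the elapsed time, then account one I/O of cost
 * "cost".  Returns true if the I/O may pass without being delayed. */
static bool toy_bucket_account(ToyBucket *b, double elapsed_s, double cost)
{
    b->level -= b->ups * elapsed_s;
    if (b->level < 0) {
        b->level = 0;
    }
    b->level += cost;
    return b->level <= b->max;
}
```

Here max plays the role of the burst pool: the bucket drains at ups units per second, and requests pass unthrottled until the accumulated level exceeds max, at which point throttling kicks in.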
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qapi.c | 26 ++++++++++++
blockdev.c | 119 ++++++++++++++++++++++++++++++++++++++++++++++--------
hmp.c | 32 +++++++++++++--
qapi-schema.json | 34 +++++++++++++++-
qemu-options.hx | 4 +-
qmp-commands.hx | 30 ++++++++++++--
6 files changed, 220 insertions(+), 25 deletions(-)
diff --git a/block/qapi.c b/block/qapi.c
index 45f806b..5ba10f4 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -232,6 +232,32 @@ void bdrv_query_info(BlockDriverState *bs,
info->inserted->iops = cfg.buckets[THROTTLE_OPS_TOTAL].ups;
info->inserted->iops_rd = cfg.buckets[THROTTLE_OPS_READ].ups;
info->inserted->iops_wr = cfg.buckets[THROTTLE_OPS_WRITE].ups;
+
+ info->inserted->has_bps_max =
+ cfg.buckets[THROTTLE_BPS_TOTAL].max;
+ info->inserted->bps_max =
+ cfg.buckets[THROTTLE_BPS_TOTAL].max;
+ info->inserted->has_bps_rd_max =
+ cfg.buckets[THROTTLE_BPS_READ].max;
+ info->inserted->bps_rd_max =
+ cfg.buckets[THROTTLE_BPS_READ].max;
+ info->inserted->has_bps_wr_max =
+ cfg.buckets[THROTTLE_BPS_WRITE].max;
+ info->inserted->bps_wr_max =
+ cfg.buckets[THROTTLE_BPS_WRITE].max;
+
+ info->inserted->has_iops_max =
+ cfg.buckets[THROTTLE_OPS_TOTAL].max;
+ info->inserted->iops_max =
+ cfg.buckets[THROTTLE_OPS_TOTAL].max;
+ info->inserted->has_iops_rd_max =
+ cfg.buckets[THROTTLE_OPS_READ].max;
+ info->inserted->iops_rd_max =
+ cfg.buckets[THROTTLE_OPS_READ].max;
+ info->inserted->has_iops_wr_max =
+ cfg.buckets[THROTTLE_OPS_WRITE].max;
+ info->inserted->iops_wr_max =
+ cfg.buckets[THROTTLE_OPS_WRITE].max;
}
bs0 = bs;
diff --git a/blockdev.c b/blockdev.c
index 7559ea5..22016a2 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -486,13 +486,18 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts,
cfg.buckets[THROTTLE_OPS_WRITE].ups =
qemu_opt_get_number(opts, "throttling.iops-write", 0);
- cfg.buckets[THROTTLE_BPS_TOTAL].max = 0;
- cfg.buckets[THROTTLE_BPS_READ].max = 0;
- cfg.buckets[THROTTLE_BPS_WRITE].max = 0;
-
- cfg.buckets[THROTTLE_OPS_TOTAL].max = 0;
- cfg.buckets[THROTTLE_OPS_READ].max = 0;
- cfg.buckets[THROTTLE_OPS_WRITE].max = 0;
+ cfg.buckets[THROTTLE_BPS_TOTAL].max =
+ qemu_opt_get_number(opts, "throttling.bps-total-max", 0);
+ cfg.buckets[THROTTLE_BPS_READ].max =
+ qemu_opt_get_number(opts, "throttling.bps-read-max", 0);
+ cfg.buckets[THROTTLE_BPS_WRITE].max =
+ qemu_opt_get_number(opts, "throttling.bps-write-max", 0);
+ cfg.buckets[THROTTLE_OPS_TOTAL].max =
+ qemu_opt_get_number(opts, "throttling.iops-total-max", 0);
+ cfg.buckets[THROTTLE_OPS_READ].max =
+ qemu_opt_get_number(opts, "throttling.iops-read-max", 0);
+ cfg.buckets[THROTTLE_OPS_WRITE].max =
+ qemu_opt_get_number(opts, "throttling.iops-write-max", 0);
cfg.unit_size = BDRV_SECTOR_SIZE;
cfg.op_size = 0;
@@ -773,6 +778,14 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type)
qemu_opt_rename(all_opts, "bps_rd", "throttling.bps-read");
qemu_opt_rename(all_opts, "bps_wr", "throttling.bps-write");
+ qemu_opt_rename(all_opts, "iops_max", "throttling.iops-total-max");
+ qemu_opt_rename(all_opts, "iops_rd_max", "throttling.iops-read-max");
+ qemu_opt_rename(all_opts, "iops_wr_max", "throttling.iops-write-max");
+
+ qemu_opt_rename(all_opts, "bps_max", "throttling.bps-total-max");
+ qemu_opt_rename(all_opts, "bps_rd_max", "throttling.bps-read-max");
+ qemu_opt_rename(all_opts, "bps_wr_max", "throttling.bps-write-max");
+
qemu_opt_rename(all_opts, "readonly", "read-only");
value = qemu_opt_get(all_opts, "cache");
@@ -1257,8 +1270,22 @@ void qmp_change_blockdev(const char *device, const char *filename,
/* throttling disk I/O limits */
void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
- int64_t bps_wr, int64_t iops, int64_t iops_rd,
- int64_t iops_wr, Error **errp)
+ int64_t bps_wr,
+ int64_t iops,
+ int64_t iops_rd,
+ int64_t iops_wr,
+ bool has_bps_max,
+ int64_t bps_max,
+ bool has_bps_rd_max,
+ int64_t bps_rd_max,
+ bool has_bps_wr_max,
+ int64_t bps_wr_max,
+ bool has_iops_max,
+ int64_t iops_max,
+ bool has_iops_rd_max,
+ int64_t iops_rd_max,
+ bool has_iops_wr_max,
+ int64_t iops_wr_max, Error **errp)
{
ThrottleConfig cfg;
BlockDriverState *bs;
@@ -1269,6 +1296,7 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
return;
}
+ memset(&cfg, 0, sizeof(cfg));
cfg.buckets[THROTTLE_BPS_TOTAL].ups = bps;
cfg.buckets[THROTTLE_BPS_READ].ups = bps_rd;
cfg.buckets[THROTTLE_BPS_WRITE].ups = bps_wr;
@@ -1277,13 +1305,24 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
cfg.buckets[THROTTLE_OPS_READ].ups = iops_rd;
cfg.buckets[THROTTLE_OPS_WRITE].ups = iops_wr;
- cfg.buckets[THROTTLE_BPS_TOTAL].max = 0;
- cfg.buckets[THROTTLE_BPS_READ].max = 0;
- cfg.buckets[THROTTLE_BPS_WRITE].max = 0;
-
- cfg.buckets[THROTTLE_OPS_TOTAL].max = 0;
- cfg.buckets[THROTTLE_OPS_READ].max = 0;
- cfg.buckets[THROTTLE_OPS_WRITE].max = 0;
+ if (has_bps_max) {
+ cfg.buckets[THROTTLE_BPS_TOTAL].max = bps_max;
+ }
+ if (has_bps_rd_max) {
+ cfg.buckets[THROTTLE_BPS_READ].max = bps_rd_max;
+ }
+ if (has_bps_wr_max) {
+ cfg.buckets[THROTTLE_BPS_WRITE].max = bps_wr_max;
+ }
+ if (has_iops_max) {
+ cfg.buckets[THROTTLE_OPS_TOTAL].max = iops_max;
+ }
+ if (has_iops_rd_max) {
+ cfg.buckets[THROTTLE_OPS_READ].max = iops_rd_max;
+ }
+ if (has_iops_wr_max) {
+ cfg.buckets[THROTTLE_OPS_WRITE].max = iops_wr_max;
+ }
cfg.unit_size = BDRV_SECTOR_SIZE;
cfg.op_size = 0;
@@ -1988,6 +2027,30 @@ QemuOptsList qemu_common_drive_opts = {
.type = QEMU_OPT_NUMBER,
.help = "limit write bytes per second",
},{
+ .name = "throttling.iops-total-max",
+ .type = QEMU_OPT_NUMBER,
+ .help = "I/O operations burst",
+ },{
+ .name = "throttling.iops-read-max",
+ .type = QEMU_OPT_NUMBER,
+ .help = "I/O operations read burst",
+ },{
+ .name = "throttling.iops-write-max",
+ .type = QEMU_OPT_NUMBER,
+ .help = "I/O operations write burst",
+ },{
+ .name = "throttling.bps-total-max",
+ .type = QEMU_OPT_NUMBER,
+ .help = "total bytes burst",
+ },{
+ .name = "throttling.bps-read-max",
+ .type = QEMU_OPT_NUMBER,
+ .help = "total bytes read burst",
+ },{
+ .name = "throttling.bps-write-max",
+ .type = QEMU_OPT_NUMBER,
+ .help = "total bytes write burst",
+ },{
.name = "copy-on-read",
.type = QEMU_OPT_BOOL,
.help = "copy read data from backing file into image file",
@@ -2110,6 +2173,30 @@ QemuOptsList qemu_old_drive_opts = {
.type = QEMU_OPT_NUMBER,
.help = "limit write bytes per second",
},{
+ .name = "iops_max",
+ .type = QEMU_OPT_NUMBER,
+ .help = "I/O operations burst",
+ },{
+ .name = "iops_rd_max",
+ .type = QEMU_OPT_NUMBER,
+ .help = "I/O operations read burst",
+ },{
+ .name = "iops_wr_max",
+ .type = QEMU_OPT_NUMBER,
+ .help = "I/O operations write burst",
+ },{
+ .name = "bps_max",
+ .type = QEMU_OPT_NUMBER,
+ .help = "total bytes burst",
+ },{
+ .name = "bps_rd_max",
+ .type = QEMU_OPT_NUMBER,
+ .help = "total bytes read burst",
+ },{
+ .name = "bps_wr_max",
+ .type = QEMU_OPT_NUMBER,
+ .help = "total bytes write burst",
+ },{
.name = "copy-on-read",
.type = QEMU_OPT_BOOL,
.help = "copy read data from backing file into image file",
diff --git a/hmp.c b/hmp.c
index c45514b..e0c387c 100644
--- a/hmp.c
+++ b/hmp.c
@@ -344,14 +344,28 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
{
monitor_printf(mon, " I/O throttling: bps=%" PRId64
" bps_rd=%" PRId64 " bps_wr=%" PRId64
+ " bps_max=%" PRId64
+ " bps_rd_max=%" PRId64
+ " bps_wr_max=%" PRId64
" iops=%" PRId64 " iops_rd=%" PRId64
- " iops_wr=%" PRId64 "\n",
+ " iops_wr=%" PRId64
+ " iops_max=%" PRId64
+ " iops_rd_max=%" PRId64
+ " iops_wr_max=%" PRId64 "\n",
info->value->inserted->bps,
info->value->inserted->bps_rd,
info->value->inserted->bps_wr,
+ info->value->inserted->bps_max,
+ info->value->inserted->bps_rd_max,
+ info->value->inserted->bps_wr_max,
info->value->inserted->iops,
info->value->inserted->iops_rd,
- info->value->inserted->iops_wr);
+ info->value->inserted->iops_wr,
+ info->value->inserted->iops_max,
+ info->value->inserted->iops_rd_max,
+ info->value->inserted->iops_wr_max);
+ } else {
+ monitor_printf(mon, " [not inserted]");
}
if (verbose) {
@@ -1098,7 +1112,19 @@ void hmp_block_set_io_throttle(Monitor *mon, const QDict *qdict)
qdict_get_int(qdict, "bps_wr"),
qdict_get_int(qdict, "iops"),
qdict_get_int(qdict, "iops_rd"),
- qdict_get_int(qdict, "iops_wr"), &err);
+ qdict_get_int(qdict, "iops_wr"),
+ false, /* no burst max via QMP */
+ 0,
+ false,
+ 0,
+ false,
+ 0,
+ false,
+ 0,
+ false,
+ 0,
+ false,
+ 0, &err);
hmp_handle_error(mon, &err);
}
diff --git a/qapi-schema.json b/qapi-schema.json
index a51f7d2..5e5461e 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -785,6 +785,18 @@
#
# @image: the info of image used (since: 1.6)
#
+# @bps_max: #optional total max in bytes (Since 1.7)
+#
+# @bps_rd_max: #optional read max in bytes (Since 1.7)
+#
+# @bps_wr_max: #optional write max in bytes (Since 1.7)
+#
+# @iops_max: #optional total I/O operations max (Since 1.7)
+#
+# @iops_rd_max: #optional read I/O operations max (Since 1.7)
+#
+# @iops_wr_max: #optional write I/O operations max (Since 1.7)
+#
# Since: 0.14.0
#
# Notes: This interface is only found in @BlockInfo.
@@ -795,7 +807,10 @@
'encrypted': 'bool', 'encryption_key_missing': 'bool',
'bps': 'int', 'bps_rd': 'int', 'bps_wr': 'int',
'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int',
- 'image': 'ImageInfo' } }
+ 'image': 'ImageInfo',
+ '*bps_max': 'int', '*bps_rd_max': 'int',
+ '*bps_wr_max': 'int', '*iops_max': 'int',
+ '*iops_rd_max': 'int', '*iops_wr_max': 'int' }}
##
# @BlockDeviceIoStatus:
@@ -2174,6 +2189,18 @@
#
# @iops_wr: write I/O operations per second
#
+# @bps_max: #optional total max in bytes (Since 1.7)
+#
+# @bps_rd_max: #optional read max in bytes (Since 1.7)
+#
+# @bps_wr_max: #optional write max in bytes (Since 1.7)
+#
+# @iops_max: #optional total I/O operations max (Since 1.7)
+#
+# @iops_rd_max: #optional read I/O operations max (Since 1.7)
+#
+# @iops_wr_max: #optional write I/O operations max (Since 1.7)
+#
# Returns: Nothing on success
# If @device is not a valid block device, DeviceNotFound
#
@@ -2181,7 +2208,10 @@
##
{ 'command': 'block_set_io_throttle',
'data': { 'device': 'str', 'bps': 'int', 'bps_rd': 'int', 'bps_wr': 'int',
- 'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int' } }
+ 'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int',
+ '*bps_max': 'int', '*bps_rd_max': 'int',
+ '*bps_wr_max': 'int', '*iops_max': 'int',
+ '*iops_rd_max': 'int', '*iops_wr_max': 'int' }}
##
# @block-stream:
diff --git a/qemu-options.hx b/qemu-options.hx
index d15338e..3aa8f9e 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -409,7 +409,9 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
" [,cache=writethrough|writeback|none|directsync|unsafe][,format=f]\n"
" [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
" [,readonly=on|off][,copy-on-read=on|off]\n"
- " [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r][,iops_wr=w]]\n"
+ " [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r]"
+ " [,iops_wr=w][,bps_max=bt]|[[,bps_rd_max=rt][,bps_wr_max=wt]]]"
+ " [[,iops_max=it]|[[,iops_rd_max=rt][,iops_wr_max=wt]]\n"
" use 'file' as a drive image\n", QEMU_ARCH_ALL)
STEXI
@item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 2e59b0d..1610831 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1389,7 +1389,7 @@ EQMP
{
.name = "block_set_io_throttle",
- .args_type = "device:B,bps:l,bps_rd:l,bps_wr:l,iops:l,iops_rd:l,iops_wr:l",
+ .args_type = "device:B,bps:l,bps_rd:l,bps_wr:l,iops:l,iops_rd:l,iops_wr:l,bps_max:l?,bps_rd_max:l?,bps_wr_max:l?,iops_max:l?,iops_rd_max:l?,iops_wr_max:l?",
.mhandler.cmd_new = qmp_marshal_input_block_set_io_throttle,
},
@@ -1404,10 +1404,16 @@ Arguments:
- "device": device name (json-string)
- "bps": total throughput limit in bytes per second(json-int)
- "bps_rd": read throughput limit in bytes per second(json-int)
-- "bps_wr": read throughput limit in bytes per second(json-int)
+- "bps_wr": write throughput limit in bytes per second(json-int)
- "iops": total I/O operations per second(json-int)
- "iops_rd": read I/O operations per second(json-int)
- "iops_wr": write I/O operations per second(json-int)
+- "bps_max": total max in bytes(json-int)
+- "bps_rd_max": read max in bytes(json-int)
+- "bps_wr_max": write max in bytes(json-int)
+- "iops_max": total I/O operations max(json-int)
+- "iops_rd_max": read I/O operations max(json-int)
+- "iops_wr_max": write I/O operations max(json-int)
Example:
@@ -1417,7 +1423,13 @@ Example:
"bps_wr": "0",
"iops": "0",
"iops_rd": "0",
- "iops_wr": "0" } }
+ "iops_wr": "0",
+ "bps_max": "8000000",
+ "bps_rd_max": "0",
+ "bps_wr_max": "0",
+ "iops_max": "0",
+ "iops_rd_max": "0",
+ "iops_wr_max": "0" } }
<- { "return": {} }
EQMP
@@ -1758,6 +1770,12 @@ Each json-object contain the following:
- "iops": limit total I/O operations per second (json-int)
- "iops_rd": limit read operations per second (json-int)
- "iops_wr": limit write operations per second (json-int)
+ - "bps_max": total max in bytes(json-int)
+ - "bps_rd_max": read max in bytes(json-int)
+ - "bps_wr_max": write max in bytes(json-int)
+ - "iops_max": total I/O operations max(json-int)
+ - "iops_rd_max": read I/O operations max(json-int)
+ - "iops_wr_max": write I/O operations max(json-int)
- "image": the detail of the image, it is a json-object containing
the following:
- "filename": image file name (json-string)
@@ -1827,6 +1845,12 @@ Example:
"iops":1000000,
"iops_rd":0,
"iops_wr":0,
+ "bps_max": "8000000",
+ "bps_rd_max": "0",
+ "bps_wr_max": "0",
+ "iops_max": "0",
+ "iops_rd_max": "0",
+ "iops_wr_max": "0",
"image":{
"filename":"disks/test.qcow2",
"format":"qcow2",
--
1.7.10.4
* Re: [Qemu-devel] [PATCH V5 4/5] block: Add support for throttling burst max in QMP and the command line.
2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 4/5] block: Add support for throttling burst max in QMP and the command line Benoît Canet
@ 2013-08-16 12:07 ` Stefan Hajnoczi
0 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2013-08-16 12:07 UTC (permalink / raw)
To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha
On Mon, Aug 12, 2013 at 06:53:15PM +0200, Benoît Canet wrote:
> The max parameter of the leaky bycket throttling algoritm can be used to
s/bycket/bucket/
s/algoritm/algorithm/
> @@ -1098,7 +1112,19 @@ void hmp_block_set_io_throttle(Monitor *mon, const QDict *qdict)
> qdict_get_int(qdict, "bps_wr"),
> qdict_get_int(qdict, "iops"),
> qdict_get_int(qdict, "iops_rd"),
> - qdict_get_int(qdict, "iops_wr"), &err);
> + qdict_get_int(qdict, "iops_wr"),
> + false, /* no burst max via QMP */
s/QMP/HMP/
> @@ -1758,6 +1770,12 @@ Each json-object contain the following:
> - "iops": limit total I/O operations per second (json-int)
> - "iops_rd": limit read operations per second (json-int)
> - "iops_wr": limit write operations per second (json-int)
> + - "bps_max": total max in bytes(json-int)
Space before "(json-int)", and for the other new fields too.
* [Qemu-devel] [PATCH V5 5/5] block: Add iops_sector_count to do the iops accounting for a given io size.
2013-08-12 16:53 [Qemu-devel] [PATCH V5 0/5] Continuous Leaky Bucket Throttling Benoît Canet
` (3 preceding siblings ...)
2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 4/5] block: Add support for throttling burst max in QMP and the command line Benoît Canet
@ 2013-08-12 16:53 ` Benoît Canet
2013-08-14 9:48 ` Fam Zheng
2013-08-16 12:10 ` Stefan Hajnoczi
2013-08-16 12:14 ` [Qemu-devel] [PATCH V5 0/5] Continuous Leaky Bucket Throttling Stefan Hajnoczi
5 siblings, 2 replies; 17+ messages in thread
From: Benoît Canet @ 2013-08-12 16:53 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
This feature can be used in cases where users avoid the iops limit by
doing jumbo I/Os that hammer the storage backend.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qapi.c | 3 +++
blockdev.c | 22 +++++++++++++++++++---
hmp.c | 8 ++++++--
qapi-schema.json | 10 ++++++++--
qemu-options.hx | 2 +-
qmp-commands.hx | 8 ++++++--
6 files changed, 43 insertions(+), 10 deletions(-)
diff --git a/block/qapi.c b/block/qapi.c
index 5ba10f4..f98ff64 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -258,6 +258,9 @@ void bdrv_query_info(BlockDriverState *bs,
cfg.buckets[THROTTLE_OPS_WRITE].max;
info->inserted->iops_wr_max =
cfg.buckets[THROTTLE_OPS_WRITE].max;
+
+ info->inserted->has_iops_sector_count = cfg.op_size;
+ info->inserted->iops_sector_count = cfg.op_size;
}
bs0 = bs;
diff --git a/blockdev.c b/blockdev.c
index 22016a2..e2b0ee0 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -500,7 +500,7 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts,
qemu_opt_get_number(opts, "throttling.iops-write-max", 0);
cfg.unit_size = BDRV_SECTOR_SIZE;
- cfg.op_size = 0;
+ cfg.op_size = qemu_opt_get_number(opts, "throttling.iops-sector-count", 0);
if (!check_throttle_config(&cfg, &error)) {
error_report("%s", error_get_pretty(error));
@@ -786,6 +786,9 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type)
qemu_opt_rename(all_opts, "bps_rd_max", "throttling.bps-read-max");
qemu_opt_rename(all_opts, "bps_wr_max", "throttling.bps-write-max");
+ qemu_opt_rename(all_opts,
+ "iops_sector_count", "throttling.iops-sector-count");
+
qemu_opt_rename(all_opts, "readonly", "read-only");
value = qemu_opt_get(all_opts, "cache");
@@ -1285,7 +1288,9 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
bool has_iops_rd_max,
int64_t iops_rd_max,
bool has_iops_wr_max,
- int64_t iops_wr_max, Error **errp)
+ int64_t iops_wr_max,
+ bool has_iops_sector_count,
+ int64_t iops_sector_count, Error **errp)
{
ThrottleConfig cfg;
BlockDriverState *bs;
@@ -1325,7 +1330,10 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
}
cfg.unit_size = BDRV_SECTOR_SIZE;
- cfg.op_size = 0;
+
+ if (has_iops_sector_count) {
+ cfg.op_size = iops_sector_count;
+ }
if (!check_throttle_config(&cfg, errp)) {
return;
@@ -2051,6 +2059,10 @@ QemuOptsList qemu_common_drive_opts = {
.type = QEMU_OPT_NUMBER,
.help = "total bytes write burst",
},{
+ .name = "throttling.iops-sector-count",
+ .type = QEMU_OPT_NUMBER,
+ .help = "when limiting by iops max size of an I/O in sector",
+ },{
.name = "copy-on-read",
.type = QEMU_OPT_BOOL,
.help = "copy read data from backing file into image file",
@@ -2197,6 +2209,10 @@ QemuOptsList qemu_old_drive_opts = {
.type = QEMU_OPT_NUMBER,
.help = "total bytes write burst",
},{
+ .name = "iops_sector_count",
+ .type = QEMU_OPT_NUMBER,
+ .help = "when limiting by iops max size of an I/O in sector",
+ },{
.name = "copy-on-read",
.type = QEMU_OPT_BOOL,
.help = "copy read data from backing file into image file",
diff --git a/hmp.c b/hmp.c
index e0c387c..01f7685 100644
--- a/hmp.c
+++ b/hmp.c
@@ -351,7 +351,8 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
" iops_wr=%" PRId64
" iops_max=%" PRId64
" iops_rd_max=%" PRId64
- " iops_wr_max=%" PRId64 "\n",
+ " iops_wr_max=%" PRId64
+ " iops_sector_count=%" PRId64 "\n",
info->value->inserted->bps,
info->value->inserted->bps_rd,
info->value->inserted->bps_wr,
@@ -363,7 +364,8 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
info->value->inserted->iops_wr,
info->value->inserted->iops_max,
info->value->inserted->iops_rd_max,
- info->value->inserted->iops_wr_max);
+ info->value->inserted->iops_wr_max,
+ info->value->inserted->iops_sector_count);
} else {
monitor_printf(mon, " [not inserted]");
}
@@ -1124,6 +1126,8 @@ void hmp_block_set_io_throttle(Monitor *mon, const QDict *qdict)
false,
0,
false,
+ 0,
+ false, /* No default I/O size */
0, &err);
hmp_handle_error(mon, &err);
}
diff --git a/qapi-schema.json b/qapi-schema.json
index 5e5461e..c7b1d5e 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -797,6 +797,8 @@
#
# @iops_wr_max: #optional write I/O operations max (Since 1.7)
#
+# @iops_sector_count: #optional an I/O size in sector (Since 1.6)
+#
# Since: 0.14.0
#
# Notes: This interface is only found in @BlockInfo.
@@ -810,7 +812,8 @@
'image': 'ImageInfo',
'*bps_max': 'int', '*bps_rd_max': 'int',
'*bps_wr_max': 'int', '*iops_max': 'int',
- '*iops_rd_max': 'int', '*iops_wr_max': 'int' }}
+ '*iops_rd_max': 'int', '*iops_wr_max': 'int',
+ '*iops_sector_count': 'int' } }
##
# @BlockDeviceIoStatus:
@@ -2201,6 +2204,8 @@
#
# @iops_wr_max: #optional write I/O operations max (Since 1.7)
#
+# @iops_sector_count: #optional an I/O size in sector (Since 1.6)
+#
# Returns: Nothing on success
# If @device is not a valid block device, DeviceNotFound
#
@@ -2211,7 +2216,8 @@
'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int',
'*bps_max': 'int', '*bps_rd_max': 'int',
'*bps_wr_max': 'int', '*iops_max': 'int',
- '*iops_rd_max': 'int', '*iops_wr_max': 'int' }}
+ '*iops_rd_max': 'int', '*iops_wr_max': 'int',
+ '*iops_sector_count': 'int' }}
##
# @block-stream:
diff --git a/qemu-options.hx b/qemu-options.hx
index 3aa8f9e..c539dd0 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -411,7 +411,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
" [,readonly=on|off][,copy-on-read=on|off]\n"
" [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r]"
" [,iops_wr=w][,bps_max=bt]|[[,bps_rd_max=rt][,bps_wr_max=wt]]]"
- " [[,iops_max=it]|[[,iops_rd_max=rt][,iops_wr_max=wt]]\n"
+ " [[,iops_max=it]|[[,iops_rd_max=rt][,iops_wr_max=wt][,iops_sector_count=cnt]]\n"
" use 'file' as a drive image\n", QEMU_ARCH_ALL)
STEXI
@item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 1610831..bcd255f 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1389,7 +1389,7 @@ EQMP
{
.name = "block_set_io_throttle",
- .args_type = "device:B,bps:l,bps_rd:l,bps_wr:l,iops:l,iops_rd:l,iops_wr:l,bps_max:l?,bps_rd_max:l?,bps_wr_max:l?,iops_max:l?,iops_rd_max:l?,iops_wr_max:l?",
+ .args_type = "device:B,bps:l,bps_rd:l,bps_wr:l,iops:l,iops_rd:l,iops_wr:l,bps_max:l?,bps_rd_max:l?,bps_wr_max:l?,iops_max:l?,iops_rd_max:l?,iops_wr_max:l?,iops_sector_count:l?",
.mhandler.cmd_new = qmp_marshal_input_block_set_io_throttle,
},
@@ -1414,6 +1414,7 @@ Arguments:
- "iops_max": total I/O operations max(json-int)
- "iops_rd_max": read I/O operations max(json-int)
- "iops_wr_max": write I/O operations max(json-int)
+- "iops_sector_count": I/O sector count when limiting(json-int)
Example:
@@ -1429,7 +1430,8 @@ Example:
"bps_wr_max": "0",
"iops_max": "0",
"iops_rd_max": "0",
- "iops_wr_max": "0" } }
+ "iops_wr_max": "0",
+ "iops_sector_count": "0" } }
<- { "return": {} }
EQMP
@@ -1776,6 +1778,7 @@ Each json-object contain the following:
- "iops_max": total I/O operations max(json-int)
- "iops_rd_max": read I/O operations max(json-int)
- "iops_wr_max": write I/O operations max(json-int)
+ - "iops_sector_count": I/O sector count when limiting(json-int)
- "image": the detail of the image, it is a json-object containing
the following:
- "filename": image file name (json-string)
@@ -1851,6 +1854,7 @@ Example:
"iops_max": "0",
"iops_rd_max": "0",
"iops_wr_max": "0",
+ "iops_sector_count": "0",
"image":{
"filename":"disks/test.qcow2",
"format":"qcow2",
--
1.7.10.4
* Re: [Qemu-devel] [PATCH V5 5/5] block: Add iops_sector_count to do the iops accounting for a given io size.
2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 5/5] block: Add iops_sector_count to do the iops accounting for a given io size Benoît Canet
@ 2013-08-14 9:48 ` Fam Zheng
2013-08-14 18:31 ` Benoît Canet
2013-08-16 12:10 ` Stefan Hajnoczi
1 sibling, 1 reply; 17+ messages in thread
From: Fam Zheng @ 2013-08-14 9:48 UTC (permalink / raw)
To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha
On Mon, 08/12 18:53, Benoît Canet wrote:
> This feature can be used in cases where users avoid the iops limit by
> doing jumbo I/Os that hammer the storage backend.
>
You are accounting I/O ops by the op size
(unit = size / iops_sector_count), which is equivalent to bps. So I
still don't understand why such jumbo I/Os can't be throttled by the
bps limits.
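For illustration, the per-operation accounting under discussion can be written out as a small cost function. The name toy_iops_cost is invented, and whether the real code rounds up or charges fractional units is an implementation detail, so treat this as a sketch of the idea only:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative cost function: when iops_sector_count (op_size) is set, a
 * request of "sectors" sectors is charged one I/O unit per op_size-sized
 * slice, so jumbo I/Os consume proportionally more of the iops budget.
 * With op_size == 0 every request costs exactly one I/O. */
static uint64_t toy_iops_cost(uint64_t sectors, uint64_t op_size)
{
    if (op_size == 0) {
        return 1;
    }
    /* round up: a partial slice still counts as a full operation */
    return (sectors + op_size - 1) / op_size;
}
```

With op_size == 0 a 2048-sector jumbo write costs the same single iops unit as a 1-sector write; with op_size == 8 it costs 256 units, which is exactly the size-proportional behaviour being compared to a bps limit here.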
* Re: [Qemu-devel] [PATCH V5 5/5] block: Add iops_sector_count to do the iops accounting for a given io size.
2013-08-14 9:48 ` Fam Zheng
@ 2013-08-14 18:31 ` Benoît Canet
0 siblings, 0 replies; 17+ messages in thread
From: Benoît Canet @ 2013-08-14 18:31 UTC (permalink / raw)
To: Fam Zheng; +Cc: kwolf, pbonzini, qemu-devel, stefanha
On Wednesday 14 Aug 2013 at 17:48:46 (+0800), Fam Zheng wrote:
> On Mon, 08/12 18:53, Benoît Canet wrote:
> > This feature can be used in cases where users avoid the iops limit by
> > doing jumbo I/Os that hammer the storage backend.
> >
> You are accounting I/O ops by the op size
> (unit = size / iops_sector_count), which is equivalent to bps. So I
> still don't understand why such jumbo I/Os can't be throttled by the
> bps limits.
I am aiming at providing users an Amazon-like feature set.
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AmazonEBS.html
From the provisioned volume section:
The read and write operations have a block size of 16 KB or less. If the I/O
increases above 16 KB, the IOPS delivered drop in proportion to the increase in
the size of the I/O. For example, a 1000 IOPS volume can deliver 1000 16 KB
writes per second, 500 32 KB writes per second, or 250 64 KB writes per second.
Best regards
Benoît
* Re: [Qemu-devel] [PATCH V5 5/5] block: Add iops_sector_count to do the iops accounting for a given io size.
2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 5/5] block: Add iops_sector_count to do the iops accounting for a given io size Benoît Canet
2013-08-14 9:48 ` Fam Zheng
@ 2013-08-16 12:10 ` Stefan Hajnoczi
1 sibling, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2013-08-16 12:10 UTC (permalink / raw)
To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha
On Mon, Aug 12, 2013 at 06:53:16PM +0200, Benoît Canet wrote:
> diff --git a/block/qapi.c b/block/qapi.c
> index 5ba10f4..f98ff64 100644
> --- a/block/qapi.c
> +++ b/block/qapi.c
> @@ -258,6 +258,9 @@ void bdrv_query_info(BlockDriverState *bs,
> cfg.buckets[THROTTLE_OPS_WRITE].max;
> info->inserted->iops_wr_max =
> cfg.buckets[THROTTLE_OPS_WRITE].max;
> +
> + info->inserted->has_iops_sector_count = cfg.op_size;
> + info->inserted->iops_sector_count = cfg.op_size;
Why introduce the concept of sectors here? Throttling uses bytes, and
that will help prevent confusion if the user has disks with 4 KB
sectors instead of 512-byte sectors, for example.
* Re: [Qemu-devel] [PATCH V5 0/5] Continuous Leaky Bucket Throttling
2013-08-12 16:53 [Qemu-devel] [PATCH V5 0/5] Continuous Leaky Bucket Throttling Benoît Canet
` (4 preceding siblings ...)
2013-08-12 16:53 ` [Qemu-devel] [PATCH V5 5/5] block: Add iops_sector_count to do the iops accounting for a given io size Benoît Canet
@ 2013-08-16 12:14 ` Stefan Hajnoczi
5 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2013-08-16 12:14 UTC (permalink / raw)
To: Benoît Canet; +Cc: kwolf, pbonzini, qemu-devel, stefanha
On Mon, Aug 12, 2013 at 06:53:11PM +0200, Benoît Canet wrote:
> This patchset implements continuous leaky bucket throttling.
>
> It uses two request queues to make it possible to do silly unbalanced throttling like
> block_set_io_throttle 0 0 0 0 6000 1
>
> It uses two timers to keep the timer callbacks and the throttle.c code simple.
>
> It adds unit tests.
>
> The throttling core is pretty solid, and the surroundings of the patchset need
> polish. (new options ...)
Looks promising. I still need to try it out and get familiar with how
bursting ("max") works, but I've left comments on the patches for now.
Stefan