* [Qemu-devel] [PATCH V3 for-1.6 0/5] Leaky bucket throttling and features
@ 2013-07-23 16:29 Benoît Canet
2013-07-23 16:29 ` [Qemu-devel] [PATCH V3 for-1.6 1/5] block: Repair the throttling code Benoît Canet
` (6 more replies)
0 siblings, 7 replies; 20+ messages in thread
From: Benoît Canet @ 2013-07-23 16:29 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha
The first patch fixes the throttling code, which was broken by a previous commit.
The second patch replaces the existing throttling algorithm with the well-described
leaky bucket algorithm.
The third patch implements bursting by adding *_threshold parameters to
qmp_block_set_io_throttle.
The fourth patch allows defining the maximum size of an I/O when throttling by iops,
via iops_sector_count, to prevent VM users from cheating on the iops limit.
The last patch adds a metric reflecting how much the I/Os are throttled.
since v1:
Add throttling percentage metric [Benoît]
since v2:
Enable timer only during I/O activity [Stefan]
Mark function as coroutine_fn [Stefan]
Guard these function checking they are in a coroutine [Stefan]
Use defines to access buckets [Stefan]
Fix typo [Stefan]
Reset throttling metric on idle [Benoît]
Rename invalid to check_io_limit [Stefan]
Benoît Canet (5):
block: Repair the throttling code.
block: Modify the throttling code to implement the leaky bucket
algorithm.
block: Add support for throttling burst threshold in QMP and the
command line.
block: Add iops_sector_count to do the iops accounting for a given io
size.
block: Add throttling percentage metrics.
block.c | 493 ++++++++++++++++++++++++++++-----------------
block/qapi.c | 32 +++
blockdev.c | 174 ++++++++++++++--
hmp.c | 38 +++-
include/block/block_int.h | 18 +-
include/block/coroutine.h | 9 +-
qapi-schema.json | 42 +++-
qemu-coroutine-lock.c | 20 +-
qemu-options.hx | 2 +-
qmp-commands.hx | 34 +++-
10 files changed, 642 insertions(+), 220 deletions(-)
--
1.7.10.4
^ permalink raw reply [flat|nested] 20+ messages in thread
* [Qemu-devel] [PATCH V3 for-1.6 1/5] block: Repair the throttling code.
2013-07-23 16:29 [Qemu-devel] [PATCH V3 for-1.6 0/5] Leaky bucket throttling and features Benoît Canet
@ 2013-07-23 16:29 ` Benoît Canet
2013-07-26 19:16 ` Eric Blake
2013-07-23 16:29 ` [Qemu-devel] [PATCH V3 for-1.6 2/5] block: Modify the throttling code to implement the leaky bucket algorithm Benoît Canet
` (5 subsequent siblings)
6 siblings, 1 reply; 20+ messages in thread
From: Benoît Canet @ 2013-07-23 16:29 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha
The throttling code was segfaulting since commit
02ffb504485f0920cfc75a0982a602f824a9a4f4 because some qemu_co_queue_next callers
do not run in a coroutine.
qemu_co_queue_do_restart assumes that the caller is a coroutine.
As suggested by Stefan, fix this by entering the coroutine directly.
Also make sure, as suggested, that qemu_co_queue_next() and
qemu_co_queue_restart_all() can only be called from coroutines.
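As a rough illustration of the dequeue-and-enter semantics this fix introduces, here is a minimal standalone sketch. The `SketchCo`/`SketchCoQueue` types and helper names are invented for this example (QEMU's real Coroutine and CoQueue are different, opaque types); it models only the FIFO behaviour of the new qemu_co_enter_next() helper: pop the first waiter and run it directly, so the caller does not need to be a coroutine itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for QEMU's Coroutine/CoQueue */
typedef struct SketchCo {
    void (*entry)(struct SketchCo *co); /* body run when "entered" */
    struct SketchCo *next;              /* intrusive FIFO link */
    bool entered;
} SketchCo;

typedef struct {
    SketchCo *head, *tail;
} SketchCoQueue;

static void sketch_queue_init(SketchCoQueue *q)
{
    q->head = q->tail = NULL;
}

static void sketch_queue_push(SketchCoQueue *q, SketchCo *co)
{
    co->next = NULL;
    if (q->tail) {
        q->tail->next = co;
    } else {
        q->head = co;
    }
    q->tail = co;
}

/* Like qemu_co_enter_next(): pop the first waiter and run it directly
 * from plain (non-coroutine) context; return false on an empty queue. */
static bool sketch_enter_next(SketchCoQueue *q)
{
    SketchCo *next = q->head;
    if (!next) {
        return false;
    }
    q->head = next->next;
    if (!q->head) {
        q->tail = NULL;
    }
    next->entry(next);
    return true;
}

static void mark_entered(SketchCo *co)
{
    co->entered = true;
}
```

Draining a queue from non-coroutine context then becomes `while (sketch_enter_next(&q)) { }`, which mirrors the pattern the patch uses in bdrv_io_limits_disable() and bdrv_drain_all().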
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block.c | 8 ++++----
include/block/coroutine.h | 9 +++++++--
qemu-coroutine-lock.c | 20 ++++++++++++++++++--
3 files changed, 29 insertions(+), 8 deletions(-)
diff --git a/block.c b/block.c
index b560241..dc72643 100644
--- a/block.c
+++ b/block.c
@@ -127,7 +127,8 @@ void bdrv_io_limits_disable(BlockDriverState *bs)
{
bs->io_limits_enabled = false;
- while (qemu_co_queue_next(&bs->throttled_reqs));
+ while (qemu_co_enter_next(&bs->throttled_reqs)) {
+ }
if (bs->block_timer) {
qemu_del_timer(bs->block_timer);
@@ -143,7 +144,7 @@ static void bdrv_block_timer(void *opaque)
{
BlockDriverState *bs = opaque;
- qemu_co_queue_next(&bs->throttled_reqs);
+ qemu_co_enter_next(&bs->throttled_reqs);
}
void bdrv_io_limits_enable(BlockDriverState *bs)
@@ -1445,8 +1446,7 @@ void bdrv_drain_all(void)
* a busy wait.
*/
QTAILQ_FOREACH(bs, &bdrv_states, list) {
- if (!qemu_co_queue_empty(&bs->throttled_reqs)) {
- qemu_co_queue_restart_all(&bs->throttled_reqs);
+ while (qemu_co_enter_next(&bs->throttled_reqs)) {
busy = true;
}
}
diff --git a/include/block/coroutine.h b/include/block/coroutine.h
index 377805a..1f2db3e 100644
--- a/include/block/coroutine.h
+++ b/include/block/coroutine.h
@@ -130,12 +130,17 @@ void coroutine_fn qemu_co_queue_wait_insert_head(CoQueue *queue);
*
* Returns true if a coroutine was restarted, false if the queue is empty.
*/
-bool qemu_co_queue_next(CoQueue *queue);
+bool coroutine_fn qemu_co_queue_next(CoQueue *queue);
/**
* Restarts all coroutines in the CoQueue and leaves the queue empty.
*/
-void qemu_co_queue_restart_all(CoQueue *queue);
+void coroutine_fn qemu_co_queue_restart_all(CoQueue *queue);
+
+/**
+ * Enter the next coroutine in the queue
+ */
+bool qemu_co_enter_next(CoQueue *queue);
/**
* Checks if the CoQueue is empty.
diff --git a/qemu-coroutine-lock.c b/qemu-coroutine-lock.c
index d9fea49..aeb33b9 100644
--- a/qemu-coroutine-lock.c
+++ b/qemu-coroutine-lock.c
@@ -88,16 +88,32 @@ static bool qemu_co_queue_do_restart(CoQueue *queue, bool single)
return true;
}
-bool qemu_co_queue_next(CoQueue *queue)
+bool coroutine_fn qemu_co_queue_next(CoQueue *queue)
{
+ assert(qemu_in_coroutine());
return qemu_co_queue_do_restart(queue, true);
}
-void qemu_co_queue_restart_all(CoQueue *queue)
+void coroutine_fn qemu_co_queue_restart_all(CoQueue *queue)
{
+ assert(qemu_in_coroutine());
qemu_co_queue_do_restart(queue, false);
}
+bool qemu_co_enter_next(CoQueue *queue)
+{
+ Coroutine *next;
+
+ next = QTAILQ_FIRST(&queue->entries);
+ if (!next) {
+ return false;
+ }
+
+ QTAILQ_REMOVE(&queue->entries, next, co_queue_next);
+ qemu_coroutine_enter(next, NULL);
+ return true;
+}
+
bool qemu_co_queue_empty(CoQueue *queue)
{
return (QTAILQ_FIRST(&queue->entries) == NULL);
--
1.7.10.4
* [Qemu-devel] [PATCH V3 for-1.6 2/5] block: Modify the throttling code to implement the leaky bucket algorithm.
2013-07-23 16:29 [Qemu-devel] [PATCH V3 for-1.6 0/5] Leaky bucket throttling and features Benoît Canet
2013-07-23 16:29 ` [Qemu-devel] [PATCH V3 for-1.6 1/5] block: Repair the throttling code Benoît Canet
@ 2013-07-23 16:29 ` Benoît Canet
2013-07-26 3:40 ` Fam Zheng
2013-07-23 16:29 ` [Qemu-devel] [PATCH V3 for-1.6 3/5] block: Add support for throttling burst threshold in QMP and the command line Benoît Canet
` (4 subsequent siblings)
6 siblings, 1 reply; 20+ messages in thread
From: Benoît Canet @ 2013-07-23 16:29 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha
This patch replaces the previous algorithm with the well-described leaky bucket
algorithm: a bucket is filled by the incoming I/Os and a periodic timer decrements
the counter to make the bucket leak. When a given threshold is reached the bucket
is full and the I/Os are held.
In this patch the threshold is set to a default value so that the code behaves
like the previous implementation.
The next patch will expose the threshold in QMP to let the user control the
burstiness of the throttling.
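The accounting described above can be sketched in a few lines. This is a self-contained illustration, not QEMU's code: the `LeakyBucket` fields and helper names are invented for the example, and only one bucket (total bytes) is modelled instead of the per-read/per-write buckets the patch maintains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative leaky bucket: level rises with admitted I/O, drains at
 * leak_rate, and blocks new I/O while above threshold. */
typedef struct {
    int64_t level;     /* bytes currently in the bucket */
    int64_t threshold; /* bucket capacity: above this, I/Os are held */
    int64_t leak_rate; /* bytes leaked per second (the bps limit) */
} LeakyBucket;

/* An I/O is admitted while the bucket is at or below the threshold;
 * admitting it fills the bucket by its size.  Returns false when the
 * I/O must wait in the throttled queue. */
static bool bucket_try_account(LeakyBucket *b, int64_t bytes)
{
    if (b->level > b->threshold) {
        return false; /* bucket full: queue the request */
    }
    b->level += bytes;
    return true;
}

/* Periodic timer callback: make the bucket leak for delta_ns elapsed
 * nanoseconds, never going below empty. */
static void bucket_leak(LeakyBucket *b, int64_t delta_ns)
{
    int64_t leaked = (b->leak_rate * delta_ns) / 1000000000LL;
    b->level = b->level > leaked ? b->level - leaked : 0;
}
```

Note that a jumbo I/O larger than the threshold is still admitted once (overshooting the bucket), which matches the deadlock-avoidance behaviour the patch documents: the guest is then "punished" by having subsequent I/Os blocked until the bucket has leaked back below the threshold.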
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block.c | 454 +++++++++++++++++++++++++++------------------
blockdev.c | 71 +++++--
include/block/block_int.h | 15 +-
3 files changed, 339 insertions(+), 201 deletions(-)
diff --git a/block.c b/block.c
index dc72643..f1cd9c0 100644
--- a/block.c
+++ b/block.c
@@ -86,13 +86,6 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque);
static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
int64_t sector_num, int nb_sectors);
-static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
- bool is_write, double elapsed_time, uint64_t *wait);
-static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
- double elapsed_time, uint64_t *wait);
-static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
- bool is_write, int64_t *wait);
-
static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
QTAILQ_HEAD_INITIALIZER(bdrv_states);
@@ -101,6 +94,8 @@ static QLIST_HEAD(, BlockDriver) bdrv_drivers =
/* If non-zero, use only whitelisted block drivers */
static int use_bdrv_whitelist;
+/* boolean used to inform the throttling code that a bdrv_drain_all is issued */
+static bool draining;
#ifdef _WIN32
static int is_windows_drive_prefix(const char *filename)
@@ -129,28 +124,170 @@ void bdrv_io_limits_disable(BlockDriverState *bs)
while (qemu_co_enter_next(&bs->throttled_reqs)) {
}
+}
- if (bs->block_timer) {
- qemu_del_timer(bs->block_timer);
- qemu_free_timer(bs->block_timer);
- bs->block_timer = NULL;
+static void bdrv_make_bps_buckets_leak(BlockDriverState *bs, int64_t delta)
+{
+ int64_t *bytes = bs->leaky_buckets.bytes;
+ int64_t read_leak, write_leak;
+
+    /* the limit applies to both reads and writes */
+ if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
+ /* compute half the total leak */
+ int64_t leak = ((bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL] * delta) /
+ NANOSECONDS_PER_SECOND);
+ int remain = leak % 2;
+ leak /= 2;
+
+ /* the read bucket is smaller than half the quantity to leak so take
+ * care adding the leak difference to write leak
+ */
+ if (bytes[BLOCK_IO_LIMIT_READ] <= leak) {
+ read_leak = bytes[BLOCK_IO_LIMIT_READ];
+ write_leak = 2 * leak + remain - bytes[BLOCK_IO_LIMIT_READ];
+        /* symmetric case */
+ } else if (bytes[BLOCK_IO_LIMIT_WRITE] <= leak) {
+ write_leak = bytes[BLOCK_IO_LIMIT_WRITE];
+ read_leak = 2 * leak + remain - bytes[BLOCK_IO_LIMIT_WRITE];
+ /* both bucket above leak count use half the total leak for both */
+ } else {
+ write_leak = leak;
+ read_leak = leak + remain;
+ }
+ /* else we consider that limits are separated */
+ } else {
+ read_leak = (bs->io_limits.bps[BLOCK_IO_LIMIT_READ] * delta) /
+ NANOSECONDS_PER_SECOND;
+ write_leak = (bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE] * delta) /
+ NANOSECONDS_PER_SECOND;
+ }
+
+ /* make the buckets leak */
+ bytes[BLOCK_IO_LIMIT_READ] = MAX(bytes[BLOCK_IO_LIMIT_READ] - read_leak,
+ 0);
+ bytes[BLOCK_IO_LIMIT_WRITE] = MAX(bytes[BLOCK_IO_LIMIT_WRITE] - write_leak,
+ 0);
+}
+
+static void bdrv_make_iops_buckets_leak(BlockDriverState *bs, int64_t delta)
+{
+ double *ios = bs->leaky_buckets.ios;
+ int64_t read_leak, write_leak;
+
+    /* the limit applies to both reads and writes */
+ if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
+ /* compute half the total leak */
+ int64_t leak = ((bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL] * delta) /
+ NANOSECONDS_PER_SECOND);
+ int remain = leak % 2;
+ leak /= 2;
+
+ /* the read bucket is smaller than half the quantity to leak so take
+ * care adding the leak difference to write leak
+ */
+ if (ios[BLOCK_IO_LIMIT_READ] <= leak) {
+ read_leak = ios[BLOCK_IO_LIMIT_READ];
+ write_leak = 2 * leak + remain - ios[BLOCK_IO_LIMIT_READ];
+        /* symmetric case */
+ } else if (ios[BLOCK_IO_LIMIT_WRITE] <= leak) {
+ write_leak = ios[BLOCK_IO_LIMIT_WRITE];
+ read_leak = 2 * leak + remain - ios[BLOCK_IO_LIMIT_WRITE];
+ /* both bucket above leak count use half the total leak for both */
+ } else {
+ write_leak = leak;
+ read_leak = leak + remain;
+ }
+ /* else we consider that limits are separated */
+ } else {
+ read_leak = (bs->io_limits.iops[BLOCK_IO_LIMIT_READ] * delta) /
+ NANOSECONDS_PER_SECOND;
+ write_leak = (bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE] * delta) /
+ NANOSECONDS_PER_SECOND;
+ }
+
+ /* make the buckets leak */
+ ios[BLOCK_IO_LIMIT_READ] = MAX(ios[BLOCK_IO_LIMIT_READ] - read_leak, 0);
+ ios[BLOCK_IO_LIMIT_WRITE] = MAX(ios[BLOCK_IO_LIMIT_WRITE] - write_leak, 0);
+}
+
+static void bdrv_leak_if_needed(BlockDriverState *bs)
+{
+ int64_t now;
+ int64_t delta;
+
+ if (!bs->must_leak) {
+ return;
+ }
+
+ bs->must_leak = false;
+
+ now = qemu_get_clock_ns(rt_clock);
+ delta = now - bs->previous_leak;
+ bs->previous_leak = now;
+
+ bdrv_make_bps_buckets_leak(bs, delta);
+ bdrv_make_iops_buckets_leak(bs, delta);
+}
+
+static void bdrv_block_timer_disable(BlockDriverState *bs)
+{
+ if (!bs->block_timer) {
+ return;
}
- bs->slice_start = 0;
- bs->slice_end = 0;
+ qemu_del_timer(bs->block_timer);
+ qemu_free_timer(bs->block_timer);
+ bs->block_timer = NULL;
+}
+
+static bool bdrv_throttling_is_idle(BlockDriverState *bs)
+{
+    int64_t delta = qemu_get_clock_ns(rt_clock) - bs->previous_leak;
+
+    if (delta < BLOCK_IO_THROTTLE_PERIOD * 2) {
+        return false;
+    }
+
+    /* idle */
+    return true;
}
+/* This callback is the timer in charge of making the leaky buckets leak */
static void bdrv_block_timer(void *opaque)
{
BlockDriverState *bs = opaque;
+    /* disable the throttling timer on idle to save resources */
+    if (bdrv_throttling_is_idle(bs)) {
+ bdrv_block_timer_disable(bs);
+ return;
+ }
+
+ /* rearm the timer */
+ qemu_mod_timer(bs->block_timer,
+ qemu_get_clock_ns(vm_clock) +
+ BLOCK_IO_THROTTLE_PERIOD);
+
+ bs->must_leak = true;
qemu_co_enter_next(&bs->throttled_reqs);
}
+static void bdrv_block_timer_enable(BlockDriverState *bs)
+{
+ if (bs->block_timer) {
+ return;
+ }
+
+ bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
+ bs->previous_leak = qemu_get_clock_ns(rt_clock);
+ qemu_mod_timer(bs->block_timer,
+ qemu_get_clock_ns(vm_clock) +
+ BLOCK_IO_THROTTLE_PERIOD);
+}
+
void bdrv_io_limits_enable(BlockDriverState *bs)
{
qemu_co_queue_init(&bs->throttled_reqs);
- bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
bs->io_limits_enabled = true;
}
@@ -165,15 +302,118 @@ bool bdrv_io_limits_enabled(BlockDriverState *bs)
|| io_limits->iops[BLOCK_IO_LIMIT_TOTAL];
}
+/* This function checks if the correct bandwidth threshold has been exceeded
+ *
+ * @is_write: true if the current IO is a write, false if it's a read
+ * @ret: true if threshold has been exceeded else false
+ */
+static bool bdrv_is_bps_threshold_exceeded(BlockDriverState *bs, bool is_write)
+{
+ /* limit is on total read + write bps : do the sum and compare with total
+ * threshold
+ */
+ if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
+ int64_t bytes = bs->leaky_buckets.bytes[BLOCK_IO_LIMIT_READ] +
+ bs->leaky_buckets.bytes[BLOCK_IO_LIMIT_WRITE];
+ return bs->io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL] < bytes;
+ }
+
+    /* check whether the threshold corresponding to the current io type (read,
+ * write) has been exceeded
+ */
+ if (bs->io_limits.bps[is_write]) {
+ return bs->io_limits.bps_threshold[is_write] <
+ bs->leaky_buckets.bytes[is_write];
+ }
+
+ /* no limit */
+ return false;
+}
+
+/* This function checks if the correct iops threshold has been exceeded
+ *
+ * @is_write: true if the current IO is a write, false if it's a read
+ * @ret: true if threshold has been exceeded else false
+ */
+static bool bdrv_is_iops_threshold_exceeded(BlockDriverState *bs, bool is_write)
+{
+ /* limit is on total read + write iops : do the sum and compare with total
+ * threshold
+ */
+ if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
+ double ios = bs->leaky_buckets.ios[BLOCK_IO_LIMIT_READ] +
+ bs->leaky_buckets.ios[BLOCK_IO_LIMIT_WRITE];
+ return bs->io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL] < ios;
+ }
+
+    /* check whether the threshold corresponding to the current io type (read,
+ * write) has been exceeded
+ */
+ if (bs->io_limits.iops[is_write]) {
+ return bs->io_limits.iops_threshold[is_write] <
+ bs->leaky_buckets.ios[is_write];
+ }
+
+ /* no limit */
+ return false;
+}
+
+/* This function checks if any bandwidth or iops threshold has been exceeded
+ *
+ * @nb_sectors: the number of sectors of the current IO
+ * @is_write: true if the current IO is a write, false if it's a read
+ * @ret: true if any threshold has been exceeded else false
+ */
+static bool bdrv_is_any_threshold_exceeded(BlockDriverState *bs, int nb_sectors,
+ bool is_write)
+{
+ bool bps_ret, iops_ret;
+
+    /* check if any bandwidth or per IO threshold has been exceeded */
+ bps_ret = bdrv_is_bps_threshold_exceeded(bs, is_write);
+ iops_ret = bdrv_is_iops_threshold_exceeded(bs, is_write);
+
+    /* if so, the IO will be blocked, so do not account it and return true;
+     * also return false if a bdrv_drain_all is in progress
+ */
+ if (!draining && (bps_ret || iops_ret)) {
+ return true;
+ }
+
+ /* NOTE: the counter can go above the threshold when authorizing an IO.
+ * At next call the code will punish the guest by blocking the
+ * next IO until the counter has been decremented below the threshold.
+     * This way, if a guest issues a jumbo IO bigger than the threshold, it
+     * will still have a chance to be authorized and will not result in a guest
+     * IO deadlock.
+ */
+
+ /* the IO is authorized so do the accounting and return false */
+ bs->leaky_buckets.bytes[is_write] += (int64_t)nb_sectors *
+ BDRV_SECTOR_SIZE;
+ bs->leaky_buckets.ios[is_write]++;
+
+ return false;
+}
+
static void bdrv_io_limits_intercept(BlockDriverState *bs,
bool is_write, int nb_sectors)
{
- int64_t wait_time = -1;
+ /* enable block timer if needed when intercepting I/Os */
+ if (!bs->block_timer) {
+ bdrv_block_timer_enable(bs);
+ }
+ bdrv_leak_if_needed(bs);
+    /* if some IOs are already queued because the bucket is full, put the current
+ * IO at the end of the queue (FIFO)
+ */
if (!qemu_co_queue_empty(&bs->throttled_reqs)) {
qemu_co_queue_wait(&bs->throttled_reqs);
}
+ bdrv_leak_if_needed(bs);
+
/* In fact, we hope to keep each request's timing, in FIFO mode. The next
* throttled requests will not be dequeued until the current request is
* allowed to be serviced. So if the current request still exceeds the
@@ -181,13 +421,19 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
* be still in throttled_reqs queue.
*/
- while (bdrv_exceed_io_limits(bs, nb_sectors, is_write, &wait_time)) {
- qemu_mod_timer(bs->block_timer,
- wait_time + qemu_get_clock_ns(vm_clock));
+    /* if a threshold is exceeded the leaky bucket is full, so the code puts the
+     * IO in the throttled_reqs queue until the bucket has leaked enough to no
+     * longer be full
+ */
+ while (bdrv_is_any_threshold_exceeded(bs, nb_sectors, is_write)) {
+ bdrv_leak_if_needed(bs);
qemu_co_queue_wait_insert_head(&bs->throttled_reqs);
+ bdrv_leak_if_needed(bs);
}
+ bdrv_leak_if_needed(bs);
qemu_co_queue_next(&bs->throttled_reqs);
+ bdrv_leak_if_needed(bs);
}
/* check if the path starts with "<protocol>:" */
@@ -1439,6 +1685,9 @@ void bdrv_drain_all(void)
BlockDriverState *bs;
bool busy;
+ /* tell the throttling code we are draining */
+ draining = true;
+
do {
busy = qemu_aio_wait();
@@ -1457,6 +1706,8 @@ void bdrv_drain_all(void)
assert(QLIST_EMPTY(&bs->tracked_requests));
assert(qemu_co_queue_empty(&bs->throttled_reqs));
}
+
+ draining = false;
}
/* make a BlockDriverState anonymous by removing from bdrv_state list.
@@ -1492,9 +1743,7 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
bs_dest->enable_write_cache = bs_src->enable_write_cache;
/* i/o timing parameters */
- bs_dest->slice_start = bs_src->slice_start;
- bs_dest->slice_end = bs_src->slice_end;
- bs_dest->slice_submitted = bs_src->slice_submitted;
+ bs_dest->leaky_buckets = bs_src->leaky_buckets;
bs_dest->io_limits = bs_src->io_limits;
bs_dest->throttled_reqs = bs_src->throttled_reqs;
bs_dest->block_timer = bs_src->block_timer;
@@ -3551,169 +3800,6 @@ void bdrv_aio_cancel(BlockDriverAIOCB *acb)
acb->aiocb_info->cancel(acb);
}
-/* block I/O throttling */
-static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
- bool is_write, double elapsed_time, uint64_t *wait)
-{
- uint64_t bps_limit = 0;
- uint64_t extension;
- double bytes_limit, bytes_base, bytes_res;
- double slice_time, wait_time;
-
- if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
- bps_limit = bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL];
- } else if (bs->io_limits.bps[is_write]) {
- bps_limit = bs->io_limits.bps[is_write];
- } else {
- if (wait) {
- *wait = 0;
- }
-
- return false;
- }
-
- slice_time = bs->slice_end - bs->slice_start;
- slice_time /= (NANOSECONDS_PER_SECOND);
- bytes_limit = bps_limit * slice_time;
- bytes_base = bs->slice_submitted.bytes[is_write];
- if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
- bytes_base += bs->slice_submitted.bytes[!is_write];
- }
-
- /* bytes_base: the bytes of data which have been read/written; and
- * it is obtained from the history statistic info.
- * bytes_res: the remaining bytes of data which need to be read/written.
- * (bytes_base + bytes_res) / bps_limit: used to calcuate
- * the total time for completing reading/writting all data.
- */
- bytes_res = (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
-
- if (bytes_base + bytes_res <= bytes_limit) {
- if (wait) {
- *wait = 0;
- }
-
- return false;
- }
-
- /* Calc approx time to dispatch */
- wait_time = (bytes_base + bytes_res) / bps_limit - elapsed_time;
-
- /* When the I/O rate at runtime exceeds the limits,
- * bs->slice_end need to be extended in order that the current statistic
- * info can be kept until the timer fire, so it is increased and tuned
- * based on the result of experiment.
- */
- extension = wait_time * NANOSECONDS_PER_SECOND;
- extension = DIV_ROUND_UP(extension, BLOCK_IO_SLICE_TIME) *
- BLOCK_IO_SLICE_TIME;
- bs->slice_end += extension;
- if (wait) {
- *wait = wait_time * NANOSECONDS_PER_SECOND;
- }
-
- return true;
-}
-
-static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
- double elapsed_time, uint64_t *wait)
-{
- uint64_t iops_limit = 0;
- double ios_limit, ios_base;
- double slice_time, wait_time;
-
- if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
- iops_limit = bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL];
- } else if (bs->io_limits.iops[is_write]) {
- iops_limit = bs->io_limits.iops[is_write];
- } else {
- if (wait) {
- *wait = 0;
- }
-
- return false;
- }
-
- slice_time = bs->slice_end - bs->slice_start;
- slice_time /= (NANOSECONDS_PER_SECOND);
- ios_limit = iops_limit * slice_time;
- ios_base = bs->slice_submitted.ios[is_write];
- if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
- ios_base += bs->slice_submitted.ios[!is_write];
- }
-
- if (ios_base + 1 <= ios_limit) {
- if (wait) {
- *wait = 0;
- }
-
- return false;
- }
-
- /* Calc approx time to dispatch, in seconds */
- wait_time = (ios_base + 1) / iops_limit;
- if (wait_time > elapsed_time) {
- wait_time = wait_time - elapsed_time;
- } else {
- wait_time = 0;
- }
-
- /* Exceeded current slice, extend it by another slice time */
- bs->slice_end += BLOCK_IO_SLICE_TIME;
- if (wait) {
- *wait = wait_time * NANOSECONDS_PER_SECOND;
- }
-
- return true;
-}
-
-static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
- bool is_write, int64_t *wait)
-{
- int64_t now, max_wait;
- uint64_t bps_wait = 0, iops_wait = 0;
- double elapsed_time;
- int bps_ret, iops_ret;
-
- now = qemu_get_clock_ns(vm_clock);
- if (now > bs->slice_end) {
- bs->slice_start = now;
- bs->slice_end = now + BLOCK_IO_SLICE_TIME;
- memset(&bs->slice_submitted, 0, sizeof(bs->slice_submitted));
- }
-
- elapsed_time = now - bs->slice_start;
- elapsed_time /= (NANOSECONDS_PER_SECOND);
-
- bps_ret = bdrv_exceed_bps_limits(bs, nb_sectors,
- is_write, elapsed_time, &bps_wait);
- iops_ret = bdrv_exceed_iops_limits(bs, is_write,
- elapsed_time, &iops_wait);
- if (bps_ret || iops_ret) {
- max_wait = bps_wait > iops_wait ? bps_wait : iops_wait;
- if (wait) {
- *wait = max_wait;
- }
-
- now = qemu_get_clock_ns(vm_clock);
- if (bs->slice_end < now + max_wait) {
- bs->slice_end = now + max_wait;
- }
-
- return true;
- }
-
- if (wait) {
- *wait = 0;
- }
-
- bs->slice_submitted.bytes[is_write] += (int64_t)nb_sectors *
- BDRV_SECTOR_SIZE;
- bs->slice_submitted.ios[is_write]++;
-
- return false;
-}
-
/**************************************************************/
/* async block device emulation */
diff --git a/blockdev.c b/blockdev.c
index c5abd65..491e4d0 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -280,10 +280,25 @@ static int parse_block_error_action(const char *buf, bool is_read)
}
}
+static bool check_io_limit(int64_t limit)
+{
+ if (!limit) {
+ return false;
+ }
+
+ if (limit < (THROTTLE_HZ * 2)) {
+ return true;
+ }
+
+ return false;
+}
+
static bool do_check_io_limits(BlockIOLimit *io_limits, Error **errp)
{
bool bps_flag;
bool iops_flag;
+ bool bps_threshold_flag;
+ bool iops_threshold_flag;
assert(io_limits);
@@ -299,13 +314,30 @@ static bool do_check_io_limits(BlockIOLimit *io_limits, Error **errp)
return false;
}
- if (io_limits->bps[BLOCK_IO_LIMIT_TOTAL] < 0 ||
- io_limits->bps[BLOCK_IO_LIMIT_WRITE] < 0 ||
- io_limits->bps[BLOCK_IO_LIMIT_READ] < 0 ||
- io_limits->iops[BLOCK_IO_LIMIT_TOTAL] < 0 ||
- io_limits->iops[BLOCK_IO_LIMIT_WRITE] < 0 ||
- io_limits->iops[BLOCK_IO_LIMIT_READ] < 0) {
- error_setg(errp, "bps and iops values must be 0 or greater");
+ bps_threshold_flag =
+ (io_limits->bps_threshold[BLOCK_IO_LIMIT_TOTAL] != 0)
+ && ((io_limits->bps_threshold[BLOCK_IO_LIMIT_READ] != 0)
+ || (io_limits->bps_threshold[BLOCK_IO_LIMIT_WRITE] != 0));
+ iops_threshold_flag =
+ (io_limits->iops_threshold[BLOCK_IO_LIMIT_TOTAL] != 0)
+ && ((io_limits->iops_threshold[BLOCK_IO_LIMIT_READ] != 0)
+ || (io_limits->iops_threshold[BLOCK_IO_LIMIT_WRITE] != 0));
+ if (bps_threshold_flag || iops_threshold_flag) {
+ error_setg(errp, "bps_threshold(iops_threshold) and "
+                   "bps_rd_threshold/bps_wr_threshold "
+ "(iops_rd_threshold/iops_wr_threshold) "
+ "cannot be used at the same time");
+ return false;
+ }
+
+ if (check_io_limit(io_limits->bps[BLOCK_IO_LIMIT_TOTAL]) ||
+ check_io_limit(io_limits->bps[BLOCK_IO_LIMIT_WRITE]) ||
+ check_io_limit(io_limits->bps[BLOCK_IO_LIMIT_READ]) ||
+ check_io_limit(io_limits->iops[BLOCK_IO_LIMIT_TOTAL]) ||
+ check_io_limit(io_limits->iops[BLOCK_IO_LIMIT_WRITE]) ||
+ check_io_limit(io_limits->iops[BLOCK_IO_LIMIT_READ])) {
+ error_setg(errp, "bps and iops values must be %i or greater",
+ THROTTLE_HZ * 2);
return false;
}
@@ -497,6 +529,18 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type)
qemu_opt_get_number(opts, "iops_rd", 0);
io_limits.iops[BLOCK_IO_LIMIT_WRITE] =
qemu_opt_get_number(opts, "iops_wr", 0);
+ io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL] =
+ io_limits.bps[BLOCK_IO_LIMIT_TOTAL] / THROTTLE_HZ;
+ io_limits.bps_threshold[BLOCK_IO_LIMIT_READ] =
+ io_limits.bps[BLOCK_IO_LIMIT_READ] / THROTTLE_HZ;
+ io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE] =
+ io_limits.bps[BLOCK_IO_LIMIT_WRITE] / THROTTLE_HZ;
+ io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL] =
+ io_limits.iops[BLOCK_IO_LIMIT_TOTAL] / THROTTLE_HZ;
+ io_limits.iops_threshold[BLOCK_IO_LIMIT_READ] =
+ io_limits.iops[BLOCK_IO_LIMIT_READ] / THROTTLE_HZ;
+ io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE] =
+ io_limits.iops[BLOCK_IO_LIMIT_WRITE] / THROTTLE_HZ;
if (!do_check_io_limits(&io_limits, &error)) {
error_report("%s", error_get_pretty(error));
@@ -1198,6 +1242,12 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
io_limits.iops[BLOCK_IO_LIMIT_TOTAL]= iops;
io_limits.iops[BLOCK_IO_LIMIT_READ] = iops_rd;
io_limits.iops[BLOCK_IO_LIMIT_WRITE]= iops_wr;
+ io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL] = bps / THROTTLE_HZ;
+ io_limits.bps_threshold[BLOCK_IO_LIMIT_READ] = bps_rd / THROTTLE_HZ;
+ io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE] = bps_wr / THROTTLE_HZ;
+ io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL] = iops / THROTTLE_HZ;
+ io_limits.iops_threshold[BLOCK_IO_LIMIT_READ] = iops_rd / THROTTLE_HZ;
+ io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE] = iops_wr / THROTTLE_HZ;
if (!do_check_io_limits(&io_limits, errp)) {
return;
@@ -1209,11 +1259,10 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
bdrv_io_limits_enable(bs);
} else if (bs->io_limits_enabled && !bdrv_io_limits_enabled(bs)) {
bdrv_io_limits_disable(bs);
- } else {
- if (bs->block_timer) {
- qemu_mod_timer(bs->block_timer, qemu_get_clock_ns(vm_clock));
- }
}
+
+ /* reset leaky bucket to get the system in a known state */
+ memset(&bs->leaky_buckets, 0, sizeof(bs->leaky_buckets));
}
int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
diff --git a/include/block/block_int.h b/include/block/block_int.h
index c6ac871..e32ad1f 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -43,8 +43,9 @@
#define BLOCK_IO_LIMIT_WRITE 1
#define BLOCK_IO_LIMIT_TOTAL 2
-#define BLOCK_IO_SLICE_TIME 100000000
#define NANOSECONDS_PER_SECOND 1000000000.0
+#define THROTTLE_HZ 1
+#define BLOCK_IO_THROTTLE_PERIOD (NANOSECONDS_PER_SECOND / THROTTLE_HZ)
#define BLOCK_OPT_SIZE "size"
#define BLOCK_OPT_ENCRYPT "encryption"
@@ -73,11 +74,13 @@ typedef struct BdrvTrackedRequest {
typedef struct BlockIOLimit {
int64_t bps[3];
int64_t iops[3];
+ int64_t bps_threshold[3];
+ int64_t iops_threshold[3];
} BlockIOLimit;
typedef struct BlockIOBaseValue {
- uint64_t bytes[2];
- uint64_t ios[2];
+ int64_t bytes[2];
+ double ios[2];
} BlockIOBaseValue;
struct BlockDriver {
@@ -264,10 +267,10 @@ struct BlockDriverState {
unsigned int copy_on_read_in_flight;
/* the time for latest disk I/O */
- int64_t slice_start;
- int64_t slice_end;
BlockIOLimit io_limits;
- BlockIOBaseValue slice_submitted;
+ BlockIOBaseValue leaky_buckets;
+ int64_t previous_leak;
+ bool must_leak;
CoQueue throttled_reqs;
QEMUTimer *block_timer;
bool io_limits_enabled;
--
1.7.10.4
* [Qemu-devel] [PATCH V3 for-1.6 3/5] block: Add support for throttling burst threshold in QMP and the command line.
2013-07-23 16:29 [Qemu-devel] [PATCH V3 for-1.6 0/5] Leaky bucket throttling and features Benoît Canet
2013-07-23 16:29 ` [Qemu-devel] [PATCH V3 for-1.6 1/5] block: Repair the throttling code Benoît Canet
2013-07-23 16:29 ` [Qemu-devel] [PATCH V3 for-1.6 2/5] block: Modify the throttling code to implement the leaky bucket algorithm Benoît Canet
@ 2013-07-23 16:29 ` Benoît Canet
2013-07-26 3:07 ` Fam Zheng
2013-07-26 19:24 ` Eric Blake
2013-07-23 16:29 ` [Qemu-devel] [PATCH V3 for-1.6 4/5] block: Add iops_sector_count to do the iops accounting for a given io size Benoît Canet
` (3 subsequent siblings)
6 siblings, 2 replies; 20+ messages in thread
From: Benoît Canet @ 2013-07-23 16:29 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha
The thresholds of the leaky bucket algorithm can be used to allow some
burstiness.
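A back-of-the-envelope sketch of how the threshold controls burstiness follows. This is illustrative only: `SKETCH_THROTTLE_HZ` and the helper names are invented for the example, though the default of bps / THROTTLE_HZ matches what the previous patch installs when no threshold is given.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constant mirroring THROTTLE_HZ from the previous patch */
#define SKETCH_THROTTLE_HZ 1

/* Default bucket capacity when no *_threshold option is given */
static int64_t sketch_default_threshold(int64_t bps)
{
    return bps / SKETCH_THROTTLE_HZ;
}

/* Approximate bytes an idle guest can push in the first `seconds`:
 * the burst allowance (bucket capacity) plus the steady leak. */
static int64_t sketch_burst_bytes(int64_t bps, int64_t threshold,
                                  int64_t seconds)
{
    return threshold + bps * seconds;
}
```

For instance, keeping bps at 1 MB/s but raising the threshold to 5 MB lets an idle guest absorb a 5 MB burst immediately before settling back to the steady 1 MB/s rate.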
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qapi.c | 24 +++++++++++++
blockdev.c | 105 +++++++++++++++++++++++++++++++++++++++++++++++-------
hmp.c | 32 +++++++++++++++--
qapi-schema.json | 34 ++++++++++++++++--
qemu-options.hx | 2 +-
qmp-commands.hx | 30 ++++++++++++++--
6 files changed, 205 insertions(+), 22 deletions(-)
diff --git a/block/qapi.c b/block/qapi.c
index a4bc411..03f1604 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -235,6 +235,30 @@ void bdrv_query_info(BlockDriverState *bs,
bs->io_limits.iops[BLOCK_IO_LIMIT_READ];
info->inserted->iops_wr =
bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE];
+ info->inserted->has_bps_threshold =
+ bs->io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL];
+ info->inserted->bps_threshold =
+ bs->io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL];
+ info->inserted->has_bps_rd_threshold =
+ bs->io_limits.bps_threshold[BLOCK_IO_LIMIT_READ];
+ info->inserted->bps_rd_threshold =
+ bs->io_limits.bps_threshold[BLOCK_IO_LIMIT_READ];
+ info->inserted->has_bps_wr_threshold =
+ bs->io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE];
+ info->inserted->bps_wr_threshold =
+ bs->io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE];
+ info->inserted->has_iops_threshold =
+ bs->io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL];
+ info->inserted->iops_threshold =
+ bs->io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL];
+ info->inserted->iops_rd_threshold =
+ bs->io_limits.iops_threshold[BLOCK_IO_LIMIT_READ];
+ info->inserted->has_iops_rd_threshold =
+ bs->io_limits.iops_threshold[BLOCK_IO_LIMIT_READ];
+ info->inserted->has_iops_wr_threshold =
+ bs->io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE];
+ info->inserted->iops_wr_threshold =
+ bs->io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE];
}
bs0 = bs;
diff --git a/blockdev.c b/blockdev.c
index 491e4d0..241ebe8 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -341,6 +341,17 @@ static bool do_check_io_limits(BlockIOLimit *io_limits, Error **errp)
return false;
}
+ if (io_limits->bps_threshold[BLOCK_IO_LIMIT_TOTAL] < 0 ||
+ io_limits->bps_threshold[BLOCK_IO_LIMIT_WRITE] < 0 ||
+ io_limits->bps_threshold[BLOCK_IO_LIMIT_READ] < 0 ||
+ io_limits->iops_threshold[BLOCK_IO_LIMIT_TOTAL] < 0 ||
+ io_limits->iops_threshold[BLOCK_IO_LIMIT_WRITE] < 0 ||
+ io_limits->iops_threshold[BLOCK_IO_LIMIT_READ] < 0) {
+ error_setg(errp,
+ "threshold values must be 0 or greater");
+ return false;
+ }
+
return true;
}
@@ -523,24 +534,34 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type)
qemu_opt_get_number(opts, "bps_rd", 0);
io_limits.bps[BLOCK_IO_LIMIT_WRITE] =
qemu_opt_get_number(opts, "bps_wr", 0);
+ /* bps thresholds */
+ io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL] =
+ qemu_opt_get_number(opts, "bps_threshold",
+ io_limits.bps[BLOCK_IO_LIMIT_TOTAL] / THROTTLE_HZ);
+ io_limits.bps_threshold[BLOCK_IO_LIMIT_READ] =
+ qemu_opt_get_number(opts, "bps_rd_threshold",
+ io_limits.bps[BLOCK_IO_LIMIT_READ] / THROTTLE_HZ);
+ io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE] =
+ qemu_opt_get_number(opts, "bps_wr_threshold",
+ io_limits.bps[BLOCK_IO_LIMIT_WRITE] / THROTTLE_HZ);
+
io_limits.iops[BLOCK_IO_LIMIT_TOTAL] =
qemu_opt_get_number(opts, "iops", 0);
io_limits.iops[BLOCK_IO_LIMIT_READ] =
qemu_opt_get_number(opts, "iops_rd", 0);
io_limits.iops[BLOCK_IO_LIMIT_WRITE] =
qemu_opt_get_number(opts, "iops_wr", 0);
- io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL] =
- io_limits.bps[BLOCK_IO_LIMIT_TOTAL] / THROTTLE_HZ;
- io_limits.bps_threshold[BLOCK_IO_LIMIT_READ] =
- io_limits.bps[BLOCK_IO_LIMIT_READ] / THROTTLE_HZ;
- io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE] =
- io_limits.bps[BLOCK_IO_LIMIT_WRITE] / THROTTLE_HZ;
- io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL] =
- io_limits.iops[BLOCK_IO_LIMIT_TOTAL] / THROTTLE_HZ;
+
+ /* iops thresholds */
+ io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL] =
+ qemu_opt_get_number(opts, "iops_threshold",
+ io_limits.iops[BLOCK_IO_LIMIT_TOTAL] / THROTTLE_HZ);
io_limits.iops_threshold[BLOCK_IO_LIMIT_READ] =
- io_limits.iops[BLOCK_IO_LIMIT_READ] / THROTTLE_HZ;
- io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE] =
- io_limits.iops[BLOCK_IO_LIMIT_WRITE] / THROTTLE_HZ;
+ qemu_opt_get_number(opts, "iops_rd_threshold",
+ io_limits.iops[BLOCK_IO_LIMIT_READ] / THROTTLE_HZ);
+ io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE] =
+ qemu_opt_get_number(opts, "iops_wr_threshold",
+ io_limits.iops[BLOCK_IO_LIMIT_WRITE] / THROTTLE_HZ);
if (!do_check_io_limits(&io_limits, &error)) {
error_report("%s", error_get_pretty(error));
@@ -1224,8 +1245,22 @@ void qmp_change_blockdev(const char *device, const char *filename,
/* throttling disk I/O limits */
void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
- int64_t bps_wr, int64_t iops, int64_t iops_rd,
- int64_t iops_wr, Error **errp)
+ int64_t bps_wr,
+ int64_t iops,
+ int64_t iops_rd,
+ int64_t iops_wr,
+ bool has_bps_threshold,
+ int64_t bps_threshold,
+ bool has_bps_rd_threshold,
+ int64_t bps_rd_threshold,
+ bool has_bps_wr_threshold,
+ int64_t bps_wr_threshold,
+ bool has_iops_threshold,
+ int64_t iops_threshold,
+ bool has_iops_rd_threshold,
+ int64_t iops_rd_threshold,
+ bool has_iops_wr_threshold,
+ int64_t iops_wr_threshold, Error **errp)
{
BlockIOLimit io_limits;
BlockDriverState *bs;
@@ -1249,6 +1284,26 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
io_limits.iops_threshold[BLOCK_IO_LIMIT_READ] = iops_rd / THROTTLE_HZ;
io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE] = iops_wr / THROTTLE_HZ;
+ /* override them with the given values if present */
+ if (has_bps_threshold) {
+ io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL] = bps_threshold;
+ }
+ if (has_bps_rd_threshold) {
+ io_limits.bps_threshold[BLOCK_IO_LIMIT_READ] = bps_rd_threshold;
+ }
+ if (has_bps_wr_threshold) {
+ io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE] = bps_wr_threshold;
+ }
+ if (has_iops_threshold) {
+ io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL] = iops_threshold;
+ }
+ if (has_iops_rd_threshold) {
+ io_limits.iops_threshold[BLOCK_IO_LIMIT_READ] = iops_rd_threshold;
+ }
+ if (has_iops_wr_threshold) {
+ io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE] = iops_wr_threshold;
+ }
+
if (!do_check_io_limits(&io_limits, errp)) {
return;
}
@@ -1916,6 +1971,18 @@ QemuOptsList qemu_common_drive_opts = {
.type = QEMU_OPT_NUMBER,
.help = "limit write operations per second",
},{
+ .name = "iops_threshold",
+ .type = QEMU_OPT_NUMBER,
+ .help = "total I/O operations threshold",
+ },{
+ .name = "iops_rd_threshold",
+ .type = QEMU_OPT_NUMBER,
+ .help = "read operations threshold",
+ },{
+ .name = "iops_wr_threshold",
+ .type = QEMU_OPT_NUMBER,
+ .help = "write operations threshold",
+ },{
.name = "bps",
.type = QEMU_OPT_NUMBER,
.help = "limit total bytes per second",
@@ -1928,6 +1995,18 @@ QemuOptsList qemu_common_drive_opts = {
.type = QEMU_OPT_NUMBER,
.help = "limit write bytes per second",
},{
+ .name = "bps_threshold",
+ .type = QEMU_OPT_NUMBER,
+ .help = "total bytes threshold",
+ },{
+ .name = "bps_rd_threshold",
+ .type = QEMU_OPT_NUMBER,
+ .help = "read bytes threshold",
+ },{
+ .name = "bps_wr_threshold",
+ .type = QEMU_OPT_NUMBER,
+ .help = "write bytes threshold",
+ },{
.name = "copy-on-read",
.type = QEMU_OPT_BOOL,
.help = "copy read data from backing file into image file",
diff --git a/hmp.c b/hmp.c
index dc4d8d4..d75aa99 100644
--- a/hmp.c
+++ b/hmp.c
@@ -340,14 +340,28 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
{
monitor_printf(mon, " I/O throttling: bps=%" PRId64
" bps_rd=%" PRId64 " bps_wr=%" PRId64
+ " bps_threshold=%" PRId64
+ " bps_rd_threshold=%" PRId64
+ " bps_wr_threshold=%" PRId64
" iops=%" PRId64 " iops_rd=%" PRId64
- " iops_wr=%" PRId64 "\n",
+ " iops_wr=%" PRId64
+ " iops_threshold=%" PRId64
+ " iops_rd_threshold=%" PRId64
+ " iops_wr_threshold=%" PRId64 "\n",
info->value->inserted->bps,
info->value->inserted->bps_rd,
info->value->inserted->bps_wr,
+ info->value->inserted->bps_threshold,
+ info->value->inserted->bps_rd_threshold,
+ info->value->inserted->bps_wr_threshold,
info->value->inserted->iops,
info->value->inserted->iops_rd,
- info->value->inserted->iops_wr);
+ info->value->inserted->iops_wr,
+ info->value->inserted->iops_threshold,
+ info->value->inserted->iops_rd_threshold,
+ info->value->inserted->iops_wr_threshold);
+ } else {
+ monitor_printf(mon, " [not inserted]");
}
if (verbose) {
@@ -1094,7 +1108,19 @@ void hmp_block_set_io_throttle(Monitor *mon, const QDict *qdict)
qdict_get_int(qdict, "bps_wr"),
qdict_get_int(qdict, "iops"),
qdict_get_int(qdict, "iops_rd"),
- qdict_get_int(qdict, "iops_wr"), &err);
+ qdict_get_int(qdict, "iops_wr"),
+ false, /* no burst thresholds via HMP */
+ 0,
+ false,
+ 0,
+ false,
+ 0,
+ false,
+ 0,
+ false,
+ 0,
+ false,
+ 0, &err);
hmp_handle_error(mon, &err);
}
diff --git a/qapi-schema.json b/qapi-schema.json
index 7b9fef1..a6f3792 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -769,6 +769,18 @@
#
# @image: the info of image used (since: 1.6)
#
+# @bps_threshold: #optional total threshold in bytes (Since 1.6)
+#
+# @bps_rd_threshold: #optional read threshold in bytes (Since 1.6)
+#
+# @bps_wr_threshold: #optional write threshold in bytes (Since 1.6)
+#
+# @iops_threshold: #optional total I/O operations threshold (Since 1.6)
+#
+# @iops_rd_threshold: #optional read I/O operations threshold (Since 1.6)
+#
+# @iops_wr_threshold: #optional write I/O operations threshold (Since 1.6)
+#
# Since: 0.14.0
#
# Notes: This interface is only found in @BlockInfo.
@@ -779,7 +791,10 @@
'encrypted': 'bool', 'encryption_key_missing': 'bool',
'bps': 'int', 'bps_rd': 'int', 'bps_wr': 'int',
'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int',
- 'image': 'ImageInfo' } }
+ 'image': 'ImageInfo',
+ '*bps_threshold': 'int', '*bps_rd_threshold': 'int',
+ '*bps_wr_threshold': 'int', '*iops_threshold': 'int',
+ '*iops_rd_threshold': 'int', '*iops_wr_threshold': 'int' }}
##
# @BlockDeviceIoStatus:
@@ -2158,6 +2173,18 @@
#
# @iops_wr: write I/O operations per second
#
+# @bps_threshold: #optional total threshold in bytes (Since 1.6)
+#
+# @bps_rd_threshold: #optional read threshold in bytes (Since 1.6)
+#
+# @bps_wr_threshold: #optional write threshold in bytes (Since 1.6)
+#
+# @iops_threshold: #optional total I/O operations threshold (Since 1.6)
+#
+# @iops_rd_threshold: #optional read I/O operations threshold (Since 1.6)
+#
+# @iops_wr_threshold: #optional write I/O operations threshold (Since 1.6)
+#
# Returns: Nothing on success
# If @device is not a valid block device, DeviceNotFound
#
@@ -2165,7 +2192,10 @@
##
{ 'command': 'block_set_io_throttle',
'data': { 'device': 'str', 'bps': 'int', 'bps_rd': 'int', 'bps_wr': 'int',
- 'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int' } }
+ 'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int',
+ '*bps_threshold': 'int', '*bps_rd_threshold': 'int',
+ '*bps_wr_threshold': 'int', '*iops_threshold': 'int',
+ '*iops_rd_threshold': 'int', '*iops_wr_threshold': 'int' }}
##
# @block-stream:
diff --git a/qemu-options.hx b/qemu-options.hx
index 4e98b4f..4d5ee66 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -409,7 +409,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
" [,cache=writethrough|writeback|none|directsync|unsafe][,format=f]\n"
" [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
" [,readonly=on|off][,copy-on-read=on|off]\n"
- " [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r][,iops_wr=w]]\n"
+ " [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r][,iops_wr=w]]][[,bps_threshold=bt]|[[,bps_rd_threshold=rt][,bps_wr_threshold=wt]]][[,iops_threshold=it]|[[,iops_rd_threshold=rt][,iops_wr_threshold=wt]]]\n"
" use 'file' as a drive image\n", QEMU_ARCH_ALL)
STEXI
@item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
diff --git a/qmp-commands.hx b/qmp-commands.hx
index e075df4..11c638a 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1385,7 +1385,7 @@ EQMP
{
.name = "block_set_io_throttle",
- .args_type = "device:B,bps:l,bps_rd:l,bps_wr:l,iops:l,iops_rd:l,iops_wr:l",
+ .args_type = "device:B,bps:l,bps_rd:l,bps_wr:l,iops:l,iops_rd:l,iops_wr:l,bps_threshold:l?,bps_rd_threshold:l?,bps_wr_threshold:l?,iops_threshold:l?,iops_rd_threshold:l?,iops_wr_threshold:l?",
.mhandler.cmd_new = qmp_marshal_input_block_set_io_throttle,
},
@@ -1400,10 +1400,16 @@ Arguments:
- "device": device name (json-string)
- "bps": total throughput limit in bytes per second(json-int)
- "bps_rd": read throughput limit in bytes per second(json-int)
-- "bps_wr": read throughput limit in bytes per second(json-int)
+- "bps_wr": write throughput limit in bytes per second(json-int)
- "iops": total I/O operations per second(json-int)
- "iops_rd": read I/O operations per second(json-int)
- "iops_wr": write I/O operations per second(json-int)
+- "bps_threshold": total threshold in bytes(json-int)
+- "bps_rd_threshold": read threshold in bytes(json-int)
+- "bps_wr_threshold": write threshold in bytes(json-int)
+- "iops_threshold": total I/O operations threshold(json-int)
+- "iops_rd_threshold": read I/O operations threshold(json-int)
+- "iops_wr_threshold": write I/O operations threshold(json-int)
Example:
@@ -1413,7 +1419,13 @@ Example:
"bps_wr": "0",
"iops": "0",
"iops_rd": "0",
- "iops_wr": "0" } }
+ "iops_wr": "0",
+ "bps_threshold": "8000000",
+ "bps_rd_threshold": "0",
+ "bps_wr_threshold": "0",
+ "iops_threshold": "0",
+ "iops_rd_threshold": "0",
+ "iops_wr_threshold": "0" } }
<- { "return": {} }
EQMP
@@ -1754,6 +1766,12 @@ Each json-object contain the following:
- "iops": limit total I/O operations per second (json-int)
- "iops_rd": limit read operations per second (json-int)
- "iops_wr": limit write operations per second (json-int)
+ - "bps_threshold": total threshold in bytes(json-int)
+ - "bps_rd_threshold": read threshold in bytes(json-int)
+ - "bps_wr_threshold": write threshold in bytes(json-int)
+ - "iops_threshold": total I/O operations threshold(json-int)
+ - "iops_rd_threshold": read I/O operations threshold(json-int)
+ - "iops_wr_threshold": write I/O operations threshold(json-int)
- "image": the detail of the image, it is a json-object containing
the following:
- "filename": image file name (json-string)
@@ -1823,6 +1841,12 @@ Example:
"iops":1000000,
"iops_rd":0,
"iops_wr":0,
+ "bps_threshold": "8000000",
+ "bps_rd_threshold": "0",
+ "bps_wr_threshold": "0",
+ "iops_threshold": "0",
+ "iops_rd_threshold": "0",
+ "iops_wr_threshold": "0",
"image":{
"filename":"disks/test.qcow2",
"format":"qcow2",
--
1.7.10.4
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [Qemu-devel] [PATCH V3 for-1.6 4/5] block: Add iops_sector_count to do the iops accounting for a given io size.
2013-07-23 16:29 [Qemu-devel] [PATCH V3 for-1.6 0/5] Leaky bucket throttling and features Benoît Canet
` (2 preceding siblings ...)
2013-07-23 16:29 ` [Qemu-devel] [PATCH V3 for-1.6 3/5] block: Add support for throttling burst threshold in QMP and the command line Benoît Canet
@ 2013-07-23 16:29 ` Benoît Canet
2013-07-23 16:29 ` [Qemu-devel] [PATCH V3 for-1.6 5/5] block: Add throttling percentage metrics Benoît Canet
` (2 subsequent siblings)
6 siblings, 0 replies; 20+ messages in thread
From: Benoît Canet @ 2013-07-23 16:29 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha
This feature can be used in cases where users avoid the iops limit by
issuing jumbo I/Os that hammer the storage backend.
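For illustration, the accounting change boils down to charging a large request
more than one operation. The helper below is a standalone sketch of that rule;
it is not QEMU code, and the name iops_cost is made up for this example:

```c
#include <assert.h>

/* Sketch of the accounting rule added by this patch: a request of
 * nb_sectors costs at least one I/O operation, and proportionally more
 * once it exceeds iops_sector_count sectors.  An iops_sector_count of 0
 * disables the feature, as in the patch. */
static double iops_cost(int nb_sectors, long iops_sector_count)
{
    double ios = 1.0;

    if (iops_sector_count) {
        ios = (double)nb_sectors / iops_sector_count;
        if (ios < 1.0) {
            ios = 1.0;  /* small requests still cost one operation */
        }
    }
    return ios;
}
```

With iops_sector_count=8, a 256-sector jumbo I/O is charged as 32 operations,
so splitting or merging requests no longer changes the effective iops consumed.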
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block.c | 8 +++++++-
block/qapi.c | 4 ++++
blockdev.c | 22 +++++++++++++++++++++-
hmp.c | 8 ++++++--
include/block/block_int.h | 1 +
qapi-schema.json | 10 ++++++++--
qemu-options.hx | 2 +-
qmp-commands.hx | 8 ++++++--
8 files changed, 54 insertions(+), 9 deletions(-)
diff --git a/block.c b/block.c
index f1cd9c0..bb4f8e4 100644
--- a/block.c
+++ b/block.c
@@ -368,6 +368,12 @@ static bool bdrv_is_any_threshold_exceeded(BlockDriverState *bs, int nb_sectors,
bool is_write)
{
bool bps_ret, iops_ret;
+ double ios = 1.0;
+
+ if (bs->io_limits.iops_sector_count) {
+ ios = ((double) nb_sectors) / bs->io_limits.iops_sector_count;
+ ios = MAX(ios, 1.0);
+ }
/* check if any bandwidth or per-I/O threshold has been exceeded */
bps_ret = bdrv_is_bps_threshold_exceeded(bs, is_write);
@@ -391,7 +397,7 @@ static bool bdrv_is_any_threshold_exceeded(BlockDriverState *bs, int nb_sectors,
/* the IO is authorized so do the accounting and return false */
bs->leaky_buckets.bytes[is_write] += (int64_t)nb_sectors *
BDRV_SECTOR_SIZE;
- bs->leaky_buckets.ios[is_write]++;
+ bs->leaky_buckets.ios[is_write] += ios;
return false;
}
diff --git a/block/qapi.c b/block/qapi.c
index 03f1604..f81081c 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -259,6 +259,10 @@ void bdrv_query_info(BlockDriverState *bs,
bs->io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE];
info->inserted->iops_wr_threshold =
bs->io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE];
+ info->inserted->has_iops_sector_count =
+ bs->io_limits.iops_sector_count;
+ info->inserted->iops_sector_count =
+ bs->io_limits.iops_sector_count;
}
bs0 = bs;
diff --git a/blockdev.c b/blockdev.c
index 241ebe8..d56e8a1 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -352,6 +352,11 @@ static bool do_check_io_limits(BlockIOLimit *io_limits, Error **errp)
return false;
}
+ if (io_limits->iops_sector_count < 0) {
+ error_setg(errp, "iops_sector_count must be 0 or greater");
+ return false;
+ }
+
return true;
}
@@ -563,6 +568,10 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type)
qemu_opt_get_number(opts, "iops_wr_threshold",
io_limits.iops[BLOCK_IO_LIMIT_WRITE] / THROTTLE_HZ);
+ io_limits.iops_sector_count =
+ qemu_opt_get_number(opts, "iops_sector_count", 0);
+
if (!do_check_io_limits(&io_limits, &error)) {
error_report("%s", error_get_pretty(error));
error_free(error);
@@ -1260,7 +1269,9 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
bool has_iops_rd_threshold,
int64_t iops_rd_threshold,
bool has_iops_wr_threshold,
- int64_t iops_wr_threshold, Error **errp)
+ int64_t iops_wr_threshold,
+ bool has_iops_sector_count,
+ int64_t iops_sector_count, Error **errp)
{
BlockIOLimit io_limits;
BlockDriverState *bs;
@@ -1283,6 +1294,7 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL] = iops / THROTTLE_HZ;
io_limits.iops_threshold[BLOCK_IO_LIMIT_READ] = iops_rd / THROTTLE_HZ;
io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE] = iops_wr / THROTTLE_HZ;
+ io_limits.iops_sector_count = 0;
/* override them with the given values if present */
if (has_bps_threshold) {
@@ -1304,6 +1316,10 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE] = iops_wr_threshold;
}
+ if (has_iops_sector_count) {
+ io_limits.iops_sector_count = iops_sector_count;
+ }
+
if (!do_check_io_limits(&io_limits, errp)) {
return;
}
@@ -2007,6 +2023,10 @@ QemuOptsList qemu_common_drive_opts = {
.type = QEMU_OPT_NUMBER,
.help = "write bytes threshold",
},{
+ .name = "iops_sector_count",
+ .type = QEMU_OPT_NUMBER,
+ .help = "max size of an I/O in sectors when limiting by iops",
+ },{
.name = "copy-on-read",
.type = QEMU_OPT_BOOL,
.help = "copy read data from backing file into image file",
diff --git a/hmp.c b/hmp.c
index d75aa99..3912305 100644
--- a/hmp.c
+++ b/hmp.c
@@ -347,7 +347,8 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
" iops_wr=%" PRId64
" iops_threshold=%" PRId64
" iops_rd_threshold=%" PRId64
- " iops_wr_threshold=%" PRId64 "\n",
+ " iops_wr_threshold=%" PRId64
+ " iops_sector_count=%" PRId64 "\n",
info->value->inserted->bps,
info->value->inserted->bps_rd,
info->value->inserted->bps_wr,
@@ -359,7 +360,8 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
info->value->inserted->iops_wr,
info->value->inserted->iops_threshold,
info->value->inserted->iops_rd_threshold,
- info->value->inserted->iops_wr_threshold);
+ info->value->inserted->iops_wr_threshold,
+ info->value->inserted->iops_sector_count);
} else {
monitor_printf(mon, " [not inserted]");
}
@@ -1120,6 +1122,8 @@ void hmp_block_set_io_throttle(Monitor *mon, const QDict *qdict)
false,
0,
false,
+ 0,
+ false, /* No default I/O size */
0, &err);
hmp_handle_error(mon, &err);
}
diff --git a/include/block/block_int.h b/include/block/block_int.h
index e32ad1f..74d7503 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -76,6 +76,7 @@ typedef struct BlockIOLimit {
int64_t iops[3];
int64_t bps_threshold[3];
int64_t iops_threshold[3];
+ int64_t iops_sector_count;
} BlockIOLimit;
typedef struct BlockIOBaseValue {
diff --git a/qapi-schema.json b/qapi-schema.json
index a6f3792..d579fda 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -781,6 +781,8 @@
#
# @iops_wr_threshold: #optional write I/O operations threshold (Since 1.6)
#
+# @iops_sector_count: #optional an I/O size in sectors (Since 1.6)
+#
# Since: 0.14.0
#
# Notes: This interface is only found in @BlockInfo.
@@ -794,7 +796,8 @@
'image': 'ImageInfo',
'*bps_threshold': 'int', '*bps_rd_threshold': 'int',
'*bps_wr_threshold': 'int', '*iops_threshold': 'int',
- '*iops_rd_threshold': 'int', '*iops_wr_threshold': 'int' }}
+ '*iops_rd_threshold': 'int', '*iops_wr_threshold': 'int',
+ '*iops_sector_count': 'int' } }
##
# @BlockDeviceIoStatus:
@@ -2185,6 +2188,8 @@
#
# @iops_wr_threshold: #optional write I/O operations threshold (Since 1.6)
#
+# @iops_sector_count: #optional an I/O size in sectors (Since 1.6)
+#
# Returns: Nothing on success
# If @device is not a valid block device, DeviceNotFound
#
@@ -2195,7 +2200,8 @@
'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int',
'*bps_threshold': 'int', '*bps_rd_threshold': 'int',
'*bps_wr_threshold': 'int', '*iops_threshold': 'int',
- '*iops_rd_threshold': 'int', '*iops_wr_threshold': 'int' }}
+ '*iops_rd_threshold': 'int', '*iops_wr_threshold': 'int',
+ '*iops_sector_count': 'int' }}
##
# @block-stream:
diff --git a/qemu-options.hx b/qemu-options.hx
index 4d5ee66..2f417b8 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -409,7 +409,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
" [,cache=writethrough|writeback|none|directsync|unsafe][,format=f]\n"
" [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
" [,readonly=on|off][,copy-on-read=on|off]\n"
- " [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r][,iops_wr=w]]][[,bps_threshold=bt]|[[,bps_rd_threshold=rt][,bps_wr_threshold=wt]]][[,iops_threshold=it]|[[,iops_rd_threshold=rt][,iops_wr_threshold=wt]]]\n"
+ " [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r][,iops_wr=w]]][[,bps_threshold=bt]|[[,bps_rd_threshold=rt][,bps_wr_threshold=wt]]][[,iops_threshold=it]|[[,iops_rd_threshold=rt][,iops_wr_threshold=wt]]][,iops_sector_count=cnt]\n"
" use 'file' as a drive image\n", QEMU_ARCH_ALL)
STEXI
@item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 11c638a..b40db2f 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1385,7 +1385,7 @@ EQMP
{
.name = "block_set_io_throttle",
- .args_type = "device:B,bps:l,bps_rd:l,bps_wr:l,iops:l,iops_rd:l,iops_wr:l,bps_threshold:l?,bps_rd_threshold:l?,bps_wr_threshold:l?,iops_threshold:l?,iops_rd_threshold:l?,iops_wr_threshold:l?",
+ .args_type = "device:B,bps:l,bps_rd:l,bps_wr:l,iops:l,iops_rd:l,iops_wr:l,bps_threshold:l?,bps_rd_threshold:l?,bps_wr_threshold:l?,iops_threshold:l?,iops_rd_threshold:l?,iops_wr_threshold:l?,iops_sector_count:l?",
.mhandler.cmd_new = qmp_marshal_input_block_set_io_throttle,
},
@@ -1410,6 +1410,7 @@ Arguments:
- "iops_threshold": total I/O operations threshold(json-int)
- "iops_rd_threshold": read I/O operations threshold(json-int)
- "iops_wr_threshold": write I/O operations threshold(json-int)
+- "iops_sector_count": max size of an I/O in sectors when limiting by iops(json-int)
Example:
@@ -1425,7 +1426,8 @@ Example:
"bps_wr_threshold": "0",
"iops_threshold": "0",
"iops_rd_threshold": "0",
- "iops_wr_threshold": "0" } }
+ "iops_wr_threshold": "0",
+ "iops_sector_count": "0" } }
<- { "return": {} }
EQMP
@@ -1772,6 +1774,7 @@ Each json-object contain the following:
- "iops_threshold": total I/O operations threshold(json-int)
- "iops_rd_threshold": read I/O operations threshold(json-int)
- "iops_wr_threshold": write I/O operations threshold(json-int)
+ - "iops_sector_count": max size of an I/O in sectors when limiting by iops(json-int)
- "image": the detail of the image, it is a json-object containing
the following:
- "filename": image file name (json-string)
@@ -1847,6 +1850,7 @@ Example:
"iops_threshold": "0",
"iops_rd_threshold": "0",
"iops_wr_threshold": "0",
+ "iops_sector_count": "0",
"image":{
"filename":"disks/test.qcow2",
"format":"qcow2",
--
1.7.10.4
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [Qemu-devel] [PATCH V3 for-1.6 5/5] block: Add throttling percentage metrics.
2013-07-23 16:29 [Qemu-devel] [PATCH V3 for-1.6 0/5] Leaky bucket throttling and features Benoît Canet
` (3 preceding siblings ...)
2013-07-23 16:29 ` [Qemu-devel] [PATCH V3 for-1.6 4/5] block: Add iops_sector_count to do the iops accounting for a given io size Benoît Canet
@ 2013-07-23 16:29 ` Benoît Canet
2013-07-26 2:55 ` Fam Zheng
2013-07-25 12:08 ` [Qemu-devel] [PATCH V3 for-1.6 0/5] Leaky bucket throttling and features Fam Zheng
2013-07-26 19:07 ` Eric Blake
6 siblings, 1 reply; 20+ messages in thread
From: Benoît Canet @ 2013-07-23 16:29 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, Benoît Canet, stefanha
This metric shows the percentage of time during which I/Os are put on hold by
the throttling algorithm.
It could be used by system administrators or cloud stack developers to decide
when the throttling settings must be changed.
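As a standalone sketch of how the metric is obtained (a hypothetical helper,
not the actual QEMU function): at each bucket leak, the time elapsed since the
first blocked request is divided by the length of the leak period.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the computation done in bdrv_leak_if_needed():
 * percentage = time spent with requests queued / leak period.
 * full_since == 0 means no request was blocked during the period. */
static int throttling_percentage(int64_t now, int64_t previous_leak,
                                 int64_t full_since)
{
    if (!full_since) {
        return 0;
    }
    return (int)(((now - full_since) * 100) / (now - previous_leak));
}
```

For example, if requests started queueing halfway through a leak period, the
metric reports 50: I/Os were held for half of the period.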
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block.c | 27 ++++++++++++++++++++++++++-
block/qapi.c | 4 ++++
hmp.c | 6 ++++--
include/block/block_int.h | 2 ++
qapi-schema.json | 4 +++-
5 files changed, 39 insertions(+), 4 deletions(-)
diff --git a/block.c b/block.c
index bb4f8e4..6bb8570 100644
--- a/block.c
+++ b/block.c
@@ -118,12 +118,21 @@ int is_windows_drive(const char *filename)
#endif
/* throttling disk I/O limits */
+static void bdrv_reset_throttling_metrics(BlockDriverState *bs)
+{
+ /* idle -> reset values */
+ bs->throttling_percentage = 0;
+ bs->full_since = 0;
+}
+
void bdrv_io_limits_disable(BlockDriverState *bs)
{
bs->io_limits_enabled = false;
while (qemu_co_enter_next(&bs->throttled_reqs)) {
}
+
+ bdrv_reset_throttling_metrics(bs);
}
static void bdrv_make_bps_buckets_leak(BlockDriverState *bs, int64_t delta)
@@ -213,7 +222,8 @@ static void bdrv_make_iops_buckets_leak(BlockDriverState *bs, int64_t delta)
static void bdrv_leak_if_needed(BlockDriverState *bs)
{
int64_t now;
- int64_t delta;
+ int64_t delta; /* ideally this delta equals the timer period */
+ int64_t delta_full; /* how long the bucket has been full */
if (!bs->must_leak) {
return;
@@ -223,6 +233,14 @@ static void bdrv_leak_if_needed(BlockDriverState *bs)
now = qemu_get_clock_ns(rt_clock);
delta = now - bs->previous_leak;
+ /* compute throttle percentage reflecting how long I/Os are held on average */
+ if (bs->full_since) {
+ delta_full = now - bs->full_since;
+ bs->throttling_percentage = (delta_full * 100) / delta;
+ bs->full_since = 0;
+ } else {
+ bs->throttling_percentage = 0;
+ }
bs->previous_leak = now;
bdrv_make_bps_buckets_leak(bs, delta);
@@ -260,6 +278,7 @@ static void bdrv_block_timer(void *opaque)
/* disable the throttling timer on idle for economy purposes */
if (bdrv_throttling_is_iddle(bs)) {
bdrv_block_timer_disable(bs);
+ bdrv_reset_throttling_metrics(bs);
return;
}
@@ -280,6 +299,7 @@ static void bdrv_block_timer_enable(BlockDriverState *bs)
bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
bs->previous_leak = qemu_get_clock_ns(rt_clock);
+ bdrv_reset_throttling_metrics(bs);
qemu_mod_timer(bs->block_timer,
qemu_get_clock_ns(vm_clock) +
BLOCK_IO_THROTTLE_PERIOD);
@@ -432,6 +452,11 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
* not full
*/
while (bdrv_is_any_threshold_exceeded(bs, nb_sectors, is_write)) {
+ /* remember when the code first decided to block an I/O */
+ if (qemu_co_queue_empty(&bs->throttled_reqs)) {
+ bs->full_since = qemu_get_clock_ns(rt_clock);
+ }
+
bdrv_leak_if_needed(bs);
qemu_co_queue_wait_insert_head(&bs->throttled_reqs);
bdrv_leak_if_needed(bs);
diff --git a/block/qapi.c b/block/qapi.c
index f81081c..bd1c6af 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -263,6 +263,10 @@ void bdrv_query_info(BlockDriverState *bs,
bs->io_limits.iops_sector_count;
info->inserted->iops_sector_count =
bs->io_limits.iops_sector_count;
+ info->inserted->has_throttling_percentage =
+ bs->throttling_percentage;
+ info->inserted->throttling_percentage =
+ bs->throttling_percentage;
}
bs0 = bs;
diff --git a/hmp.c b/hmp.c
index 3912305..9dc4862 100644
--- a/hmp.c
+++ b/hmp.c
@@ -348,7 +348,8 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
" iops_threshold=%" PRId64
" iops_rd_threshold=%" PRId64
" iops_wr_threshold=%" PRId64
- " iops_sector_count=%" PRId64 "\n",
+ " iops_sector_count=%" PRId64
+ " throttling_percentage=%" PRId64 "\n",
info->value->inserted->bps,
info->value->inserted->bps_rd,
info->value->inserted->bps_wr,
@@ -361,7 +362,8 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
info->value->inserted->iops_threshold,
info->value->inserted->iops_rd_threshold,
info->value->inserted->iops_wr_threshold,
- info->value->inserted->iops_sector_count);
+ info->value->inserted->iops_sector_count,
+ info->value->inserted->throttling_percentage);
} else {
monitor_printf(mon, " [not inserted]");
}
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 74d7503..4487cd9 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -271,6 +271,8 @@ struct BlockDriverState {
BlockIOLimit io_limits;
BlockIOBaseValue leaky_buckets;
int64_t previous_leak;
+ int64_t full_since;
+ int throttling_percentage;
bool must_leak;
CoQueue throttled_reqs;
QEMUTimer *block_timer;
diff --git a/qapi-schema.json b/qapi-schema.json
index d579fda..14a02e7 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -783,6 +783,8 @@
#
# @iops_sector_count: #optional an I/O size in sector (Since 1.6)
#
+# @throttling_percentage: #optional reflects throttling activity (Since 1.6)
+#
# Since: 0.14.0
#
# Notes: This interface is only found in @BlockInfo.
@@ -797,7 +799,7 @@
'*bps_threshold': 'int', '*bps_rd_threshold': 'int',
'*bps_wr_threshold': 'int', '*iops_threshold': 'int',
'*iops_rd_threshold': 'int', '*iops_wr_threshold': 'int',
- '*iops_sector_count': 'int' } }
+ '*iops_sector_count': 'int', '*throttling_percentage': 'int' } }
##
# @BlockDeviceIoStatus:
--
1.7.10.4
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [PATCH V3 for-1.6 0/5] Leaky bucket throttling and features
2013-07-23 16:29 [Qemu-devel] [PATCH V3 for-1.6 0/5] Leaky bucket throttling and features Benoît Canet
` (4 preceding siblings ...)
2013-07-23 16:29 ` [Qemu-devel] [PATCH V3 for-1.6 5/5] block: Add throttling percentage metrics Benoît Canet
@ 2013-07-25 12:08 ` Fam Zheng
2013-07-25 12:16 ` Benoît Canet
2013-07-26 19:07 ` Eric Blake
6 siblings, 1 reply; 20+ messages in thread
From: Fam Zheng @ 2013-07-25 12:08 UTC (permalink / raw)
To: Benoît Canet; +Cc: kwolf, qemu-devel, stefanha
On Tue, 07/23 18:29, Benoît Canet wrote:
> The first patch fixes the throttling, which was broken by a previous commit.
>
> The next patch replaces the existing throttling algorithm with the
> well-described leaky bucket algorithm.
>
> The third patch implements bursting by adding *_threshold parameters to
> qmp_block_set_io_throttle.
>
> The fourth patch allows defining the max size of an I/O when throttling by
> iops via iops_sector_count, to prevent VM users from cheating on the iops limit.
>
> The last patch adds a metric reflecting how much the I/Os are throttled.
>
My ignorant question: besides the fix, how is I/O throttling improved over
the previous implementation, and what are the advantages of the leaky bucket
algorithm?
Thanks.
Fam
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [PATCH V3 for-1.6 0/5] Leaky bucket throttling and features
2013-07-25 12:08 ` [Qemu-devel] [PATCH V3 for-1.6 0/5] Leaky bucket throttling and features Fam Zheng
@ 2013-07-25 12:16 ` Benoît Canet
0 siblings, 0 replies; 20+ messages in thread
From: Benoît Canet @ 2013-07-25 12:16 UTC (permalink / raw)
To: Fam Zheng; +Cc: kwolf, qemu-devel, stefanha
Le Thursday 25 Jul 2013 à 20:08:22 (+0800), Fam Zheng a écrit :
> My ignorant question: Besides the fix, how is io throttling improved
> from previously, what are the advantages of leaky bucket algorithm?
The advantages are simplicity and the ability to allow controlled bursts.
The disadvantage is that it adds stutter to the iops flow.
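For readers unfamiliar with it, here is a minimal standalone sketch of the
leaky bucket idea used by this series (illustrative only, not the QEMU
implementation; the type and function names are made up):

```c
#include <assert.h>
#include <stdbool.h>

/* Each I/O pours its size into the bucket; the bucket leaks `rate` units
 * per second; an I/O is throttled while the level is above `threshold`.
 * The threshold is what permits controlled bursts. */
typedef struct {
    double level;      /* current bucket content */
    double rate;       /* leak rate, units per second */
    double threshold;  /* burst allowance */
} LeakyBucket;

static void bucket_leak(LeakyBucket *b, double elapsed_seconds)
{
    b->level -= b->rate * elapsed_seconds;
    if (b->level < 0) {
        b->level = 0;
    }
}

static bool bucket_account(LeakyBucket *b, double size)
{
    if (b->level > b->threshold) {
        return false;      /* throttled: caller must wait for a leak */
    }
    b->level += size;      /* authorized: do the accounting */
    return true;
}
```

The leak at a fixed rate enforces the long-term limit, while the threshold
lets a burst of requests through before throttling kicks in.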
Best Regards
Benoît
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [PATCH V3 for-1.6 5/5] block: Add throttling percentage metrics.
2013-07-23 16:29 ` [Qemu-devel] [PATCH V3 for-1.6 5/5] block: Add throttling percentage metrics Benoît Canet
@ 2013-07-26 2:55 ` Fam Zheng
2013-07-26 9:49 ` Benoît Canet
0 siblings, 1 reply; 20+ messages in thread
From: Fam Zheng @ 2013-07-26 2:55 UTC (permalink / raw)
To: Benoît Canet; +Cc: kwolf, qemu-devel, stefanha
On Tue, 07/23 18:29, Benoît Canet wrote:
> This metric shows the percentage of time during which I/Os are put on hold by
> the throttling algorithm.
> It could be used by system administrators or cloud stack developers to decide
> when the throttling settings must be changed.
>
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
> block.c | 27 ++++++++++++++++++++++++++-
> block/qapi.c | 4 ++++
> hmp.c | 6 ++++--
> include/block/block_int.h | 2 ++
> qapi-schema.json | 4 +++-
> 5 files changed, 39 insertions(+), 4 deletions(-)
>
> diff --git a/block.c b/block.c
> index bb4f8e4..6bb8570 100644
> --- a/block.c
> +++ b/block.c
> @@ -118,12 +118,21 @@ int is_windows_drive(const char *filename)
> #endif
>
> /* throttling disk I/O limits */
> +static void bdrv_reset_throttling_metrics(BlockDriverState *bs)
> +{
> + /* idle -> reset values */
> + bs->throttling_percentage = 0;
> + bs->full_since = 0;
> +}
> +
> void bdrv_io_limits_disable(BlockDriverState *bs)
> {
> bs->io_limits_enabled = false;
>
> while (qemu_co_enter_next(&bs->throttled_reqs)) {
> }
> +
> + bdrv_reset_throttling_metrics(bs);
> }
>
> static void bdrv_make_bps_buckets_leak(BlockDriverState *bs, int64_t delta)
> @@ -213,7 +222,8 @@ static void bdrv_make_iops_buckets_leak(BlockDriverState *bs, int64_t delta)
> static void bdrv_leak_if_needed(BlockDriverState *bs)
> {
> int64_t now;
> - int64_t delta;
> + int64_t delta; /* the delta that would be ideally the timer period */
> + int64_t delta_full; /* the delta where the bucket is full */
>
> if (!bs->must_leak) {
> return;
> @@ -223,6 +233,14 @@ static void bdrv_leak_if_needed(BlockDriverState *bs)
>
> now = qemu_get_clock_ns(rt_clock);
> delta = now - bs->previous_leak;
> + /* compute throttle percentage reflecting how long IO are hold on average */
> + if (bs->full_since) {
> + delta_full = now - bs->full_since;
> + bs->throttling_percentage = (delta_full * 100) / delta;
> + bs->full_since = 0;
If I understand correctly, the percentage is recalculated at every leak
check, so it only reflects the instantaneous I/O flow instead of
historical statistics? But for system administration purposes I think
it's useful to know the I/O activity over a longer range. Or do you
think the management tool should sample it?
> + } else {
> + bs->throttling_percentage = 0;
> + }
> bs->previous_leak = now;
>
> bdrv_make_bps_buckets_leak(bs, delta);
> @@ -260,6 +278,7 @@ static void bdrv_block_timer(void *opaque)
> /* disable throttling time on iddle for economy purpose */
> if (bdrv_throttling_is_iddle(bs)) {
> bdrv_block_timer_disable(bs);
> + bdrv_reset_throttling_metrics(bs);
> return;
> }
>
> @@ -280,6 +299,7 @@ static void bdrv_block_timer_enable(BlockDriverState *bs)
>
> bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
> bs->previous_leak = qemu_get_clock_ns(rt_clock);
> + bdrv_reset_throttling_metrics(bs);
> qemu_mod_timer(bs->block_timer,
> qemu_get_clock_ns(vm_clock) +
> BLOCK_IO_THROTTLE_PERIOD);
> @@ -432,6 +452,11 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
> * not full
> */
> while (bdrv_is_any_threshold_exceeded(bs, nb_sectors, is_write)) {
> + /* remember since when the code decided to block the first I/O */
> + if (qemu_co_queue_empty(&bs->throttled_reqs)) {
> + bs->full_since = qemu_get_clock_ns(rt_clock);
> + }
> +
> bdrv_leak_if_needed(bs);
> qemu_co_queue_wait_insert_head(&bs->throttled_reqs);
> bdrv_leak_if_needed(bs);
> diff --git a/block/qapi.c b/block/qapi.c
> index f81081c..bd1c6af 100644
> --- a/block/qapi.c
> +++ b/block/qapi.c
> @@ -263,6 +263,10 @@ void bdrv_query_info(BlockDriverState *bs,
> bs->io_limits.iops_sector_count;
> info->inserted->iops_sector_count =
> bs->io_limits.iops_sector_count;
> + info->inserted->has_throttling_percentage =
> + bs->throttling_percentage;
> + info->inserted->throttling_percentage =
> + bs->throttling_percentage;
> }
>
> bs0 = bs;
> diff --git a/hmp.c b/hmp.c
> index 3912305..9dc4862 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -348,7 +348,8 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
> " iops_threshold=%" PRId64
> " iops_rd_threshold=%" PRId64
> " iops_wr_threshold=%" PRId64
> - " iops_sector_count=%" PRId64 "\n",
> + " iops_sector_count=%" PRId64
> + " throttling_percentage=%" PRId64 "\n",
> info->value->inserted->bps,
> info->value->inserted->bps_rd,
> info->value->inserted->bps_wr,
> @@ -361,7 +362,8 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
> info->value->inserted->iops_threshold,
> info->value->inserted->iops_rd_threshold,
> info->value->inserted->iops_wr_threshold,
> - info->value->inserted->iops_sector_count);
> + info->value->inserted->iops_sector_count,
> + info->value->inserted->throttling_percentage);
> } else {
> monitor_printf(mon, " [not inserted]");
> }
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 74d7503..4487cd9 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -271,6 +271,8 @@ struct BlockDriverState {
> BlockIOLimit io_limits;
> BlockIOBaseValue leaky_buckets;
> int64_t previous_leak;
> + int64_t full_since;
> + int throttling_percentage;
> bool must_leak;
> CoQueue throttled_reqs;
> QEMUTimer *block_timer;
> diff --git a/qapi-schema.json b/qapi-schema.json
> index d579fda..14a02e7 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -783,6 +783,8 @@
> #
> # @iops_sector_count: #optional an I/O size in sector (Since 1.6)
> #
> +# @throttling_percentage: #optional reflect throttling activity (Since 1.6)
> +#
> # Since: 0.14.0
> #
> # Notes: This interface is only found in @BlockInfo.
> @@ -797,7 +799,7 @@
> '*bps_threshold': 'int', '*bps_rd_threshold': 'int',
> '*bps_wr_threshold': 'int', '*iops_threshold': 'int',
> '*iops_rd_threshold': 'int', '*iops_wr_threshold': 'int',
> - '*iops_sector_count': 'int' } }
> + '*iops_sector_count': 'int', '*throttling_percentage': 'int' } }
>
> ##
> # @BlockDeviceIoStatus:
> --
> 1.7.10.4
>
>
--
Fam
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [PATCH V3 for-1.6 3/5] block: Add support for throttling burst threshold in QMP and the command line.
2013-07-23 16:29 ` [Qemu-devel] [PATCH V3 for-1.6 3/5] block: Add support for throttling burst threshold in QMP and the command line Benoît Canet
@ 2013-07-26 3:07 ` Fam Zheng
2013-07-26 19:24 ` Eric Blake
1 sibling, 0 replies; 20+ messages in thread
From: Fam Zheng @ 2013-07-26 3:07 UTC (permalink / raw)
To: Benoît Canet; +Cc: kwolf, qemu-devel, stefanha
On Tue, 07/23 18:29, Benoît Canet wrote:
> The thresholds of the leaky bucket algorithm can be used to allow some
> burstiness.
>
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
> block/qapi.c | 24 +++++++++++++
> blockdev.c | 105 +++++++++++++++++++++++++++++++++++++++++++++++-------
> hmp.c | 32 +++++++++++++++--
> qapi-schema.json | 34 ++++++++++++++++--
> qemu-options.hx | 2 +-
> qmp-commands.hx | 30 ++++++++++++++--
> 6 files changed, 205 insertions(+), 22 deletions(-)
>
> diff --git a/block/qapi.c b/block/qapi.c
> index a4bc411..03f1604 100644
> --- a/block/qapi.c
> +++ b/block/qapi.c
> @@ -235,6 +235,30 @@ void bdrv_query_info(BlockDriverState *bs,
> bs->io_limits.iops[BLOCK_IO_LIMIT_READ];
> info->inserted->iops_wr =
> bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE];
> + info->inserted->has_bps_threshold =
> + bs->io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL];
> + info->inserted->bps_threshold =
> + bs->io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL];
> + info->inserted->has_bps_rd_threshold =
> + bs->io_limits.bps_threshold[BLOCK_IO_LIMIT_READ];
> + info->inserted->bps_rd_threshold =
> + bs->io_limits.bps_threshold[BLOCK_IO_LIMIT_READ];
> + info->inserted->has_bps_wr_threshold =
> + bs->io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE];
> + info->inserted->bps_wr_threshold =
> + bs->io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE];
> + info->inserted->has_iops_threshold =
> + bs->io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL];
> + info->inserted->iops_threshold =
> + bs->io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL];
> + info->inserted->iops_rd_threshold =
> + bs->io_limits.iops_threshold[BLOCK_IO_LIMIT_READ];
> + info->inserted->has_iops_rd_threshold =
> + bs->io_limits.iops_threshold[BLOCK_IO_LIMIT_READ];
> + info->inserted->has_iops_wr_threshold =
> + bs->io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE];
> + info->inserted->iops_wr_threshold =
> + bs->io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE];
> }
>
> bs0 = bs;
> diff --git a/blockdev.c b/blockdev.c
> index 491e4d0..241ebe8 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -341,6 +341,17 @@ static bool do_check_io_limits(BlockIOLimit *io_limits, Error **errp)
> return false;
> }
>
> + if (io_limits->bps_threshold[BLOCK_IO_LIMIT_TOTAL] < 0 ||
> + io_limits->bps_threshold[BLOCK_IO_LIMIT_WRITE] < 0 ||
> + io_limits->bps_threshold[BLOCK_IO_LIMIT_READ] < 0 ||
> + io_limits->iops_threshold[BLOCK_IO_LIMIT_TOTAL] < 0 ||
> + io_limits->iops_threshold[BLOCK_IO_LIMIT_WRITE] < 0 ||
> + io_limits->iops_threshold[BLOCK_IO_LIMIT_READ] < 0) {
> + error_setg(errp,
> + "threshold values must be 0 or greater");
> + return false;
> + }
> +
> return true;
> }
>
> @@ -523,24 +534,34 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type)
> qemu_opt_get_number(opts, "bps_rd", 0);
> io_limits.bps[BLOCK_IO_LIMIT_WRITE] =
> qemu_opt_get_number(opts, "bps_wr", 0);
> + /* bps thresholds */
> + io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL] =
> + qemu_opt_get_number(opts, "bps_threshold",
> + io_limits.bps[BLOCK_IO_LIMIT_TOTAL] / THROTTLE_HZ);
> + io_limits.bps_threshold[BLOCK_IO_LIMIT_READ] =
> + qemu_opt_get_number(opts, "bps_rd_threshold",
> + io_limits.bps[BLOCK_IO_LIMIT_READ] / THROTTLE_HZ);
> + io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE] =
> + qemu_opt_get_number(opts, "bps_wr_threshold",
> + io_limits.bps[BLOCK_IO_LIMIT_WRITE] / THROTTLE_HZ);
> +
> io_limits.iops[BLOCK_IO_LIMIT_TOTAL] =
> qemu_opt_get_number(opts, "iops", 0);
> io_limits.iops[BLOCK_IO_LIMIT_READ] =
> qemu_opt_get_number(opts, "iops_rd", 0);
> io_limits.iops[BLOCK_IO_LIMIT_WRITE] =
> qemu_opt_get_number(opts, "iops_wr", 0);
> - io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL] =
> - io_limits.bps[BLOCK_IO_LIMIT_TOTAL] / THROTTLE_HZ;
> - io_limits.bps_threshold[BLOCK_IO_LIMIT_READ] =
> - io_limits.bps[BLOCK_IO_LIMIT_READ] / THROTTLE_HZ;
> - io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE] =
> - io_limits.bps[BLOCK_IO_LIMIT_WRITE] / THROTTLE_HZ;
> - io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL] =
> - io_limits.iops[BLOCK_IO_LIMIT_TOTAL] / THROTTLE_HZ;
> +
> + /* iops thresholds */
> + io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL] =
> + qemu_opt_get_number(opts, "iops_threshold",
> + io_limits.iops[BLOCK_IO_LIMIT_TOTAL] / THROTTLE_HZ);
> io_limits.iops_threshold[BLOCK_IO_LIMIT_READ] =
> - io_limits.iops[BLOCK_IO_LIMIT_READ] / THROTTLE_HZ;
> - io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE] =
> - io_limits.iops[BLOCK_IO_LIMIT_WRITE] / THROTTLE_HZ;
> + qemu_opt_get_number(opts, "iops_rd_threshold",
> + io_limits.iops[BLOCK_IO_LIMIT_READ] / THROTTLE_HZ);
> + io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE] =
> + qemu_opt_get_number(opts, "iops_wr_threshold",
> + io_limits.iops[BLOCK_IO_LIMIT_WRITE] / THROTTLE_HZ);
>
> if (!do_check_io_limits(&io_limits, &error)) {
> error_report("%s", error_get_pretty(error));
> @@ -1224,8 +1245,22 @@ void qmp_change_blockdev(const char *device, const char *filename,
>
> /* throttling disk I/O limits */
> void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
> - int64_t bps_wr, int64_t iops, int64_t iops_rd,
> - int64_t iops_wr, Error **errp)
> + int64_t bps_wr,
> + int64_t iops,
> + int64_t iops_rd,
> + int64_t iops_wr,
> + bool has_bps_threshold,
> + int64_t bps_threshold,
> + bool has_bps_rd_threshold,
> + int64_t bps_rd_threshold,
> + bool has_bps_wr_threshold,
> + int64_t bps_wr_threshold,
> + bool has_iops_threshold,
> + int64_t iops_threshold,
> + bool has_iops_rd_threshold,
> + int64_t iops_rd_threshold,
> + bool has_iops_wr_threshold,
> + int64_t iops_wr_threshold, Error **errp)
> {
> BlockIOLimit io_limits;
> BlockDriverState *bs;
> @@ -1249,6 +1284,26 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
> io_limits.iops_threshold[BLOCK_IO_LIMIT_READ] = iops_rd / THROTTLE_HZ;
> io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE] = iops_wr / THROTTLE_HZ;
>
> + /* override them with givens values if present */
> + if (has_bps_threshold) {
> + io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL] = bps_threshold;
> + }
> + if (has_bps_rd_threshold) {
> + io_limits.bps_threshold[BLOCK_IO_LIMIT_READ] = bps_rd_threshold;
> + }
> + if (has_bps_wr_threshold) {
> + io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE] = bps_wr_threshold;
> + }
> + if (has_iops_threshold) {
> + io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL] = iops_threshold;
> + }
> + if (has_iops_rd_threshold) {
> + io_limits.iops_threshold[BLOCK_IO_LIMIT_READ] = iops_rd_threshold;
> + }
> + if (has_iops_wr_threshold) {
> + io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE] = iops_wr_threshold;
> + }
> +
> if (!do_check_io_limits(&io_limits, errp)) {
> return;
> }
> @@ -1916,6 +1971,18 @@ QemuOptsList qemu_common_drive_opts = {
> .type = QEMU_OPT_NUMBER,
> .help = "limit write operations per second",
> },{
> + .name = "iops_threshold",
> + .type = QEMU_OPT_NUMBER,
> + .help = "total I/O operations threshold",
> + },{
> + .name = "iops_rd_threshold",
> + .type = QEMU_OPT_NUMBER,
> + .help = "read operations threshold",
> + },{
> + .name = "iops_wr_threshold",
> + .type = QEMU_OPT_NUMBER,
> + .help = "write operations threshold",
> + },{
> .name = "bps",
> .type = QEMU_OPT_NUMBER,
> .help = "limit total bytes per second",
> @@ -1928,6 +1995,18 @@ QemuOptsList qemu_common_drive_opts = {
> .type = QEMU_OPT_NUMBER,
> .help = "limit write bytes per second",
> },{
> + .name = "bps_threshold",
> + .type = QEMU_OPT_NUMBER,
> + .help = "total bytes threshold",
> + },{
> + .name = "bps_rd_threshold",
> + .type = QEMU_OPT_NUMBER,
> + .help = "read bytes threshold",
> + },{
> + .name = "bps_wr_threshold",
> + .type = QEMU_OPT_NUMBER,
> + .help = "write bytes threshold",
> + },{
> .name = "copy-on-read",
> .type = QEMU_OPT_BOOL,
> .help = "copy read data from backing file into image file",
> diff --git a/hmp.c b/hmp.c
> index dc4d8d4..d75aa99 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -340,14 +340,28 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
> {
> monitor_printf(mon, " I/O throttling: bps=%" PRId64
> " bps_rd=%" PRId64 " bps_wr=%" PRId64
> + " bps_threshold=%" PRId64
> + " bps_rd_threshold=%" PRId64
> + " bps_wr_threshold=%" PRId64
> " iops=%" PRId64 " iops_rd=%" PRId64
> - " iops_wr=%" PRId64 "\n",
> + " iops_wr=%" PRId64
> + " iops_threshold=%" PRId64
> + " iops_rd_threshold=%" PRId64
> + " iops_wr_threshold=%" PRId64 "\n",
> info->value->inserted->bps,
> info->value->inserted->bps_rd,
> info->value->inserted->bps_wr,
> + info->value->inserted->bps_threshold,
> + info->value->inserted->bps_rd_threshold,
> + info->value->inserted->bps_wr_threshold,
> info->value->inserted->iops,
> info->value->inserted->iops_rd,
> - info->value->inserted->iops_wr);
> + info->value->inserted->iops_wr,
> + info->value->inserted->iops_threshold,
> + info->value->inserted->iops_rd_threshold,
> + info->value->inserted->iops_wr_threshold);
> + } else {
> + monitor_printf(mon, " [not inserted]");
> }
>
> if (verbose) {
> @@ -1094,7 +1108,19 @@ void hmp_block_set_io_throttle(Monitor *mon, const QDict *qdict)
> qdict_get_int(qdict, "bps_wr"),
> qdict_get_int(qdict, "iops"),
> qdict_get_int(qdict, "iops_rd"),
> - qdict_get_int(qdict, "iops_wr"), &err);
> + qdict_get_int(qdict, "iops_wr"),
> + false, /* no burst threshold via QMP */
> + 0,
> + false,
> + 0,
> + false,
> + 0,
> + false,
> + 0,
> + false,
> + 0,
> + false,
> + 0, &err);
> hmp_handle_error(mon, &err);
> }
>
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 7b9fef1..a6f3792 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -769,6 +769,18 @@
> #
> # @image: the info of image used (since: 1.6)
> #
> +# @bps_threshold: #optional total threshold in bytes (Since 1.6)
Could you document how @bps_threshold is used and how it differs from
@bps, please? Maybe it's obvious in the "leaky bucket" context, but
users don't care about the implementation algorithm; they may ask why
they need both @bps and @bps_threshold.
My understanding is: after setting @iops_threshold to 10000, the very
first 10000 ops are guaranteed not to be throttled, in one shot. After
that, the throttling algorithm is enabled and @iops_threshold becomes
ineffective. Is that so?
> +#
> +# @bps_rd_threshold: #optional read threshold in bytes (Since 1.6)
> +#
> +# @bps_wr_threshold: #optional write threshold in bytes (Since 1.6)
> +#
> +# @iops_threshold: #optional total I/O operations threshold (Since 1.6)
> +#
> +# @iops_rd_threshold: #optional read I/O operations threshold (Since 1.6)
> +#
> +# @iops_wr_threshold: #optional write I/O operations threshold (Since 1.6)
> +#
> # Since: 0.14.0
> #
> # Notes: This interface is only found in @BlockInfo.
> @@ -779,7 +791,10 @@
> 'encrypted': 'bool', 'encryption_key_missing': 'bool',
> 'bps': 'int', 'bps_rd': 'int', 'bps_wr': 'int',
> 'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int',
> - 'image': 'ImageInfo' } }
> + 'image': 'ImageInfo',
> + '*bps_threshold': 'int', '*bps_rd_threshold': 'int',
> + '*bps_wr_threshold': 'int', '*iops_threshold': 'int',
> + '*iops_rd_threshold': 'int', '*iops_wr_threshold': 'int' }}
>
> ##
> # @BlockDeviceIoStatus:
> @@ -2158,6 +2173,18 @@
> #
> # @iops_wr: write I/O operations per second
> #
> +# @bps_threshold: #optional total threshold in bytes (Since 1.6)
> +#
> +# @bps_rd_threshold: #optional read threshold in bytes (Since 1.6)
> +#
> +# @bps_wr_threshold: #optional write threshold in bytes (Since 1.6)
> +#
> +# @iops_threshold: #optional total I/O operations threshold (Since 1.6)
> +#
> +# @iops_rd_threshold: #optional read I/O operations threshold (Since 1.6)
> +#
> +# @iops_wr_threshold: #optional write I/O operations threshold (Since 1.6)
> +#
> # Returns: Nothing on success
> # If @device is not a valid block device, DeviceNotFound
> #
> @@ -2165,7 +2192,10 @@
> ##
> { 'command': 'block_set_io_throttle',
> 'data': { 'device': 'str', 'bps': 'int', 'bps_rd': 'int', 'bps_wr': 'int',
> - 'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int' } }
> + 'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int',
> + '*bps_threshold': 'int', '*bps_rd_threshold': 'int',
> + '*bps_wr_threshold': 'int', '*iops_threshold': 'int',
> + '*iops_rd_threshold': 'int', '*iops_wr_threshold': 'int' }}
>
> ##
> # @block-stream:
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 4e98b4f..4d5ee66 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -409,7 +409,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
> " [,cache=writethrough|writeback|none|directsync|unsafe][,format=f]\n"
> " [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
> " [,readonly=on|off][,copy-on-read=on|off]\n"
> - " [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r][,iops_wr=w]]\n"
> + " [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r][,iops_wr=w][,bps_threshold=bt]|[[,bps_rd_threshold=rt][,bps_wr_threshold=wt]]][[,iops_threshold=it]|[[,iops_rd_threshold=rt][,iops_wr_threshold=wt]]\n"
This line is so long, can you wrap it?
> " use 'file' as a drive image\n", QEMU_ARCH_ALL)
> STEXI
> @item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index e075df4..11c638a 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -1385,7 +1385,7 @@ EQMP
>
> {
> .name = "block_set_io_throttle",
> - .args_type = "device:B,bps:l,bps_rd:l,bps_wr:l,iops:l,iops_rd:l,iops_wr:l",
> + .args_type = "device:B,bps:l,bps_rd:l,bps_wr:l,iops:l,iops_rd:l,iops_wr:l,bps_threshold:l?,bps_rd_threshold:l?,bps_wr_threshold:l?,iops_threshold:l?,iops_rd_threshold:l?,iops_wr_threshold:l?",
Same here.
> .mhandler.cmd_new = qmp_marshal_input_block_set_io_throttle,
> },
>
> @@ -1400,10 +1400,16 @@ Arguments:
> - "device": device name (json-string)
> - "bps": total throughput limit in bytes per second(json-int)
> - "bps_rd": read throughput limit in bytes per second(json-int)
> -- "bps_wr": read throughput limit in bytes per second(json-int)
> +- "bps_wr": write throughput limit in bytes per second(json-int)
> - "iops": total I/O operations per second(json-int)
> - "iops_rd": read I/O operations per second(json-int)
> - "iops_wr": write I/O operations per second(json-int)
> +- "bps_threshold": total threshold in bytes(json-int)
> +- "bps_rd_threshold": read threshold in bytes(json-int)
> +- "bps_wr_threshold": write threshold in bytes(json-int)
> +- "iops_threshold": total I/O operations threshold(json-int)
> +- "iops_rd_threshold": read I/O operations threshold(json-int)
> +- "iops_wr_threshold": write I/O operations threshold(json-int)
>
> Example:
>
> @@ -1413,7 +1419,13 @@ Example:
> "bps_wr": "0",
> "iops": "0",
> "iops_rd": "0",
> - "iops_wr": "0" } }
> + "iops_wr": "0",
> + "bps_threshold": "8000000",
> + "bps_rd_threshold": "0",
> + "bps_wr_threshold": "0",
> + "iops_threshold": "0",
> + "iops_rd_threshold": "0",
> + "iops_wr_threshold": "0" } }
> <- { "return": {} }
>
> EQMP
> @@ -1754,6 +1766,12 @@ Each json-object contain the following:
> - "iops": limit total I/O operations per second (json-int)
> - "iops_rd": limit read operations per second (json-int)
> - "iops_wr": limit write operations per second (json-int)
> + - "bps_threshold": total threshold in bytes(json-int)
> + - "bps_rd_threshold": read threshold in bytes(json-int)
> + - "bps_wr_threshold": write threshold in bytes(json-int)
> + - "iops_threshold": total I/O operations threshold(json-int)
> + - "iops_rd_threshold": read I/O operations threshold(json-int)
> + - "iops_wr_threshold": write I/O operations threshold(json-int)
> - "image": the detail of the image, it is a json-object containing
> the following:
> - "filename": image file name (json-string)
> @@ -1823,6 +1841,12 @@ Example:
> "iops":1000000,
> "iops_rd":0,
> "iops_wr":0,
> + "bps_threshold": "8000000",
> + "bps_rd_threshold": "0",
> + "bps_wr_threshold": "0",
> + "iops_threshold": "0",
> + "iops_rd_threshold": "0",
> + "iops_wr_threshold": "0",
> "image":{
> "filename":"disks/test.qcow2",
> "format":"qcow2",
> --
> 1.7.10.4
>
>
--
Fam
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [PATCH V3 for-1.6 2/5] block: Modify the throttling code to implement the leaky bucket algorithm.
2013-07-23 16:29 ` [Qemu-devel] [PATCH V3 for-1.6 2/5] block: Modify the throttling code to implement the leaky bucket algorithm Benoît Canet
@ 2013-07-26 3:40 ` Fam Zheng
2013-07-26 9:48 ` Benoît Canet
0 siblings, 1 reply; 20+ messages in thread
From: Fam Zheng @ 2013-07-26 3:40 UTC (permalink / raw)
To: Benoît Canet; +Cc: kwolf, qemu-devel, stefanha
On Tue, 07/23 18:29, Benoît Canet wrote:
> This patch replace the previous algorithm by the well described leaky bucket
> algorithm: A bucket is filled by the incoming IOs and a periodic timer decrement
> the counter to make the bucket leak. When a given threshold is reached the
> bucket is full and the IOs are hold.
>
> In this patch the threshold is set to a default value to make the code behave
> like the previous implementation.
>
> In the next patch the threshold will be exposed in QMP to let the user control
> the burstiness of the throttling.
>
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
> block.c | 454 +++++++++++++++++++++++++++------------------
> blockdev.c | 71 +++++--
> include/block/block_int.h | 15 +-
> 3 files changed, 339 insertions(+), 201 deletions(-)
>
> diff --git a/block.c b/block.c
> index dc72643..f1cd9c0 100644
> --- a/block.c
> +++ b/block.c
> @@ -86,13 +86,6 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque);
> static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
> int64_t sector_num, int nb_sectors);
>
> -static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
> - bool is_write, double elapsed_time, uint64_t *wait);
> -static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
> - double elapsed_time, uint64_t *wait);
> -static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
> - bool is_write, int64_t *wait);
> -
> static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
> QTAILQ_HEAD_INITIALIZER(bdrv_states);
>
> @@ -101,6 +94,8 @@ static QLIST_HEAD(, BlockDriver) bdrv_drivers =
>
> /* If non-zero, use only whitelisted block drivers */
> static int use_bdrv_whitelist;
> +/* boolean used to inform the throttling code that a bdrv_drain_all is issued */
> +static bool draining;
>
> #ifdef _WIN32
> static int is_windows_drive_prefix(const char *filename)
> @@ -129,28 +124,170 @@ void bdrv_io_limits_disable(BlockDriverState *bs)
>
> while (qemu_co_enter_next(&bs->throttled_reqs)) {
> }
> +}
>
> - if (bs->block_timer) {
> - qemu_del_timer(bs->block_timer);
> - qemu_free_timer(bs->block_timer);
> - bs->block_timer = NULL;
> +static void bdrv_make_bps_buckets_leak(BlockDriverState *bs, int64_t delta)
> +{
> + int64_t *bytes = bs->leaky_buckets.bytes;
> + int64_t read_leak, write_leak;
> +
> + /* the limit apply to both reads and writes */
> + if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
> + /* compute half the total leak */
> + int64_t leak = ((bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL] * delta) /
> + NANOSECONDS_PER_SECOND);
> + int remain = leak % 2;
> + leak /= 2;
> +
> + /* the read bucket is smaller than half the quantity to leak so take
> + * care adding the leak difference to write leak
> + */
> + if (bytes[BLOCK_IO_LIMIT_READ] <= leak) {
> + read_leak = bytes[BLOCK_IO_LIMIT_READ];
> + write_leak = 2 * leak + remain - bytes[BLOCK_IO_LIMIT_READ];
> + /* symetric case */
> + } else if (bytes[BLOCK_IO_LIMIT_WRITE] <= leak) {
> + write_leak = bytes[BLOCK_IO_LIMIT_WRITE];
> + read_leak = 2 * leak + remain - bytes[BLOCK_IO_LIMIT_WRITE];
> + /* both bucket above leak count use half the total leak for both */
> + } else {
> + write_leak = leak;
> + read_leak = leak + remain;
> + }
> + /* else we consider that limits are separated */
> + } else {
> + read_leak = (bs->io_limits.bps[BLOCK_IO_LIMIT_READ] * delta) /
> + NANOSECONDS_PER_SECOND;
> + write_leak = (bs->io_limits.bps[BLOCK_IO_LIMIT_WRITE] * delta) /
> + NANOSECONDS_PER_SECOND;
> + }
> +
> + /* make the buckets leak */
> + bytes[BLOCK_IO_LIMIT_READ] = MAX(bytes[BLOCK_IO_LIMIT_READ] - read_leak,
> + 0);
> + bytes[BLOCK_IO_LIMIT_WRITE] = MAX(bytes[BLOCK_IO_LIMIT_WRITE] - write_leak,
> + 0);
> +}
> +
> +static void bdrv_make_iops_buckets_leak(BlockDriverState *bs, int64_t delta)
> +{
> + double *ios = bs->leaky_buckets.ios;
> + int64_t read_leak, write_leak;
> +
> + /* the limit apply to both reads and writes */
> + if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
> + /* compute half the total leak */
> + int64_t leak = ((bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL] * delta) /
> + NANOSECONDS_PER_SECOND);
You have the total leak here...
> + int remain = leak % 2;
> + leak /= 2;
> +
> + /* the read bucket is smaller than half the quantity to leak so take
> + * care adding the leak difference to write leak
> + */
> + if (ios[BLOCK_IO_LIMIT_READ] <= leak) {
> + read_leak = ios[BLOCK_IO_LIMIT_READ];
> + write_leak = 2 * leak + remain - ios[BLOCK_IO_LIMIT_READ];
> + /* symetric case */
> + } else if (ios[BLOCK_IO_LIMIT_WRITE] <= leak) {
> + write_leak = ios[BLOCK_IO_LIMIT_WRITE];
> + read_leak = 2 * leak + remain - ios[BLOCK_IO_LIMIT_WRITE];
> + /* both bucket above leak count use half the total leak for both */
> + } else {
> + write_leak = leak;
> + read_leak = leak + remain;
> + }
I think it is easier to understand when written like this:
int64_t total_leak = ((bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL] * delta) /
NANOSECONDS_PER_SECOND);
if (ios[BLOCK_IO_LIMIT_READ] <= total_leak / 2) {
read_leak = ios[BLOCK_IO_LIMIT_READ];
write_leak = total_leak - read_leak;
/* symetric case */
} else if (ios[BLOCK_IO_LIMIT_WRITE] <= total_leak / 2) {
write_leak = ios[BLOCK_IO_LIMIT_WRITE];
read_leak = total_leak - write_leak;
/* both bucket above leak count use half the total leak for both */
} else {
write_leak = total_leak / 2;
read_leak = (total_leak + 1) / 2;
}
> + /* else we consider that limits are separated */
> + } else {
> + read_leak = (bs->io_limits.iops[BLOCK_IO_LIMIT_READ] * delta) /
> + NANOSECONDS_PER_SECOND;
> + write_leak = (bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE] * delta) /
> + NANOSECONDS_PER_SECOND;
> + }
> +
> + /* make the buckets leak */
> + ios[BLOCK_IO_LIMIT_READ] = MAX(ios[BLOCK_IO_LIMIT_READ] - read_leak, 0);
> + ios[BLOCK_IO_LIMIT_WRITE] = MAX(ios[BLOCK_IO_LIMIT_WRITE] - write_leak, 0);
> +}
> +
> +static void bdrv_leak_if_needed(BlockDriverState *bs)
> +{
> + int64_t now;
> + int64_t delta;
> +
> + if (!bs->must_leak) {
> + return;
> + }
> +
> + bs->must_leak = false;
> +
> + now = qemu_get_clock_ns(rt_clock);
> + delta = now - bs->previous_leak;
> + bs->previous_leak = now;
> +
> + bdrv_make_bps_buckets_leak(bs, delta);
> + bdrv_make_iops_buckets_leak(bs, delta);
> +}
> +
> +static void bdrv_block_timer_disable(BlockDriverState *bs)
> +{
> + if (!bs->block_timer) {
> + return;
> }
>
> - bs->slice_start = 0;
> - bs->slice_end = 0;
> + qemu_del_timer(bs->block_timer);
> + qemu_free_timer(bs->block_timer);
> + bs->block_timer = NULL;
> +}
> +
> +static bool bdrv_throttling_is_iddle(BlockDriverState *bs)
I don't quite understand the wording here: is "iddle" meant to be "idle"?
> +{
> + int64_t delta = qemu_get_clock_ns(rt_clock) - bs->previous_leak;
> +
> + if (delta < BLOCK_IO_THROTTLE_PERIOD * 2) {
> + return false;
> + }
> +
> + /* iddle */
> + return true;
> }
>
> +/* This callback is the timer in charge of making the leaky buckets leak */
> static void bdrv_block_timer(void *opaque)
It would be more readable for me if you could rename it to
bdrv_clock_timer_cb.
> {
> BlockDriverState *bs = opaque;
>
> + /* disable throttling time on iddle for economy purpose */
> + if (bdrv_throttling_is_iddle(bs)) {
> + bdrv_block_timer_disable(bs);
> + return;
> + }
> +
> + /* rearm the timer */
> + qemu_mod_timer(bs->block_timer,
> + qemu_get_clock_ns(vm_clock) +
> + BLOCK_IO_THROTTLE_PERIOD);
> +
> + bs->must_leak = true;
> qemu_co_enter_next(&bs->throttled_reqs);
> }
>
> +static void bdrv_block_timer_enable(BlockDriverState *bs)
> +{
> + if (bs->block_timer) {
> + return;
> + }
> +
> + bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
> + bs->previous_leak = qemu_get_clock_ns(rt_clock);
> + qemu_mod_timer(bs->block_timer,
> + qemu_get_clock_ns(vm_clock) +
> + BLOCK_IO_THROTTLE_PERIOD);
> +}
> +
> void bdrv_io_limits_enable(BlockDriverState *bs)
> {
> qemu_co_queue_init(&bs->throttled_reqs);
> - bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
> bs->io_limits_enabled = true;
> }
>
> @@ -165,15 +302,118 @@ bool bdrv_io_limits_enabled(BlockDriverState *bs)
> || io_limits->iops[BLOCK_IO_LIMIT_TOTAL];
> }
>
> +/* This function check if the correct bandwith threshold has been exceeded
What does the "correct bandwidth threshold" mean?
And s/bandwith/bandwidth/, series wide.
> + *
> + * @is_write: true if the current IO is a write, false if it's a read
> + * @ret: true if threshold has been exceeded else false
> + */
> +static bool bdrv_is_bps_threshold_exceeded(BlockDriverState *bs, bool is_write)
> +{
> + /* limit is on total read + write bps : do the sum and compare with total
> + * threshold
> + */
> + if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
> + int64_t bytes = bs->leaky_buckets.bytes[BLOCK_IO_LIMIT_READ] +
> + bs->leaky_buckets.bytes[BLOCK_IO_LIMIT_WRITE];
> + return bs->io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL] < bytes;
> + }
> +
> + /* check wether the threshold corresponding to the current io type (read,
> + * write) has been exceeded
> + */
> + if (bs->io_limits.bps[is_write]) {
It looks dangerous to use is_write as an index into the array.
> + return bs->io_limits.bps_threshold[is_write] <
> + bs->leaky_buckets.bytes[is_write];
> + }
> +
> + /* no limit */
> + return false;
> +}
> +
> +/* This function check if the correct iops threshold has been exceeded
> + *
> + * @is_write: true if the current IO is a write, false if it's a read
> + * @ret: true if threshold has been exceeded else false
> + */
> +static bool bdrv_is_iops_threshold_exceeded(BlockDriverState *bs, bool is_write)
> +{
> + /* limit is on total read + write iops : do the sum and compare with total
> + * threshold
> + */
> + if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
> + double ios = bs->leaky_buckets.ios[BLOCK_IO_LIMIT_READ] +
> + bs->leaky_buckets.ios[BLOCK_IO_LIMIT_WRITE];
> + return bs->io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL] < ios;
> + }
> +
> + /* check wether the threshold corresponding to the current io type (read,
> + * write) has been exceeded
> + */
> + if (bs->io_limits.iops[is_write]) {
> + return bs->io_limits.iops_threshold[is_write] <
> + bs->leaky_buckets.ios[is_write];
> + }
> +
> + /* no limit */
> + return false;
> +}
> +
> +/* This function check if any bandwith or iops threshold has been exceeded
> + *
> + * @nb_sectors: the number of sectors of the current IO
> + * @is_write: true if the current IO is a write, false if it's a read
> + * @ret: true if any threshold has been exceeded else false
> + */
> +static bool bdrv_is_any_threshold_exceeded(BlockDriverState *bs, int nb_sectors,
> + bool is_write)
> +{
> + bool bps_ret, iops_ret;
> +
> + /* check if any bandwith or per IO threshold has been exceeded */
> + bps_ret = bdrv_is_bps_threshold_exceeded(bs, is_write);
> + iops_ret = bdrv_is_iops_threshold_exceeded(bs, is_write);
> +
> + /* if so the IO will be blocked so do not account it and return true
> + * also return false if a bdrv_drain_all is in progress
> + */
> + if (!draining && (bps_ret || iops_ret)) {
> + return true;
> + }
> +
> + /* NOTE: the counter can go above the threshold when authorizing an IO.
> + * At next call the code will punish the guest by blocking the
> + * next IO until the counter has been decremented below the threshold.
> + * This way if a guest issue a jumbo IO bigger than the threshold it
> + * will have a chance no be authorized and will not result in a guest
> + * IO deadlock.
> + */
> +
> + /* the IO is authorized so do the accounting and return false */
> + bs->leaky_buckets.bytes[is_write] += (int64_t)nb_sectors *
> + BDRV_SECTOR_SIZE;
> + bs->leaky_buckets.ios[is_write]++;
> +
> + return false;
> +}
> +
> static void bdrv_io_limits_intercept(BlockDriverState *bs,
> bool is_write, int nb_sectors)
> {
> - int64_t wait_time = -1;
> + /* enable block timer if needed when intercepting I/Os */
> + if (!bs->block_timer) {
bdrv_block_timer_enable() already checks for bs->block_timer, so this check is redundant.
> + bdrv_block_timer_enable(bs);
> + }
>
> + bdrv_leak_if_needed(bs);
> + /* if some IOs are already queued because the bucket is full put the current
> + * IO at the end of the queue (FIFO)
> + */
> if (!qemu_co_queue_empty(&bs->throttled_reqs)) {
> qemu_co_queue_wait(&bs->throttled_reqs);
> }
>
> + bdrv_leak_if_needed(bs);
> +
> /* In fact, we hope to keep each request's timing, in FIFO mode. The next
> * throttled requests will not be dequeued until the current request is
> * allowed to be serviced. So if the current request still exceeds the
> @@ -181,13 +421,19 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
> * be still in throttled_reqs queue.
> */
>
> - while (bdrv_exceed_io_limits(bs, nb_sectors, is_write, &wait_time)) {
> - qemu_mod_timer(bs->block_timer,
> - wait_time + qemu_get_clock_ns(vm_clock));
> + /* if a threshold is exceeded the leaky bucket is full so the code put the
> + * IO in the throttle_reqs queue until the bucket has leaked enough to be
> + * not full
> + */
> + while (bdrv_is_any_threshold_exceeded(bs, nb_sectors, is_write)) {
> + bdrv_leak_if_needed(bs);
> qemu_co_queue_wait_insert_head(&bs->throttled_reqs);
> + bdrv_leak_if_needed(bs);
> }
>
> + bdrv_leak_if_needed(bs);
> qemu_co_queue_next(&bs->throttled_reqs);
> + bdrv_leak_if_needed(bs);
> }
>
> /* check if the path starts with "<protocol>:" */
> @@ -1439,6 +1685,9 @@ void bdrv_drain_all(void)
> BlockDriverState *bs;
> bool busy;
>
> + /* tell the throttling code we are draining */
> + draining = true;
> +
> do {
> busy = qemu_aio_wait();
>
> @@ -1457,6 +1706,8 @@ void bdrv_drain_all(void)
> assert(QLIST_EMPTY(&bs->tracked_requests));
> assert(qemu_co_queue_empty(&bs->throttled_reqs));
> }
> +
> + draining = false;
> }
>
> /* make a BlockDriverState anonymous by removing from bdrv_state list.
> @@ -1492,9 +1743,7 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
> bs_dest->enable_write_cache = bs_src->enable_write_cache;
>
> /* i/o timing parameters */
> - bs_dest->slice_start = bs_src->slice_start;
> - bs_dest->slice_end = bs_src->slice_end;
> - bs_dest->slice_submitted = bs_src->slice_submitted;
> + bs_dest->leaky_buckets = bs_src->leaky_buckets;
> bs_dest->io_limits = bs_src->io_limits;
> bs_dest->throttled_reqs = bs_src->throttled_reqs;
> bs_dest->block_timer = bs_src->block_timer;
> @@ -3551,169 +3800,6 @@ void bdrv_aio_cancel(BlockDriverAIOCB *acb)
> acb->aiocb_info->cancel(acb);
> }
>
> -/* block I/O throttling */
> -static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
> - bool is_write, double elapsed_time, uint64_t *wait)
> -{
> - uint64_t bps_limit = 0;
> - uint64_t extension;
> - double bytes_limit, bytes_base, bytes_res;
> - double slice_time, wait_time;
> -
> - if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
> - bps_limit = bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL];
> - } else if (bs->io_limits.bps[is_write]) {
> - bps_limit = bs->io_limits.bps[is_write];
> - } else {
> - if (wait) {
> - *wait = 0;
> - }
> -
> - return false;
> - }
> -
> - slice_time = bs->slice_end - bs->slice_start;
> - slice_time /= (NANOSECONDS_PER_SECOND);
> - bytes_limit = bps_limit * slice_time;
> - bytes_base = bs->slice_submitted.bytes[is_write];
> - if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
> - bytes_base += bs->slice_submitted.bytes[!is_write];
> - }
> -
> - /* bytes_base: the bytes of data which have been read/written; and
> - * it is obtained from the history statistic info.
> - * bytes_res: the remaining bytes of data which need to be read/written.
> - * (bytes_base + bytes_res) / bps_limit: used to calcuate
> - * the total time for completing reading/writting all data.
> - */
> - bytes_res = (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
> -
> - if (bytes_base + bytes_res <= bytes_limit) {
> - if (wait) {
> - *wait = 0;
> - }
> -
> - return false;
> - }
> -
> - /* Calc approx time to dispatch */
> - wait_time = (bytes_base + bytes_res) / bps_limit - elapsed_time;
> -
> - /* When the I/O rate at runtime exceeds the limits,
> - * bs->slice_end need to be extended in order that the current statistic
> - * info can be kept until the timer fire, so it is increased and tuned
> - * based on the result of experiment.
> - */
> - extension = wait_time * NANOSECONDS_PER_SECOND;
> - extension = DIV_ROUND_UP(extension, BLOCK_IO_SLICE_TIME) *
> - BLOCK_IO_SLICE_TIME;
> - bs->slice_end += extension;
> - if (wait) {
> - *wait = wait_time * NANOSECONDS_PER_SECOND;
> - }
> -
> - return true;
> -}
> -
> -static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
> - double elapsed_time, uint64_t *wait)
> -{
> - uint64_t iops_limit = 0;
> - double ios_limit, ios_base;
> - double slice_time, wait_time;
> -
> - if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
> - iops_limit = bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL];
> - } else if (bs->io_limits.iops[is_write]) {
> - iops_limit = bs->io_limits.iops[is_write];
> - } else {
> - if (wait) {
> - *wait = 0;
> - }
> -
> - return false;
> - }
> -
> - slice_time = bs->slice_end - bs->slice_start;
> - slice_time /= (NANOSECONDS_PER_SECOND);
> - ios_limit = iops_limit * slice_time;
> - ios_base = bs->slice_submitted.ios[is_write];
> - if (bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL]) {
> - ios_base += bs->slice_submitted.ios[!is_write];
> - }
> -
> - if (ios_base + 1 <= ios_limit) {
> - if (wait) {
> - *wait = 0;
> - }
> -
> - return false;
> - }
> -
> - /* Calc approx time to dispatch, in seconds */
> - wait_time = (ios_base + 1) / iops_limit;
> - if (wait_time > elapsed_time) {
> - wait_time = wait_time - elapsed_time;
> - } else {
> - wait_time = 0;
> - }
> -
> - /* Exceeded current slice, extend it by another slice time */
> - bs->slice_end += BLOCK_IO_SLICE_TIME;
> - if (wait) {
> - *wait = wait_time * NANOSECONDS_PER_SECOND;
> - }
> -
> - return true;
> -}
> -
> -static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
> - bool is_write, int64_t *wait)
> -{
> - int64_t now, max_wait;
> - uint64_t bps_wait = 0, iops_wait = 0;
> - double elapsed_time;
> - int bps_ret, iops_ret;
> -
> - now = qemu_get_clock_ns(vm_clock);
> - if (now > bs->slice_end) {
> - bs->slice_start = now;
> - bs->slice_end = now + BLOCK_IO_SLICE_TIME;
> - memset(&bs->slice_submitted, 0, sizeof(bs->slice_submitted));
> - }
> -
> - elapsed_time = now - bs->slice_start;
> - elapsed_time /= (NANOSECONDS_PER_SECOND);
> -
> - bps_ret = bdrv_exceed_bps_limits(bs, nb_sectors,
> - is_write, elapsed_time, &bps_wait);
> - iops_ret = bdrv_exceed_iops_limits(bs, is_write,
> - elapsed_time, &iops_wait);
> - if (bps_ret || iops_ret) {
> - max_wait = bps_wait > iops_wait ? bps_wait : iops_wait;
> - if (wait) {
> - *wait = max_wait;
> - }
> -
> - now = qemu_get_clock_ns(vm_clock);
> - if (bs->slice_end < now + max_wait) {
> - bs->slice_end = now + max_wait;
> - }
> -
> - return true;
> - }
> -
> - if (wait) {
> - *wait = 0;
> - }
> -
> - bs->slice_submitted.bytes[is_write] += (int64_t)nb_sectors *
> - BDRV_SECTOR_SIZE;
> - bs->slice_submitted.ios[is_write]++;
> -
> - return false;
> -}
> -
> /**************************************************************/
> /* async block device emulation */
>
> diff --git a/blockdev.c b/blockdev.c
> index c5abd65..491e4d0 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -280,10 +280,25 @@ static int parse_block_error_action(const char *buf, bool is_read)
> }
> }
>
> +static bool check_io_limit(int64_t limit)
> +{
> + if (!limit) {
> + return false;
> + }
> +
> + if (limit < (THROTTLE_HZ * 2)) {
> + return true;
> + }
> +
> + return false;
> +}
> +
> static bool do_check_io_limits(BlockIOLimit *io_limits, Error **errp)
> {
> bool bps_flag;
> bool iops_flag;
> + bool bps_threshold_flag;
> + bool iops_threshold_flag;
>
> assert(io_limits);
>
> @@ -299,13 +314,30 @@ static bool do_check_io_limits(BlockIOLimit *io_limits, Error **errp)
> return false;
> }
>
> - if (io_limits->bps[BLOCK_IO_LIMIT_TOTAL] < 0 ||
> - io_limits->bps[BLOCK_IO_LIMIT_WRITE] < 0 ||
> - io_limits->bps[BLOCK_IO_LIMIT_READ] < 0 ||
> - io_limits->iops[BLOCK_IO_LIMIT_TOTAL] < 0 ||
> - io_limits->iops[BLOCK_IO_LIMIT_WRITE] < 0 ||
> - io_limits->iops[BLOCK_IO_LIMIT_READ] < 0) {
> - error_setg(errp, "bps and iops values must be 0 or greater");
> + bps_threshold_flag =
> + (io_limits->bps_threshold[BLOCK_IO_LIMIT_TOTAL] != 0)
> + && ((io_limits->bps_threshold[BLOCK_IO_LIMIT_READ] != 0)
> + || (io_limits->bps_threshold[BLOCK_IO_LIMIT_WRITE] != 0));
> + iops_threshold_flag =
> + (io_limits->iops_threshold[BLOCK_IO_LIMIT_TOTAL] != 0)
> + && ((io_limits->iops_threshold[BLOCK_IO_LIMIT_READ] != 0)
> + || (io_limits->iops_threshold[BLOCK_IO_LIMIT_WRITE] != 0));
> + if (bps_threshold_flag || iops_threshold_flag) {
> + error_setg(errp, "bps_threshold(iops_threshold) and "
> + "bps_rd_threshold/bps_wr_threshold"
> + "(iops_rd_threshold/iops_wr_threshold) "
> + "cannot be used at the same time");
> + return false;
> + }
> +
> + if (check_io_limit(io_limits->bps[BLOCK_IO_LIMIT_TOTAL]) ||
> + check_io_limit(io_limits->bps[BLOCK_IO_LIMIT_WRITE]) ||
> + check_io_limit(io_limits->bps[BLOCK_IO_LIMIT_READ]) ||
> + check_io_limit(io_limits->iops[BLOCK_IO_LIMIT_TOTAL]) ||
> + check_io_limit(io_limits->iops[BLOCK_IO_LIMIT_WRITE]) ||
> + check_io_limit(io_limits->iops[BLOCK_IO_LIMIT_READ])) {
> + error_setg(errp, "bps and iops values must be %i or greater",
> + THROTTLE_HZ * 2);
> return false;
> }
>
> @@ -497,6 +529,18 @@ DriveInfo *drive_init(QemuOpts *all_opts, BlockInterfaceType block_default_type)
> qemu_opt_get_number(opts, "iops_rd", 0);
> io_limits.iops[BLOCK_IO_LIMIT_WRITE] =
> qemu_opt_get_number(opts, "iops_wr", 0);
> + io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL] =
> + io_limits.bps[BLOCK_IO_LIMIT_TOTAL] / THROTTLE_HZ;
> + io_limits.bps_threshold[BLOCK_IO_LIMIT_READ] =
> + io_limits.bps[BLOCK_IO_LIMIT_READ] / THROTTLE_HZ;
> + io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE] =
> + io_limits.bps[BLOCK_IO_LIMIT_WRITE] / THROTTLE_HZ;
> + io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL] =
> + io_limits.iops[BLOCK_IO_LIMIT_TOTAL] / THROTTLE_HZ;
> + io_limits.iops_threshold[BLOCK_IO_LIMIT_READ] =
> + io_limits.iops[BLOCK_IO_LIMIT_READ] / THROTTLE_HZ;
> + io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE] =
> + io_limits.iops[BLOCK_IO_LIMIT_WRITE] / THROTTLE_HZ;
>
> if (!do_check_io_limits(&io_limits, &error)) {
> error_report("%s", error_get_pretty(error));
> @@ -1198,6 +1242,12 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
> io_limits.iops[BLOCK_IO_LIMIT_TOTAL]= iops;
> io_limits.iops[BLOCK_IO_LIMIT_READ] = iops_rd;
> io_limits.iops[BLOCK_IO_LIMIT_WRITE]= iops_wr;
> + io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL] = bps / THROTTLE_HZ;
> + io_limits.bps_threshold[BLOCK_IO_LIMIT_READ] = bps_rd / THROTTLE_HZ;
> + io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE] = bps_wr / THROTTLE_HZ;
> + io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL] = iops / THROTTLE_HZ;
> + io_limits.iops_threshold[BLOCK_IO_LIMIT_READ] = iops_rd / THROTTLE_HZ;
> + io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE] = iops_wr / THROTTLE_HZ;
>
> if (!do_check_io_limits(&io_limits, errp)) {
> return;
> @@ -1209,11 +1259,10 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
> bdrv_io_limits_enable(bs);
> } else if (bs->io_limits_enabled && !bdrv_io_limits_enabled(bs)) {
> bdrv_io_limits_disable(bs);
> - } else {
> - if (bs->block_timer) {
> - qemu_mod_timer(bs->block_timer, qemu_get_clock_ns(vm_clock));
> - }
> }
> +
> + /* reset leaky bucket to get the system in a known state */
> + memset(&bs->leaky_buckets, 0, sizeof(bs->leaky_buckets));
> }
>
> int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index c6ac871..e32ad1f 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -43,8 +43,9 @@
> #define BLOCK_IO_LIMIT_WRITE 1
> #define BLOCK_IO_LIMIT_TOTAL 2
>
> -#define BLOCK_IO_SLICE_TIME 100000000
> #define NANOSECONDS_PER_SECOND 1000000000.0
> +#define THROTTLE_HZ 1
> +#define BLOCK_IO_THROTTLE_PERIOD (NANOSECONDS_PER_SECOND / THROTTLE_HZ)
>
> #define BLOCK_OPT_SIZE "size"
> #define BLOCK_OPT_ENCRYPT "encryption"
> @@ -73,11 +74,13 @@ typedef struct BdrvTrackedRequest {
> typedef struct BlockIOLimit {
> int64_t bps[3];
> int64_t iops[3];
> + int64_t bps_threshold[3];
> + int64_t iops_threshold[3];
> } BlockIOLimit;
>
> typedef struct BlockIOBaseValue {
> - uint64_t bytes[2];
> - uint64_t ios[2];
> + int64_t bytes[2];
> + double ios[2];
> } BlockIOBaseValue;
>
> struct BlockDriver {
> @@ -264,10 +267,10 @@ struct BlockDriverState {
> unsigned int copy_on_read_in_flight;
>
> /* the time for latest disk I/O */
> - int64_t slice_start;
> - int64_t slice_end;
> BlockIOLimit io_limits;
> - BlockIOBaseValue slice_submitted;
> + BlockIOBaseValue leaky_buckets;
> + int64_t previous_leak;
> + bool must_leak;
> CoQueue throttled_reqs;
> QEMUTimer *block_timer;
> bool io_limits_enabled;
> --
> 1.7.10.4
>
>
--
Fam
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [PATCH V3 for-1.6 2/5] block: Modify the throttling code to implement the leaky bucket algorithm.
2013-07-26 3:40 ` Fam Zheng
@ 2013-07-26 9:48 ` Benoît Canet
0 siblings, 0 replies; 20+ messages in thread
From: Benoît Canet @ 2013-07-26 9:48 UTC (permalink / raw)
To: Fam Zheng; +Cc: kwolf, qemu-devel, stefanha
> I think it is easier to understand written like this:
>
> int64_t total_leak = ((bs->io_limits.iops[BLOCK_IO_LIMIT_TOTAL] * delta) /
> NANOSECONDS_PER_SECOND);
> if (ios[BLOCK_IO_LIMIT_READ] <= total_leak / 2) {
> read_leak = ios[BLOCK_IO_LIMIT_READ];
> write_leak = total_leak - read_leak;
> /* symmetric case */
> } else if (ios[BLOCK_IO_LIMIT_WRITE] <= total_leak / 2) {
> write_leak = ios[BLOCK_IO_LIMIT_WRITE];
> read_leak = total_leak - write_leak;
> /* both buckets above the leak count: use half the total leak for each */
> } else {
> write_leak = total_leak / 2;
> read_leak = (total_leak + 1) / 2;
> }
Thanks,
I will propagate these changes into the new infinite-HZ algorithm I am
currently writing.
>
> > + /* else we consider that limits are separated */
> > + } else {
> > + read_leak = (bs->io_limits.iops[BLOCK_IO_LIMIT_READ] * delta) /
> > + NANOSECONDS_PER_SECOND;
> > + write_leak = (bs->io_limits.iops[BLOCK_IO_LIMIT_WRITE] * delta) /
> > + NANOSECONDS_PER_SECOND;
> > + }
> > +
> > + /* make the buckets leak */
> > + ios[BLOCK_IO_LIMIT_READ] = MAX(ios[BLOCK_IO_LIMIT_READ] - read_leak, 0);
> > + ios[BLOCK_IO_LIMIT_WRITE] = MAX(ios[BLOCK_IO_LIMIT_WRITE] - write_leak, 0);
> > +}
> > +
> > +static void bdrv_leak_if_needed(BlockDriverState *bs)
> > +{
> > + int64_t now;
> > + int64_t delta;
> > +
> > + if (!bs->must_leak) {
> > + return;
> > + }
> > +
> > + bs->must_leak = false;
> > +
> > + now = qemu_get_clock_ns(rt_clock);
> > + delta = now - bs->previous_leak;
> > + bs->previous_leak = now;
> > +
> > + bdrv_make_bps_buckets_leak(bs, delta);
> > + bdrv_make_iops_buckets_leak(bs, delta);
> > +}
> > +
> > +static void bdrv_block_timer_disable(BlockDriverState *bs)
> > +{
> > + if (!bs->block_timer) {
> > + return;
> > }
> >
> > - bs->slice_start = 0;
> > - bs->slice_end = 0;
> > + qemu_del_timer(bs->block_timer);
> > + qemu_free_timer(bs->block_timer);
> > + bs->block_timer = NULL;
> > +}
> > +
> > +static bool bdrv_throttling_is_iddle(BlockDriverState *bs)
>
I don't quite understand the wording here: is "iddle" meant to be "idle"?
>
> > +{
> > + int64_t delta = qemu_get_clock_ns(rt_clock) - bs->previous_leak;
> > +
> > + if (delta < BLOCK_IO_THROTTLE_PERIOD * 2) {
> > + return false;
> > + }
> > +
> > + /* iddle */
> > + return true;
> > }
> >
> > +/* This callback is the timer in charge of making the leaky buckets leak */
> > static void bdrv_block_timer(void *opaque)
>
It would be more readable to me if you renamed it to
bdrv_clock_timer_cb.
>
> > {
> > BlockDriverState *bs = opaque;
> >
> > + /* disable throttling time on iddle for economy purpose */
> > + if (bdrv_throttling_is_iddle(bs)) {
> > + bdrv_block_timer_disable(bs);
> > + return;
> > + }
> > +
> > + /* rearm the timer */
> > + qemu_mod_timer(bs->block_timer,
> > + qemu_get_clock_ns(vm_clock) +
> > + BLOCK_IO_THROTTLE_PERIOD);
> > +
> > + bs->must_leak = true;
> > qemu_co_enter_next(&bs->throttled_reqs);
> > }
> >
> > +static void bdrv_block_timer_enable(BlockDriverState *bs)
> > +{
> > + if (bs->block_timer) {
> > + return;
> > + }
> > +
> > + bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
> > + bs->previous_leak = qemu_get_clock_ns(rt_clock);
> > + qemu_mod_timer(bs->block_timer,
> > + qemu_get_clock_ns(vm_clock) +
> > + BLOCK_IO_THROTTLE_PERIOD);
> > +}
> > +
> > void bdrv_io_limits_enable(BlockDriverState *bs)
> > {
> > qemu_co_queue_init(&bs->throttled_reqs);
> > - bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
> > bs->io_limits_enabled = true;
> > }
> >
> > io_limits.iops[BLOCK_IO_LIMIT_TOTAL]= iops;
> > io_limits.iops[BLOCK_IO_LIMIT_READ] = iops_rd;
> > io_limits.iops[BLOCK_IO_LIMIT_WRITE]= iops_wr;
> > + io_limits.bps_threshold[BLOCK_IO_LIMIT_TOTAL] = bps / THROTTLE_HZ;
> > + io_limits.bps_threshold[BLOCK_IO_LIMIT_READ] = bps_rd / THROTTLE_HZ;
> > + io_limits.bps_threshold[BLOCK_IO_LIMIT_WRITE] = bps_wr / THROTTLE_HZ;
> > + io_limits.iops_threshold[BLOCK_IO_LIMIT_TOTAL] = iops / THROTTLE_HZ;
> > + io_limits.iops_threshold[BLOCK_IO_LIMIT_READ] = iops_rd / THROTTLE_HZ;
> > + io_limits.iops_threshold[BLOCK_IO_LIMIT_WRITE] = iops_wr / THROTTLE_HZ;
> >
> > if (!do_check_io_limits(&io_limits, errp)) {
> > return;
> > @@ -1209,11 +1259,10 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
> > bdrv_io_limits_enable(bs);
> > } else if (bs->io_limits_enabled && !bdrv_io_limits_enabled(bs)) {
> > bdrv_io_limits_disable(bs);
> > - } else {
> > - if (bs->block_timer) {
> > - qemu_mod_timer(bs->block_timer, qemu_get_clock_ns(vm_clock));
> > - }
> > }
> > +
> > + /* reset leaky bucket to get the system in a known state */
> > + memset(&bs->leaky_buckets, 0, sizeof(bs->leaky_buckets));
> > }
> >
> > int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
> > diff --git a/include/block/block_int.h b/include/block/block_int.h
> > index c6ac871..e32ad1f 100644
> > --- a/include/block/block_int.h
> > +++ b/include/block/block_int.h
> > @@ -43,8 +43,9 @@
> > #define BLOCK_IO_LIMIT_WRITE 1
> > #define BLOCK_IO_LIMIT_TOTAL 2
> >
> > -#define BLOCK_IO_SLICE_TIME 100000000
> > #define NANOSECONDS_PER_SECOND 1000000000.0
> > +#define THROTTLE_HZ 1
> > +#define BLOCK_IO_THROTTLE_PERIOD (NANOSECONDS_PER_SECOND / THROTTLE_HZ)
> >
> > #define BLOCK_OPT_SIZE "size"
> > #define BLOCK_OPT_ENCRYPT "encryption"
> > @@ -73,11 +74,13 @@ typedef struct BdrvTrackedRequest {
> > typedef struct BlockIOLimit {
> > int64_t bps[3];
> > int64_t iops[3];
> > + int64_t bps_threshold[3];
> > + int64_t iops_threshold[3];
> > } BlockIOLimit;
> >
> > typedef struct BlockIOBaseValue {
> > - uint64_t bytes[2];
> > - uint64_t ios[2];
> > + int64_t bytes[2];
> > + double ios[2];
> > } BlockIOBaseValue;
> >
> > struct BlockDriver {
> > @@ -264,10 +267,10 @@ struct BlockDriverState {
> > unsigned int copy_on_read_in_flight;
> >
> > /* the time for latest disk I/O */
> > - int64_t slice_start;
> > - int64_t slice_end;
> > BlockIOLimit io_limits;
> > - BlockIOBaseValue slice_submitted;
> > + BlockIOBaseValue leaky_buckets;
> > + int64_t previous_leak;
> > + bool must_leak;
> > CoQueue throttled_reqs;
> > QEMUTimer *block_timer;
> > bool io_limits_enabled;
> > --
> > 1.7.10.4
> >
> >
>
> --
> Fam
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [PATCH V3 for-1.6 5/5] block: Add throttling percentage metrics.
2013-07-26 2:55 ` Fam Zheng
@ 2013-07-26 9:49 ` Benoît Canet
0 siblings, 0 replies; 20+ messages in thread
From: Benoît Canet @ 2013-07-26 9:49 UTC (permalink / raw)
To: Fam Zheng; +Cc: kwolf, qemu-devel, stefanha
> If I understand it, the percentage is recalculated every leak check. So
> it only reflects the instant io flow, instead of historical statistics?
> But I think for system admin purpose, it's good to know a longer range
> io activity character. Or do you think management tool should sample it?
Yes, I wrote this with management polling in mind.
>
> > + } else {
> > + bs->throttling_percentage = 0;
> > + }
> > bs->previous_leak = now;
> >
> > bdrv_make_bps_buckets_leak(bs, delta);
> > @@ -260,6 +278,7 @@ static void bdrv_block_timer(void *opaque)
> > /* disable throttling time on iddle for economy purpose */
> > if (bdrv_throttling_is_iddle(bs)) {
> > bdrv_block_timer_disable(bs);
> > + bdrv_reset_throttling_metrics(bs);
> > return;
> > }
> >
> > @@ -280,6 +299,7 @@ static void bdrv_block_timer_enable(BlockDriverState *bs)
> >
> > bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
> > bs->previous_leak = qemu_get_clock_ns(rt_clock);
> > + bdrv_reset_throttling_metrics(bs);
> > qemu_mod_timer(bs->block_timer,
> > qemu_get_clock_ns(vm_clock) +
> > BLOCK_IO_THROTTLE_PERIOD);
> > @@ -432,6 +452,11 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
> > * not full
> > */
> > while (bdrv_is_any_threshold_exceeded(bs, nb_sectors, is_write)) {
> > + /* remember since when the code decided to block the first I/O */
> > + if (qemu_co_queue_empty(&bs->throttled_reqs)) {
> > + bs->full_since = qemu_get_clock_ns(rt_clock);
> > + }
> > +
> > bdrv_leak_if_needed(bs);
> > qemu_co_queue_wait_insert_head(&bs->throttled_reqs);
> > bdrv_leak_if_needed(bs);
> > diff --git a/block/qapi.c b/block/qapi.c
> > index f81081c..bd1c6af 100644
> > --- a/block/qapi.c
> > +++ b/block/qapi.c
> > @@ -263,6 +263,10 @@ void bdrv_query_info(BlockDriverState *bs,
> > bs->io_limits.iops_sector_count;
> > info->inserted->iops_sector_count =
> > bs->io_limits.iops_sector_count;
> > + info->inserted->has_throttling_percentage =
> > + bs->throttling_percentage;
> > + info->inserted->throttling_percentage =
> > + bs->throttling_percentage;
> > }
> >
> > bs0 = bs;
> > diff --git a/hmp.c b/hmp.c
> > index 3912305..9dc4862 100644
> > --- a/hmp.c
> > +++ b/hmp.c
> > @@ -348,7 +348,8 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
> > " iops_threshold=%" PRId64
> > " iops_rd_threshold=%" PRId64
> > " iops_wr_threshold=%" PRId64
> > - " iops_sector_count=%" PRId64 "\n",
> > + " iops_sector_count=%" PRId64
> > + " throttling_percentage=%" PRId64 "\n",
> > info->value->inserted->bps,
> > info->value->inserted->bps_rd,
> > info->value->inserted->bps_wr,
> > @@ -361,7 +362,8 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
> > info->value->inserted->iops_threshold,
> > info->value->inserted->iops_rd_threshold,
> > info->value->inserted->iops_wr_threshold,
> > - info->value->inserted->iops_sector_count);
> > + info->value->inserted->iops_sector_count,
> > + info->value->inserted->throttling_percentage);
> > } else {
> > monitor_printf(mon, " [not inserted]");
> > }
> > diff --git a/include/block/block_int.h b/include/block/block_int.h
> > index 74d7503..4487cd9 100644
> > --- a/include/block/block_int.h
> > +++ b/include/block/block_int.h
> > @@ -271,6 +271,8 @@ struct BlockDriverState {
> > BlockIOLimit io_limits;
> > BlockIOBaseValue leaky_buckets;
> > int64_t previous_leak;
> > + int64_t full_since;
> > + int throttling_percentage;
> > bool must_leak;
> > CoQueue throttled_reqs;
> > QEMUTimer *block_timer;
> > diff --git a/qapi-schema.json b/qapi-schema.json
> > index d579fda..14a02e7 100644
> > --- a/qapi-schema.json
> > +++ b/qapi-schema.json
> > @@ -783,6 +783,8 @@
> > #
> > # @iops_sector_count: #optional an I/O size in sector (Since 1.6)
> > #
> > +# @throttling_percentage: #optional reflect throttling activity (Since 1.6)
> > +#
> > # Since: 0.14.0
> > #
> > # Notes: This interface is only found in @BlockInfo.
> > @@ -797,7 +799,7 @@
> > '*bps_threshold': 'int', '*bps_rd_threshold': 'int',
> > '*bps_wr_threshold': 'int', '*iops_threshold': 'int',
> > '*iops_rd_threshold': 'int', '*iops_wr_threshold': 'int',
> > - '*iops_sector_count': 'int' } }
> > + '*iops_sector_count': 'int', '*throttling_percentage': 'int' } }
> >
> > ##
> > # @BlockDeviceIoStatus:
> > --
> > 1.7.10.4
> >
> >
>
> --
> Fam
>
* Re: [Qemu-devel] [PATCH V3 for-1.6 0/5] Leaky bucket throttling and features
2013-07-23 16:29 [Qemu-devel] [PATCH V3 for-1.6 0/5] Leaky bucket throttling and features Benoît Canet
` (5 preceding siblings ...)
2013-07-25 12:08 ` [Qemu-devel] [PATCH V3 for-1.6 0/5] Leaky bucket throttling and features Fam Zheng
@ 2013-07-26 19:07 ` Eric Blake
6 siblings, 0 replies; 20+ messages in thread
From: Eric Blake @ 2013-07-26 19:07 UTC (permalink / raw)
To: Benoît Canet; +Cc: kwolf, qemu-devel, stefanha
On 07/23/2013 10:29 AM, Benoît Canet wrote:
> The first patch fixes the throttling which was broken by a previous commit.
>
> The next patch replace the existing throttling algorithm by the well described
> leaky bucket algorithm.
>
> Third patch implement bursting by adding *_threshold parameters to
> qmp_block_set_io_throttle.
>
> The last one allow to define the max size of an io when throttling by iops via
> iops_sector_count to avoid vm users cheating on the iops limit.
>
> The last patch adds a metric reflecting how much the I/O are throttled.
>
> since v1:
> Add throttling percentage metric [Benoît]
>
> since v2:
> Enable timer only during I/O activity [Stefan]
> Mark function as coroutine_fn [Stefan]
> Guard these function checking they are in a coroutine [Stefan]
> Use defines to access buckets [Stefan]
> Fix typo [Stefan]
> reset throttling metric on iddle [Benoît]
> rename invalid to check_io_limit [Stefan]
>
> Benoît Canet (5):
> block: Repair the throttling code.
This is probably okay for 1.6 (in fact, now's the RIGHT time to be
fixing bugs).
> block: Modify the throttling code to implement the leaky bucket
> algorithm.
but this sounds too risky now that we have passed soft freeze; as
mentioned on the thread on blockdev-add, this part is definitely 1.7
material, and the series should be split and subject lines updated to
reflect that.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
* Re: [Qemu-devel] [PATCH V3 for-1.6 1/5] block: Repair the throttling code.
2013-07-23 16:29 ` [Qemu-devel] [PATCH V3 for-1.6 1/5] block: Repair the throttling code Benoît Canet
@ 2013-07-26 19:16 ` Eric Blake
2013-07-26 19:42 ` Alex Bligh
0 siblings, 1 reply; 20+ messages in thread
From: Eric Blake @ 2013-07-26 19:16 UTC (permalink / raw)
To: Benoît Canet; +Cc: kwolf, qemu-devel, stefanha
On 07/23/2013 10:29 AM, Benoît Canet wrote:
> The throttling code was segfaulting since commit
> 02ffb504485f0920cfc75a0982a602f824a9a4f4 because some qemu_co_queue_next caller
> does not run in a coroutine.
> qemu_co_queue_do_restart assume that the caller is a coroutinne.
s/assume/assumes/; s/coroutinne/coroutine/
> As suggested by Stefan fix this by entering the coroutine directly.
> Also make sure like suggested that qemu_co_queue_next() and
> qemu_co_queue_restart_all() can be called only in coroutines.
>
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
> block.c | 8 ++++----
> include/block/coroutine.h | 9 +++++++--
> qemu-coroutine-lock.c | 20 ++++++++++++++++++--
> 3 files changed, 29 insertions(+), 8 deletions(-)
>
> diff --git a/block.c b/block.c
> index b560241..dc72643 100644
> --- a/block.c
> +++ b/block.c
> @@ -127,7 +127,8 @@ void bdrv_io_limits_disable(BlockDriverState *bs)
> {
> bs->io_limits_enabled = false;
>
> - while (qemu_co_queue_next(&bs->throttled_reqs));
> + while (qemu_co_enter_next(&bs->throttled_reqs)) {
> + }
On first read, I missed the s/queue/enter/ change and thought all you
were doing was the s/;/{}/ change. Is the style change necessary to
keep checkpatch.pl happy? If not, then keeping the old style would draw
better attention to the bug fix.
This patch is worth 1.6, even if the rest of the series is not. I'm not
a coroutine expert, so take this with a grain of salt:
Reviewed-by: Eric Blake <eblake@redhat.com>
* Re: [Qemu-devel] [PATCH V3 for-1.6 3/5] block: Add support for throttling burst threshold in QMP and the command line.
2013-07-23 16:29 ` [Qemu-devel] [PATCH V3 for-1.6 3/5] block: Add support for throttling burst threshold in QMP and the command line Benoît Canet
2013-07-26 3:07 ` Fam Zheng
@ 2013-07-26 19:24 ` Eric Blake
2013-07-26 20:55 ` Benoît Canet
2013-08-08 13:37 ` Benoît Canet
1 sibling, 2 replies; 20+ messages in thread
From: Eric Blake @ 2013-07-26 19:24 UTC (permalink / raw)
To: Benoît Canet; +Cc: kwolf, qemu-devel, stefanha
On 07/23/2013 10:29 AM, Benoît Canet wrote:
> The thresholds of the leaky bucket algorithm can be used to allow some
> burstiness.
>
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
> block/qapi.c | 24 +++++++++++++
> blockdev.c | 105 +++++++++++++++++++++++++++++++++++++++++++++++-------
> hmp.c | 32 +++++++++++++++--
> qapi-schema.json | 34 ++++++++++++++++--
> qemu-options.hx | 2 +-
> qmp-commands.hx | 30 ++++++++++++++--
> 6 files changed, 205 insertions(+), 22 deletions(-)
>
> @@ -1916,6 +1971,18 @@ QemuOptsList qemu_common_drive_opts = {
> .type = QEMU_OPT_NUMBER,
> .help = "limit write operations per second",
> },{
> + .name = "iops_threshold",
> + .type = QEMU_OPT_NUMBER,
> + .help = "total I/O operations threshold",
> + },{
Kevin's series renamed these to have a dash in the name, and also moved
all the throttling parameters into a sub-struct. Does it make more
sense to have just '*throttling' with that sub-struct containing 12
parameters, 6 for limits and 6 for thresholds, or would it be better to
have '*throttling' with 6 members for limits, as well as
'*throttling-threshold' with the other 6 members? Naming-wise,
throttling.bps-total and throttling-threshold.bps-total convey as much
information as throttling.bps-total and throttling.bps-total-threshold.
> +++ b/qapi-schema.json
> @@ -769,6 +769,18 @@
> #
> # @image: the info of image used (since: 1.6)
> #
> +# @bps_threshold: #optional total threshold in bytes (Since 1.6)
As others have stated, it would be worth explaining at the high level
the difference between limit and threshold (ie. why I need two different
parameters), as well as what the threshold defaults to if omitted (which
WILL be the case when driven by older management tools).
> +++ b/qemu-options.hx
> @@ -409,7 +409,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
> " [,cache=writethrough|writeback|none|directsync|unsafe][,format=f]\n"
> " [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
> " [,readonly=on|off][,copy-on-read=on|off]\n"
> - " [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r][,iops_wr=w]]\n"
> + " [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r][,iops_wr=w][,bps_threshold=bt]|[[,bps_rd_threshold=rt][,bps_wr_threshold=wt]]][[,iops_threshold=it]|[[,iops_rd_threshold=rt][,iops_wr_threshold=wt]]\n"
Is it worth line-wrapping this, so the help text doesn't pass 80 columns?
* Re: [Qemu-devel] [PATCH V3 for-1.6 1/5] block: Repair the throttling code.
2013-07-26 19:16 ` Eric Blake
@ 2013-07-26 19:42 ` Alex Bligh
0 siblings, 0 replies; 20+ messages in thread
From: Alex Bligh @ 2013-07-26 19:42 UTC (permalink / raw)
To: Eric Blake, Benoît Canet; +Cc: kwolf, Alex Bligh, qemu-devel, stefanha
--On 26 July 2013 13:16:04 -0600 Eric Blake <eblake@redhat.com> wrote:
>> - while (qemu_co_queue_next(&bs->throttled_reqs));
>> + while (qemu_co_enter_next(&bs->throttled_reqs)) {
>> + }
>
> On first read, I missed the s/queue/enter/ change and thought all you
> were doing was the s/;/{}/ change. Is the style change necessary to
> keep checkpatch.pl happy? If not, then keeping the old style would draw
> better attention to the bug fix.
checkpatch.pl doesn't like:
while (func());
It appears happy with:
do {} while (func());
which is marginally less ugly than:
while (func()) {
};
in my opinion, particularly when used multiple times in succession.
I think this is probably a bug in checkpatch.pl.
--
Alex Bligh
* Re: [Qemu-devel] [PATCH V3 for-1.6 3/5] block: Add support for throttling burst threshold in QMP and the command line.
2013-07-26 19:24 ` Eric Blake
@ 2013-07-26 20:55 ` Benoît Canet
2013-08-08 13:37 ` Benoît Canet
1 sibling, 0 replies; 20+ messages in thread
From: Benoît Canet @ 2013-07-26 20:55 UTC (permalink / raw)
To: Eric Blake; +Cc: kwolf, qemu-devel, stefanha
> Kevin's series renamed these to have a dash in the name, and also moved
> all the throttling parameters into a sub-struct. Does it make more
> sense to have just '*throttling' with that sub-struct containing 12
> parameters, 6 for limits and 6 for thresholds, or would it be better to
> have '*throttling' with 6 members for limits, as well as
> '*throttling-threshold' with the other 6 members? Naming-wise,
> throttling.bps-total and throttling-threshold.bps-total convey as much
> information as throttling.bps-total and throttling.bps-total-threshold.
In fact my series adds up to 13 parameters.
The last one is iops_sector_count, so maybe I'll go the big sub-struct way.
Best regards
Benoît
* Re: [Qemu-devel] [PATCH V3 for-1.6 3/5] block: Add support for throttling burst threshold in QMP and the command line.
2013-07-26 19:24 ` Eric Blake
2013-07-26 20:55 ` Benoît Canet
@ 2013-08-08 13:37 ` Benoît Canet
2013-08-09 7:09 ` Kevin Wolf
1 sibling, 1 reply; 20+ messages in thread
From: Benoît Canet @ 2013-08-08 13:37 UTC (permalink / raw)
To: Eric Blake; +Cc: kwolf, qemu-devel, stefanha
> Kevin's series renamed these to have a dash in the name, and also moved
> all the throttling parameters into a sub-struct. Does it make more
> sense to have just '*throttling' with that sub-struct containing 12
> parameters, 6 for limits and 6 for thresholds, or would it be better to
> have '*throttling' with 6 members for limits, as well as
> '*throttling-threshold' with the other 6 members? Naming-wise,
> throttling.bps-total and throttling-threshold.bps-total convey as much
> information as throttling.bps-total and throttling.bps-total-threshold.
Eric & Kevin:
Should compatible old-style values be added for the new throttling options in
order to avoid a mixed-style mess in qemu-options.hx?
Best regards
Benoît
* Re: [Qemu-devel] [PATCH V3 for-1.6 3/5] block: Add support for throttling burst threshold in QMP and the command line.
2013-08-08 13:37 ` Benoît Canet
@ 2013-08-09 7:09 ` Kevin Wolf
0 siblings, 0 replies; 20+ messages in thread
From: Kevin Wolf @ 2013-08-09 7:09 UTC (permalink / raw)
To: Benoît Canet; +Cc: qemu-devel, stefanha
Am 08.08.2013 um 15:37 hat Benoît Canet geschrieben:
> > Kevin's series renamed these to have a dash in the name, and also moved
> > all the throttling parameters into a sub-struct. Does it make more
> > sense to have just '*throttling' with that sub-struct containing 12
> > parameters, 6 for limits and 6 for thresholds, or would it be better to
> > have '*throttling' with 6 members for limits, as well as
> > '*throttling-threshold' with the other 6 members? Naming-wise,
> > throttling.bps-total and throttling-threshold.bps-total convey as much
> > information as throttling.bps-total and throttling.bps-total-threshold.
>
> Eric & Kevin:
>
> Should compatible old style values be added for the new throttling options in
> order to avoid a mixed style mess in qemu-options.hx ?
No, I don't think new options should get an old-style alias. We should
simply convert the documentation in qemu-options.hx consistently to the
new style.
Kevin
Thread overview: 20+ messages
2013-07-23 16:29 [Qemu-devel] [PATCH V3 for-1.6 0/5] Leaky bucket throttling and features Benoît Canet
2013-07-23 16:29 ` [Qemu-devel] [PATCH V3 for-1.6 1/5] block: Repair the throttling code Benoît Canet
2013-07-26 19:16 ` Eric Blake
2013-07-26 19:42 ` Alex Bligh
2013-07-23 16:29 ` [Qemu-devel] [PATCH V3 for-1.6 2/5] block: Modify the throttling code to implement the leaky bucket algorithm Benoît Canet
2013-07-26 3:40 ` Fam Zheng
2013-07-26 9:48 ` Benoît Canet
2013-07-23 16:29 ` [Qemu-devel] [PATCH V3 for-1.6 3/5] block: Add support for throttling burst threshold in QMP and the command line Benoît Canet
2013-07-26 3:07 ` Fam Zheng
2013-07-26 19:24 ` Eric Blake
2013-07-26 20:55 ` Benoît Canet
2013-08-08 13:37 ` Benoît Canet
2013-08-09 7:09 ` Kevin Wolf
2013-07-23 16:29 ` [Qemu-devel] [PATCH V3 for-1.6 4/5] block: Add iops_sector_count to do the iops accounting for a given io size Benoît Canet
2013-07-23 16:29 ` [Qemu-devel] [PATCH V3 for-1.6 5/5] block: Add throttling percentage metrics Benoît Canet
2013-07-26 2:55 ` Fam Zheng
2013-07-26 9:49 ` Benoît Canet
2013-07-25 12:08 ` [Qemu-devel] [PATCH V3 for-1.6 0/5] Leaky bucket throttling and features Fam Zheng
2013-07-25 12:16 ` Benoît Canet
2013-07-26 19:07 ` Eric Blake