qemu-devel.nongnu.org archive mirror
From: Kevin Wolf <kwolf@redhat.com>
To: qemu-block@nongnu.org
Cc: kwolf@redhat.com, qemu-devel@nongnu.org
Subject: [Qemu-devel] [PULL 27/32] Improve block job rate limiting for small bandwidth values
Date: Fri,  8 Jul 2016 19:21:39 +0200
Message-ID: <1467998504-15744-28-git-send-email-kwolf@redhat.com>
In-Reply-To: <1467998504-15744-1-git-send-email-kwolf@redhat.com>

From: Sascha Silbe <silbe@linux.vnet.ibm.com>

ratelimit_calculate_delay() previously reset the accounting every time
slice, no matter how much data had been processed before. This had (at
least) two consequences:

1. The minimum speed was rather large, e.g. 5 MiB/s for commit and stream.

   Not sure if there are real-world use cases where this would be a
   problem. Mirroring and backup over a slow link (e.g. DSL) would
   come to mind, though.

2. Tests for block job operations (e.g. cancel) were rather racy.

   All block jobs currently use a time slice of 100ms. That's a
   reasonable value to get smooth output during regular
   operation. However this also meant that the state of block jobs
   changed every 100ms, no matter how low the configured limit was. On
   busy hosts, qemu often transferred additional chunks until the test
   case had a chance to cancel the job.
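
For illustration, the old behaviour can be modelled outside of qemu.
The following is only a sketch: the logic is copied from the lines this
patch removes from include/qemu/ratelimit.h, but the OldRateLimit name,
the explicit "now" parameter (replacing qemu_clock_get_ns()) and the
demo function are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the pre-patch RateLimit; field names follow the removed code. */
typedef struct {
    int64_t next_slice_time;
    uint64_t slice_quota;
    uint64_t slice_ns;
    uint64_t dispatched;
} OldRateLimit;

/* Pre-patch logic; "now" is passed in so the example is deterministic. */
static int64_t old_calculate_delay(OldRateLimit *limit, uint64_t n,
                                   int64_t now)
{
    if (limit->next_slice_time < now) {
        /* A new slice begins; everything sent before is forgotten. */
        limit->next_slice_time = now + limit->slice_ns;
        limit->dispatched = 0;
    }
    if (limit->dispatched == 0 ||
        limit->dispatched + n <= limit->slice_quota) {
        limit->dispatched += n;
        return 0;
    } else {
        limit->dispatched = n;
        return limit->next_slice_time - now;
    }
}

/* Hypothetical demo: quota of 1 sector per 100 ms, 1024-sector chunks. */
void old_ratelimit_demo(void)
{
    OldRateLimit lim = { .slice_quota = 1, .slice_ns = 100000000 };

    /* The first chunk of a slice always passes, however large it is. */
    assert(old_calculate_delay(&lim, 1024, 1000) == 0);
    /* The second chunk waits, but never longer than one slice. */
    assert(old_calculate_delay(&lim, 1024, 1001) == 99999999);
    /* Once the slice is over the accounting resets and a chunk passes. */
    assert(old_calculate_delay(&lim, 1024, 100001001) == 0);
}
```

In this model one chunk gets through per 100ms slice regardless of the
quota. Assuming 512 KiB chunks, that is 10 * 512 KiB = 5 MiB/s, which
matches the minimum speed mentioned in item 1.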

Fix the block job rate limit code to delay for more than one time
slice to address the above issues. To make it easier to handle
oversized chunks we switch the semantics from returning a delay
_before_ the current request to a delay _after_ the current
request. If necessary, this delay consists of multiple time slice
units.

Since the mirror job sends multiple chunks in one go even if the rate
limit was exceeded in between, we need to keep track of the start of
the current time slice so we can correctly re-compute the delay for
the updated amount of data.
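
A standalone sketch of the new accounting may help to see why the start
of the time slice has to be remembered. The logic mirrors the function
added to include/qemu/ratelimit.h by this patch; the RateLimitModel
name, the explicit "now" parameter (standing in for the clock read) and
the demo function are assumptions made for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the post-patch RateLimit; field names follow the patch. */
typedef struct {
    int64_t slice_start_time;
    int64_t slice_end_time;
    uint64_t slice_quota;
    uint64_t slice_ns;
    uint64_t dispatched;
} RateLimitModel;

/* Record n data units sent at time "now"; return the delay in ns. */
static int64_t model_calculate_delay(RateLimitModel *limit, uint64_t n,
                                     int64_t now)
{
    uint64_t delay_slices;

    assert(limit->slice_quota && limit->slice_ns);

    if (limit->slice_end_time < now) {
        /* Previous (possibly extended) slice is over; start afresh. */
        limit->slice_start_time = now;
        limit->slice_end_time = now + limit->slice_ns;
        limit->dispatched = 0;
    }

    limit->dispatched += n;
    if (limit->dispatched < limit->slice_quota) {
        return 0;
    }

    /* Extend the slice to a whole number of slice units, rounding up. */
    delay_slices = (limit->dispatched + limit->slice_quota - 1) /
        limit->slice_quota;
    limit->slice_end_time = limit->slice_start_time +
        delay_slices * limit->slice_ns;
    return limit->slice_end_time - now;
}

/* Hypothetical demo: quota of 10 units per 100 ms slice. */
void model_ratelimit_demo(void)
{
    RateLimitModel lim = { .slice_quota = 10, .slice_ns = 100000000 };

    /* 25 units at t=1000: ceil(25/10) = 3 slices counted from the start. */
    assert(model_calculate_delay(&lim, 25, 1000) == 300000000);
    /* 10 more units 50 ms later: 35 units need 4 slices from the same
     * start, of which 50 ms have already elapsed. */
    assert(model_calculate_delay(&lim, 10, 50001000) == 350000000);
}
```

The second call shows the mirror case: more data is recorded although
the quota was already exceeded, and the delay is recomputed relative to
slice_start_time rather than to the time of the call.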

The minimum bandwidth now is 1 data unit per time slice. The block
jobs are currently passing the amount of data transferred in sectors
and using 100ms time slices, so this translates to 5120
bytes/second. With chunk sizes usually being O(512KiB), tests have
plenty of time (O(100s)) to operate on block jobs. The chance of a
race condition now is fairly remote, except possibly on insanely
loaded systems.
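
The numbers above can be checked with a few lines of arithmetic. This
is a sketch, not code from the patch: the helper names are made up for
this example, and it assumes the speed passed to the rate limiter is
expressed in data units (here 512-byte sectors) per second.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL

/* Quota per slice, clamped to at least 1 as ratelimit_set_speed() now
 * does; "speed" is in data units per second (assumption). */
static uint64_t quota_per_slice(uint64_t speed, uint64_t slice_ns)
{
    double quota = ((double)speed * slice_ns) / NS_PER_SEC;
    return quota < 1 ? 1 : (uint64_t)quota;
}

/* Lowest possible rate in bytes/s: one data unit per time slice. */
static uint64_t min_bytes_per_sec(uint64_t slice_ns, uint64_t unit_bytes)
{
    return unit_bytes * NS_PER_SEC / slice_ns;
}

/* Time in ns that a chunk of chunk_units occupies at the given quota. */
static uint64_t chunk_duration_ns(uint64_t chunk_units, uint64_t quota,
                                  uint64_t slice_ns)
{
    return (chunk_units + quota - 1) / quota * slice_ns;
}

/* Hypothetical demo using 100 ms slices and 512-byte sectors. */
void ratelimit_arithmetic_demo(void)
{
    const uint64_t slice_ns = 100000000;

    /* A speed below one sector per slice is rounded up to a quota of 1. */
    assert(quota_per_slice(5, slice_ns) == 1);
    /* 10 slices per second * 512 bytes = 5120 bytes/second. */
    assert(min_bytes_per_sec(slice_ns, 512) == 5120);
    /* A 512 KiB chunk is 1024 sectors, i.e. 1024 slices = 102.4 s. */
    assert(chunk_duration_ns(1024, 1, slice_ns) == 102400000000ULL);
}
```

At the minimum quota a single 512 KiB chunk therefore keeps the job
waiting for roughly 100 seconds, which is where the O(100s) estimate
comes from.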

Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Message-id: 1467127721-9564-2-git-send-email-silbe@linux.vnet.ibm.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/commit.c           | 13 +++++--------
 block/mirror.c           |  4 +++-
 block/stream.c           | 12 ++++--------
 include/qemu/ratelimit.h | 43 ++++++++++++++++++++++++++++++++++---------
 4 files changed, 46 insertions(+), 26 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index 5d11eb6..553e18d 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -113,6 +113,7 @@ static void coroutine_fn commit_run(void *opaque)
     CommitBlockJob *s = opaque;
     CommitCompleteData *data;
     int64_t sector_num, end;
+    uint64_t delay_ns = 0;
     int ret = 0;
     int n = 0;
     void *buf = NULL;
@@ -142,10 +143,8 @@ static void coroutine_fn commit_run(void *opaque)
     buf = blk_blockalign(s->top, COMMIT_BUFFER_SIZE);
 
     for (sector_num = 0; sector_num < end; sector_num += n) {
-        uint64_t delay_ns = 0;
         bool copy;
 
-wait:
         /* Note that even when no rate limit is applied we need to yield
          * with no pending I/O here so that bdrv_drain_all() returns.
          */
@@ -161,12 +160,6 @@ wait:
         copy = (ret == 1);
         trace_commit_one_iteration(s, sector_num, n, ret);
         if (copy) {
-            if (s->common.speed) {
-                delay_ns = ratelimit_calculate_delay(&s->limit, n);
-                if (delay_ns > 0) {
-                    goto wait;
-                }
-            }
             ret = commit_populate(s->top, s->base, sector_num, n, buf);
             bytes_written += n * BDRV_SECTOR_SIZE;
         }
@@ -182,6 +175,10 @@ wait:
         }
         /* Publish progress */
         s->common.offset += n * BDRV_SECTOR_SIZE;
+
+        if (copy && s->common.speed) {
+            delay_ns = ratelimit_calculate_delay(&s->limit, n);
+        }
     }
 
     ret = 0;
diff --git a/block/mirror.c b/block/mirror.c
index 71e5ad4..b1e633e 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -422,7 +422,9 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
         assert(io_sectors);
         sector_num += io_sectors;
         nb_chunks -= DIV_ROUND_UP(io_sectors, sectors_per_chunk);
-        delay_ns += ratelimit_calculate_delay(&s->limit, io_sectors);
+        if (s->common.speed) {
+            delay_ns = ratelimit_calculate_delay(&s->limit, io_sectors);
+        }
     }
     return delay_ns;
 }
diff --git a/block/stream.c b/block/stream.c
index 2e7c654..3187481 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -95,6 +95,7 @@ static void coroutine_fn stream_run(void *opaque)
     BlockDriverState *base = s->base;
     int64_t sector_num = 0;
     int64_t end = -1;
+    uint64_t delay_ns = 0;
     int error = 0;
     int ret = 0;
     int n = 0;
@@ -123,10 +124,8 @@ static void coroutine_fn stream_run(void *opaque)
     }
 
     for (sector_num = 0; sector_num < end; sector_num += n) {
-        uint64_t delay_ns = 0;
         bool copy;
 
-wait:
         /* Note that even when no rate limit is applied we need to yield
          * with no pending I/O here so that bdrv_drain_all() returns.
          */
@@ -156,12 +155,6 @@ wait:
         }
         trace_stream_one_iteration(s, sector_num, n, ret);
         if (copy) {
-            if (s->common.speed) {
-                delay_ns = ratelimit_calculate_delay(&s->limit, n);
-                if (delay_ns > 0) {
-                    goto wait;
-                }
-            }
             ret = stream_populate(blk, sector_num, n, buf);
         }
         if (ret < 0) {
@@ -182,6 +175,9 @@ wait:
 
         /* Publish progress */
         s->common.offset += n * BDRV_SECTOR_SIZE;
+        if (copy && s->common.speed) {
+            delay_ns = ratelimit_calculate_delay(&s->limit, n);
+        }
     }
 
     if (!base) {
diff --git a/include/qemu/ratelimit.h b/include/qemu/ratelimit.h
index d413a4a..12db769 100644
--- a/include/qemu/ratelimit.h
+++ b/include/qemu/ratelimit.h
@@ -15,34 +15,59 @@
 #define QEMU_RATELIMIT_H 1
 
 typedef struct {
-    int64_t next_slice_time;
+    int64_t slice_start_time;
+    int64_t slice_end_time;
     uint64_t slice_quota;
     uint64_t slice_ns;
     uint64_t dispatched;
 } RateLimit;
 
+/** Calculate and return delay for next request in ns
+ *
+ * Record that we sent @p n data units. If we may send more data units
+ * in the current time slice, return 0 (i.e. no delay). Otherwise
+ * return the amount of time (in ns) until the start of the next time
+ * slice that will permit sending the next chunk of data.
+ *
+ * Recording sent data units even after exceeding the quota is
+ * permitted; the time slice will be extended accordingly.
+ */
 static inline int64_t ratelimit_calculate_delay(RateLimit *limit, uint64_t n)
 {
     int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+    uint64_t delay_slices;
 
-    if (limit->next_slice_time < now) {
-        limit->next_slice_time = now + limit->slice_ns;
+    assert(limit->slice_quota && limit->slice_ns);
+
+    if (limit->slice_end_time < now) {
+        /* Previous, possibly extended, time slice finished; reset the
+         * accounting. */
+        limit->slice_start_time = now;
+        limit->slice_end_time = now + limit->slice_ns;
         limit->dispatched = 0;
     }
-    if (limit->dispatched == 0 || limit->dispatched + n <= limit->slice_quota) {
-        limit->dispatched += n;
+
+    limit->dispatched += n;
+    if (limit->dispatched < limit->slice_quota) {
+        /* We may send further data within the current time slice, no
+         * need to delay the next request. */
         return 0;
-    } else {
-        limit->dispatched = n;
-        return limit->next_slice_time - now;
     }
+
+    /* Quota exceeded. Calculate the next time slice we may start
+     * sending data again. */
+    delay_slices = (limit->dispatched + limit->slice_quota - 1) /
+        limit->slice_quota;
+    limit->slice_end_time = limit->slice_start_time +
+        delay_slices * limit->slice_ns;
+    return limit->slice_end_time - now;
 }
 
 static inline void ratelimit_set_speed(RateLimit *limit, uint64_t speed,
                                        uint64_t slice_ns)
 {
     limit->slice_ns = slice_ns;
-    limit->slice_quota = ((double)speed * slice_ns)/1000000000ULL;
+    limit->slice_quota = MAX(((double)speed * slice_ns) / 1000000000ULL, 1);
 }
 
 #endif
-- 
1.8.3.1

