* [Qemu-devel] [PATCH 00/20] Block device mirroring enhancements, 12-12-12 edition
@ 2012-12-12 13:46 Paolo Bonzini
2012-12-12 13:46 ` [Qemu-devel] [PATCH 01/20] host-utils: add ffsl Paolo Bonzini
` (20 more replies)
0 siblings, 21 replies; 53+ messages in thread
From: Paolo Bonzini @ 2012-12-12 13:46 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, jcody, stefanha
Here are my block mirroring patches for QEMU 1.4, please review.
Patches 1-12 implement improvements and optimizations to the code
that went in 1.3.
Patches 13-20 (lacking tests for now) implement a persistent dirty
bitmap. The bitmap can be used to run storage migration jobs across
multiple executions of the VM or, in the future, offline via qemu-img.
Paolo
Paolo Bonzini (20):
host-utils: add ffsl
add hierarchical bitmap data type and test cases
block: implement dirty bitmap using HBitmap
block: make round_to_clusters public
mirror: perform COW if the cluster size is bigger than the granularity
block: return count of dirty sectors, not chunks
block: allow customizing the granularity of the dirty bitmap
mirror: allow customizing the granularity
mirror: switch mirror_iteration to AIO
mirror: add buf-size argument to drive-mirror
mirror: support more than one in-flight AIO operation
mirror: support arbitrarily-sized iterations
oslib: add a wrapper for mmap/munmap
hbitmap: add hbitmap_alloc_with_data and hbitmap_required_size
hbitmap: add hbitmap_copy
block: split bdrv_enable_dirty_tracking and
bdrv_disable_dirty_tracking
block: support a persistent dirty bitmap
mirror: add support for persistent dirty bitmap
block: choose the default dirty bitmap granularity in
bdrv_enable_dirty_tracking
monitor: add commands to start/stop dirty bitmap
Makefile.objs | 2 +-
block-migration.c | 22 +-
block.c | 279 +++++++++++++++++---------
block.h | 18 +-
block/mirror.c | 393 ++++++++++++++++++++++++++++++------
block_int.h | 15 +-
blockdev.c | 83 ++++++--
blockdev.h | 1 +
hbitmap.c | 485 +++++++++++++++++++++++++++++++++++++++++++++
hbitmap.h | 243 +++++++++++++++++++++++
hmp-commands.hx | 39 ++++
hmp.c | 29 ++-
hmp.h | 2 +
host-utils.h | 26 +++
osdep.h | 10 +
oslib-posix.c | 47 +++++
oslib-win32.c | 59 ++++++
qapi-schema.json | 71 ++++++-
qmp-commands.hx | 75 ++++++-
tests/Makefile | 2 +
tests/qemu-iotests/041 | 52 +++++
tests/qemu-iotests/041.out | 4 +-
tests/test-hbitmap.c | 475 ++++++++++++++++++++++++++++++++++++++++++++
trace-events | 12 ++
24 files changed, 2247 insertions(+), 197 deletions(-)
create mode 100644 hbitmap.c
create mode 100644 hbitmap.h
create mode 100644 tests/test-hbitmap.c
--
1.8.0.1
^ permalink raw reply [flat|nested] 53+ messages in thread
* [Qemu-devel] [PATCH 01/20] host-utils: add ffsl
2012-12-12 13:46 [Qemu-devel] [PATCH 00/20] Block device mirroring enhancements, 12-12-12 edition Paolo Bonzini
@ 2012-12-12 13:46 ` Paolo Bonzini
2012-12-12 23:41 ` Eric Blake
2012-12-12 13:46 ` [Qemu-devel] [PATCH 02/20] add hierarchical bitmap data type and test cases Paolo Bonzini
` (19 subsequent siblings)
20 siblings, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2012-12-12 13:46 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, jcody, stefanha
We can provide fast versions based on the other functions defined
by host-utils.h. Some care is required on glibc, which provides
ffsl already.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
host-utils.h | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/host-utils.h b/host-utils.h
index 821db93..231d580 100644
--- a/host-utils.h
+++ b/host-utils.h
@@ -24,6 +24,7 @@
*/
#include "compiler.h" /* QEMU_GNUC_PREREQ */
+#include <string.h> /* ffsl */
#if defined(__x86_64__)
#define __HAVE_FAST_MULU64__
@@ -234,3 +235,28 @@ static inline int ctpop64(uint64_t val)
return val;
#endif
}
+
+/* glibc does not provide an inline version of ffsl, so always define
+ * ours. We need to give it a different name, however.
+ */
+#ifdef __GLIBC__
+#define ffsl qemu_ffsl
+#endif
+static inline int ffsl(long val)
+{
+ if (!val) {
+ return 0;
+ }
+
+#if QEMU_GNUC_PREREQ(3, 4)
+ return __builtin_ctzl(val) + 1;
+#else
+ if (sizeof(long) == 4) {
+ return ctz32(val) + 1;
+ } else if (sizeof(long) == 8) {
+ return ctz64(val) + 1;
+ } else {
+ abort();
+ }
+#endif
+}
--
1.8.0.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [PATCH 02/20] add hierarchical bitmap data type and test cases
2012-12-12 13:46 [Qemu-devel] [PATCH 00/20] Block device mirroring enhancements, 12-12-12 edition Paolo Bonzini
2012-12-12 13:46 ` [Qemu-devel] [PATCH 01/20] host-utils: add ffsl Paolo Bonzini
@ 2012-12-12 13:46 ` Paolo Bonzini
2012-12-14 0:04 ` Eric Blake
2013-01-11 18:27 ` Stefan Hajnoczi
2012-12-12 13:46 ` [Qemu-devel] [PATCH 03/20] block: implement dirty bitmap using HBitmap Paolo Bonzini
` (18 subsequent siblings)
20 siblings, 2 replies; 53+ messages in thread
From: Paolo Bonzini @ 2012-12-12 13:46 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, jcody, stefanha
HBitmaps provides an array of bits. The bits are stored as usual in an
array of unsigned longs, but HBitmap is also optimized to provide fast
iteration over set bits; going from one bit to the next is O(logB n)
worst case, with B = sizeof(long) * CHAR_BIT: the result is low enough
that the number of levels is in fact fixed.
In order to do this, it stacks multiple bitmaps with progressively coarser
granularity; in all levels except the last, bit N is set iff the N-th
unsigned long is nonzero in the immediately next level. When iteration
completes on the last level it can examine the 2nd-last level to quickly
skip entire words, and even do so recursively to skip blocks of 64 words or
powers thereof (32 on 32-bit machines).
Given an index in the bitmap, it can be split in group of bits like
this (for the 64-bit case):
bits 0-57 => word in the last bitmap | bits 58-63 => bit in the word
bits 0-51 => word in the 2nd-last bitmap | bits 52-57 => bit in the word
bits 0-45 => word in the 3rd-last bitmap | bits 46-51 => bit in the word
So it is easy to move up simply by shifting the index right by
log2(BITS_PER_LONG) bits. To move down, you shift the index left
similarly, and add the word index within the group. Iteration uses
ffs (find first set bit) to find the next word to examine; this
operation can be done in constant time in most current architectures.
Setting or clearing a range of m bits on all levels, the work to perform
is O(m + m/W + m/W^2 + ...), which is O(m) like on a regular bitmap.
When iterating on a bitmap, each bit (on any level) is only visited
once. Hence, The total cost of visiting a bitmap with m bits in it is
the number of bits that are set in all bitmaps. Unless the bitmap is
extremely sparse, this is also O(m + m/W + m/W^2 + ...), so the amortized
cost of advancing from one bit to the next is usually constant.
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
hbitmap.c | 400 ++++++++++++++++++++++++++++++++++++++++++++++++++
hbitmap.h | 207 ++++++++++++++++++++++++++
tests/Makefile | 2 +
tests/test-hbitmap.c | 408 +++++++++++++++++++++++++++++++++++++++++++++++++++
trace-events | 5 +
5 files changed, 1022 insertions(+)
create mode 100644 hbitmap.c
create mode 100644 hbitmap.h
create mode 100644 tests/test-hbitmap.c
diff --git a/hbitmap.c b/hbitmap.c
new file mode 100644
index 0000000..ae59f39
--- /dev/null
+++ b/hbitmap.c
@@ -0,0 +1,400 @@
+/*
+ * Hierarchical Bitmap Data Type
+ *
+ * Copyright Red Hat, Inc., 2012
+ *
+ * Author: Paolo Bonzini <pbonzini@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later. See the COPYING file in the top-level directory.
+ */
+
+#include "osdep.h"
+#include "hbitmap.h"
+#include "host-utils.h"
+#include "trace.h"
+#include <string.h>
+#include <glib.h>
+#include <assert.h>
+
+/* HBitmaps provides an array of bits. The bits are stored as usual in an
+ * array of unsigned longs, but HBitmap is also optimized to provide fast
+ * iteration over set bits; going from one bit to the next is O(logB n)
+ * worst case, with B = sizeof(long) * CHAR_BIT: the result is low enough
+ * that the number of levels is in fact fixed.
+ *
+ * In order to do this, it stacks multiple bitmaps with progressively coarser
+ * granularity; in all levels except the last, bit N is set iff the N-th
+ * unsigned long is nonzero in the immediately next level. When iteration
+ * completes on the last level it can examine the 2nd-last level to quickly
+ * skip entire words, and even do so recursively to skip blocks of 64 words or
+ * powers thereof (32 on 32-bit machines).
+ *
+ * Given an index in the bitmap, it can be split in group of bits like
+ * this (for the 64-bit case):
+ *
+ * bits 0-57 => word in the last bitmap | bits 58-63 => bit in the word
+ * bits 0-51 => word in the 2nd-last bitmap | bits 52-57 => bit in the word
+ * bits 0-45 => word in the 3rd-last bitmap | bits 46-51 => bit in the word
+ *
+ * So it is easy to move up simply by shifting the index right by
+ * log2(BITS_PER_LONG) bits. To move down, you shift the index left
+ * similarly, and add the word index within the group. Iteration uses
+ * ffs (find first set bit) to find the next word to examine; this
+ * operation can be done in constant time in most current architectures.
+ *
+ * Setting or clearing a range of m bits on all levels, the work to perform
+ * is O(m + m/W + m/W^2 + ...), which is O(m) like on a regular bitmap.
+ *
+ * When iterating on a bitmap, each bit (on any level) is only visited
+ * once. Hence, The total cost of visiting a bitmap with m bits in it is
+ * the number of bits that are set in all bitmaps. Unless the bitmap is
+ * extremely sparse, this is also O(m + m/W + m/W^2 + ...), so the amortized
+ * cost of advancing from one bit to the next is usually constant (worst case
+ * O(logB n) as in the non-amortized complexity).
+ */
+
+struct HBitmap {
+ /* Number of total bits in the bottom level. */
+ uint64_t size;
+
+ /* Number of set bits in the bottom level. */
+ uint64_t count;
+
+ /* A scaling factor. Given a granularity of G, each bit in the bitmap will
+ * will actually represent a group of 2^G elements. Each operation on a
+ * range of bits first rounds the bits to determine which group they land
+ * in, and then affect the entire page; iteration will only visit the first
+ * bit of each group. Here is an example of operations in a size-16,
+ * granularity-1 HBitmap:
+ *
+ * initial state 00000000
+ * set(start=0, count=9) 11111000 (iter: 0, 2, 4, 6, 8)
+ * reset(start=1, count=3) 00111000 (iter: 4, 6, 8)
+ * set(start=9, count=2) 00111100 (iter: 4, 6, 8, 10)
+ * reset(start=5, count=5) 00000000
+ *
+ * From an implementation point of view, when setting or resetting bits,
+ * the bitmap will scale bit numbers right by this amount of bits. When
+ * iterating, the bitmap will scale bit numbers left by this amount of
+ * bits.
+ */
+ int granularity;
+
+ /* A number of progressively less coarse bitmaps (i.e. level 0 is the
+ * coarsest). Each bit in level N represents a word in level N+1 that
+ * has a set bit, except the last level where each bit represents the
+ * actual bitmap.
+ *
+ * Note that all bitmaps have the same number of levels. Even a 1-bit
+ * bitmap will still allocate HBITMAP_LEVELS arrays.
+ */
+ unsigned long *levels[HBITMAP_LEVELS];
+};
+
+static inline int popcountl(unsigned long l)
+{
+ return BITS_PER_LONG == 32 ? ctpop32(l) : ctpop64(l);
+}
+
+/* Advance hbi to the next nonzero word and return it. hbi->pos
+ * is updated. Returns zero if we reach the end of the bitmap.
+ */
+unsigned long hbitmap_iter_skip_words(HBitmapIter *hbi)
+{
+ size_t pos = hbi->pos;
+ const HBitmap *hb = hbi->hb;
+ unsigned i = HBITMAP_LEVELS - 1;
+
+ unsigned long cur;
+ do {
+ cur = hbi->cur[--i];
+ pos >>= BITS_PER_LEVEL;
+ } while (cur == 0);
+
+ /* Check for end of iteration. We always use fewer than BITS_PER_LONG
+ * bits in the level 0 bitmap; thus we can repurpose the most significant
+ * bit as a sentinel. The sentinel is set in hbitmap_alloc and ensures
+ * that the above loop ends even without an explicit check on i.
+ */
+
+ if (i == 0 && cur == (1UL << (BITS_PER_LONG - 1))) {
+ return 0;
+ }
+ for (; i < HBITMAP_LEVELS - 1; i++) {
+ /* Shift back pos to the left, matching the right shifts above.
+ * The index of this word's least significant set bit provides
+ * the low-order bits.
+ */
+ pos = (pos << BITS_PER_LEVEL) + ffsl(cur) - 1;
+ hbi->cur[i] = cur & (cur - 1);
+
+ /* Set up next level for iteration. */
+ cur = hb->levels[i + 1][pos];
+ }
+
+ hbi->pos = pos;
+ trace_hbitmap_iter_skip_words(hbi->hb, hbi, pos, cur);
+
+ assert(cur);
+ return cur;
+}
+
+void hbitmap_iter_init(HBitmapIter *hbi, const HBitmap *hb, uint64_t first)
+{
+ unsigned i, bit;
+ uint64_t pos;
+
+ hbi->hb = hb;
+ pos = first >> hb->granularity;
+ hbi->pos = pos >> BITS_PER_LEVEL;
+ hbi->granularity = hb->granularity;
+
+ for (i = HBITMAP_LEVELS; i-- > 0; ) {
+ bit = pos & (BITS_PER_LONG - 1);
+ pos >>= BITS_PER_LEVEL;
+
+ /* Drop bits representing items before first. */
+ hbi->cur[i] = hb->levels[i][pos] & ~((1UL << bit) - 1);
+
+ /* We have already added level i+1, so the lowest set bit has
+ * been processed. Clear it.
+ */
+ if (i != HBITMAP_LEVELS - 1) {
+ hbi->cur[i] &= ~(1UL << bit);
+ }
+ }
+}
+
+bool hbitmap_empty(const HBitmap *hb)
+{
+ return hb->count == 0;
+}
+
+int hbitmap_granularity(const HBitmap *hb)
+{
+ return hb->granularity;
+}
+
+uint64_t hbitmap_count(const HBitmap *hb)
+{
+ return hb->count << hb->granularity;
+}
+
+/* Count the number of set bits between start and end, not accounting for
+ * the granularity. Also an example of how to use hbitmap_iter_next_word.
+ */
+static uint64_t hb_count_between(HBitmap *hb, uint64_t start, uint64_t last)
+{
+ HBitmapIter hbi;
+ uint64_t count = 0;
+ uint64_t end = last + 1;
+ unsigned long cur;
+ size_t pos;
+
+ hbitmap_iter_init(&hbi, hb, start << hb->granularity);
+ for (;;) {
+ pos = hbitmap_iter_next_word(&hbi, &cur);
+ if (pos >= (end >> BITS_PER_LEVEL)) {
+ break;
+ }
+ count += popcountl(cur);
+ }
+
+ if (pos == (end >> BITS_PER_LEVEL)) {
+ /* Drop bits representing the END-th and subsequent items. */
+ int bit = end & (BITS_PER_LONG - 1);
+ cur &= (1UL << bit) - 1;
+ count += popcountl(cur);
+ }
+
+ return count;
+}
+
+/* Setting starts at the last layer and propagates up if an element
+ * changes from zero to non-zero.
+ */
+static inline bool hb_set_elem(unsigned long *elem, uint64_t start, uint64_t last)
+{
+ unsigned long mask;
+ bool changed;
+
+ assert((last >> BITS_PER_LEVEL) == (start >> BITS_PER_LEVEL));
+ assert(start <= last);
+
+ mask = 2UL << (last & (BITS_PER_LONG - 1));
+ mask -= 1UL << (start & (BITS_PER_LONG - 1));
+ changed = (*elem == 0);
+ *elem |= mask;
+ return changed;
+}
+
+/* The recursive workhorse (the depth is limited to HBITMAP_LEVELS)... */
+static void hb_set_between(HBitmap *hb, int level, uint64_t start, uint64_t last)
+{
+ size_t pos = start >> BITS_PER_LEVEL;
+ size_t lastpos = last >> BITS_PER_LEVEL;
+ bool changed = false;
+ size_t i;
+
+ i = pos;
+ if (i < lastpos) {
+ uint64_t next = (start | (BITS_PER_LONG - 1)) + 1;
+ changed |= hb_set_elem(&hb->levels[level][i], start, next - 1);
+ for (;;) {
+ start = next;
+ next += BITS_PER_LONG;
+ if (++i == lastpos) {
+ break;
+ }
+ changed |= (hb->levels[level][i] == 0);
+ hb->levels[level][i] = ~0UL;
+ }
+ }
+ changed |= hb_set_elem(&hb->levels[level][i], start, last);
+
+ /* If there was any change in this layer, we may have to update
+ * the one above.
+ */
+ if (level > 0 && changed) {
+ hb_set_between(hb, level - 1, pos, lastpos);
+ }
+}
+
+void hbitmap_set(HBitmap *hb, uint64_t start, uint64_t count)
+{
+ /* Compute range in the last layer. */
+ uint64_t last = start + count - 1;
+
+ trace_hbitmap_set(hb, start, count,
+ start >> hb->granularity, last >> hb->granularity);
+
+ start >>= hb->granularity;
+ last >>= hb->granularity;
+ count = last - start + 1;
+
+ hb->count += count - hb_count_between(hb, start, last);
+ hb_set_between(hb, HBITMAP_LEVELS - 1, start, last);
+}
+
+/* Resetting works the other way round: propagate up if the new
+ * value is zero.
+ */
+static inline bool hb_reset_elem(unsigned long *elem, uint64_t start, uint64_t last)
+{
+ unsigned long mask;
+ bool blanked;
+
+ assert((last >> BITS_PER_LEVEL) == (start >> BITS_PER_LEVEL));
+ assert(start <= last);
+
+ mask = 2UL << (last & (BITS_PER_LONG - 1));
+ mask -= 1UL << (start & (BITS_PER_LONG - 1));
+ blanked = *elem != 0 && ((*elem & ~mask) == 0);
+ *elem &= ~mask;
+ return blanked;
+}
+
+/* The recursive workhorse (the depth is limited to HBITMAP_LEVELS)... */
+static void hb_reset_between(HBitmap *hb, int level, uint64_t start, uint64_t last)
+{
+ size_t pos = start >> BITS_PER_LEVEL;
+ size_t lastpos = last >> BITS_PER_LEVEL;
+ bool changed = false;
+ size_t i;
+
+ i = pos;
+ if (i < lastpos) {
+ uint64_t next = (start | (BITS_PER_LONG - 1)) + 1;
+
+ /* Here we need a more complex test than when setting bits. Even if
+ * something was changed, we must not blank bits in the upper level
+ * unless the lower-level word became entirely zero. So, remove pos
+ * from the upper-level range if bits remain set.
+ */
+ if (hb_reset_elem(&hb->levels[level][i], start, next - 1)) {
+ changed = true;
+ } else {
+ pos++;
+ }
+
+ for (;;) {
+ start = next;
+ next += BITS_PER_LONG;
+ if (++i == lastpos) {
+ break;
+ }
+ changed |= (hb->levels[level][i] != 0);
+ hb->levels[level][i] = 0UL;
+ }
+ }
+
+ /* Same as above, this time for lastpos. */
+ if (hb_reset_elem(&hb->levels[level][i], start, last)) {
+ changed = true;
+ } else {
+ lastpos--;
+ }
+
+ if (level > 0 && changed) {
+ hb_reset_between(hb, level - 1, pos, lastpos);
+ }
+}
+
+void hbitmap_reset(HBitmap *hb, uint64_t start, uint64_t count)
+{
+ /* Compute range in the last layer. */
+ uint64_t last = start + count - 1;
+
+ trace_hbitmap_reset(hb, start, count,
+ start >> hb->granularity, last >> hb->granularity);
+
+ start >>= hb->granularity;
+ last >>= hb->granularity;
+
+ hb->count -= hb_count_between(hb, start, last);
+ hb_reset_between(hb, HBITMAP_LEVELS - 1, start, last);
+}
+
+bool hbitmap_get(const HBitmap *hb, uint64_t item)
+{
+ /* Compute position and bit in the last layer. */
+ uint64_t pos = item >> hb->granularity;
+ unsigned long bit = 1UL << (pos & (BITS_PER_LONG - 1));
+
+ return (hb->levels[HBITMAP_LEVELS - 1][pos >> BITS_PER_LEVEL] & bit) != 0;
+}
+
+void hbitmap_free(HBitmap *hb)
+{
+ unsigned i;
+ for (i = HBITMAP_LEVELS; i-- > 0; ) {
+ g_free(hb->levels[i]);
+ }
+ g_free(hb);
+}
+
+HBitmap *hbitmap_alloc(uint64_t size, int granularity)
+{
+ HBitmap *hb = g_malloc0(sizeof(struct HBitmap));
+ unsigned i;
+
+ assert(granularity >= 0 && granularity < 64);
+ size = (size + (1ULL << granularity) - 1) >> granularity;
+ assert(size <= ((uint64_t)1 << HBITMAP_LOG_MAX_SIZE));
+
+ hb->size = size;
+ hb->granularity = granularity;
+ for (i = HBITMAP_LEVELS; i-- > 0; ) {
+ size = MAX((size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
+ hb->levels[i] = g_malloc0(size * sizeof(unsigned long));
+ }
+
+ /* We necessarily have free bits in level 0 due to the definition
+ * of HBITMAP_LEVELS, so use one for a sentinel. This speeds up
+ * hbitmap_iter_skip_words.
+ */
+ assert(size == 1);
+ hb->levels[0][0] |= 1UL << (BITS_PER_LONG - 1);
+ return hb;
+}
diff --git a/hbitmap.h b/hbitmap.h
new file mode 100644
index 0000000..7ddfb66
--- /dev/null
+++ b/hbitmap.h
@@ -0,0 +1,207 @@
+/*
+ * Hierarchical Bitmap Data Type
+ *
+ * Copyright Red Hat, Inc., 2012
+ *
+ * Author: Paolo Bonzini <pbonzini@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later. See the COPYING file in the top-level directory.
+ */
+
+#ifndef HBITMAP_H
+#define HBITMAP_H 1
+
+#include <limits.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include "bitops.h"
+
+typedef struct HBitmap HBitmap;
+typedef struct HBitmapIter HBitmapIter;
+
+#define BITS_PER_LEVEL (BITS_PER_LONG == 32 ? 5 : 6)
+
+/* For 32-bit, the largest that fits in a 4 GiB address space.
+ * For 64-bit, the number of sectors in 1 PiB. Good luck, in
+ * either case... :)
+ */
+#define HBITMAP_LOG_MAX_SIZE (BITS_PER_LONG == 32 ? 34 : 41)
+
+/* We need to place a sentinel in level 0 to speed up iteration. Thus,
+ * we do this instead of HBITMAP_LOG_MAX_SIZE / BITS_PER_LEVEL. The
+ * difference is that it allocates an extra level when HBITMAP_LOG_MAX_SIZE
+ * is an exact multiple of BITS_PER_LEVEL.
+ */
+#define HBITMAP_LEVELS ((HBITMAP_LOG_MAX_SIZE / BITS_PER_LEVEL) + 1)
+
+struct HBitmapIter {
+ const HBitmap *hb;
+
+ /* Copied from hb for access in the inline functions (hb is opaque). */
+ int granularity;
+
+ /* Entry offset into the last-level array of longs. */
+ size_t pos;
+
+ /* The currently-active path in the tree. Each item of cur[i] stores
+ * the bits (i.e. the subtrees) yet to be processed under that node.
+ */
+ unsigned long cur[HBITMAP_LEVELS];
+};
+
+/**
+ * hbitmap_alloc:
+ * @size: Number of bits in the bitmap.
+ * @granularity: Granularity of the bitmap. Aligned groups of 2^@granularity
+ * bits will be represented by a single bit. Each operation on a
+ * range of bits first rounds the bits to determine which group they land
+ * in, and then affect the entire set; iteration will only visit the first
+ * bit of each group.
+ *
+ * Allocate a new HBitmap.
+ */
+HBitmap *hbitmap_alloc(uint64_t size, int granularity);
+
+/**
+ * hbitmap_empty:
+ * @hb: HBitmap to operate on.
+ *
+ * Return whether the bitmap is empty.
+ */
+bool hbitmap_empty(const HBitmap *hb);
+
+/**
+ * hbitmap_granularity:
+ * @hb: HBitmap to operate on.
+ *
+ * Return the granularity of the HBitmap.
+ */
+int hbitmap_granularity(const HBitmap *hb);
+
+/**
+ * hbitmap_count:
+ * @hb: HBitmap to operate on.
+ *
+ * Return the number of bits set in the HBitmap.
+ */
+uint64_t hbitmap_count(const HBitmap *hb);
+
+/**
+ * hbitmap_set:
+ * @hb: HBitmap to operate on.
+ * @start: First bit to set (0-based).
+ * @count: Number of bits to set.
+ *
+ * Set a consecutive range of bits in an HBitmap.
+ */
+void hbitmap_set(HBitmap *hb, uint64_t start, uint64_t count);
+
+/**
+ * hbitmap_reset:
+ * @hb: HBitmap to operate on.
+ * @start: First bit to reset (0-based).
+ * @count: Number of bits to reset.
+ *
+ * Reset a consecutive range of bits in an HBitmap.
+ */
+void hbitmap_reset(HBitmap *hb, uint64_t start, uint64_t count);
+
+/**
+ * hbitmap_get:
+ * @hb: HBitmap to operate on.
+ * @item: Bit to query (0-based).
+ *
+ * Return whether the @item-th bit in an HBitmap is set.
+ */
+bool hbitmap_get(const HBitmap *hb, uint64_t item);
+
+/**
+ * hbitmap_free:
+ * @hb: HBitmap to operate on.
+ *
+ * Free an HBitmap and all of its associated memory.
+ */
+void hbitmap_free(HBitmap *hb);
+
+/**
+ * hbitmap_iter_init:
+ * @hbi: HBitmapIter to initialize.
+ * @hb: HBitmap to iterate on.
+ * @first: First bit to visit (0-based).
+ *
+ * Set up @hbi to iterate on the HBitmap @hb. hbitmap_iter_next will return
+ * the lowest-numbered bit that is set in @hb, starting at @first.
+ *
+ * Concurrent setting of bits is acceptable, and will at worst cause the
+ * iteration to miss some of those bits. Resetting bits before the current
+ * position of the iterator is also okay. However, concurrent resetting of
+ * bits can lead to unexpected behavior if the iterator has not yet reached
+ * those bits.
+ */
+void hbitmap_iter_init(HBitmapIter *hbi, const HBitmap *hb, uint64_t first);
+
+/* hbitmap_iter_skip_words:
+ * @hbi: HBitmapIter to operate on.
+ *
+ * Internal function used by hbitmap_iter_next and hbitmap_iter_next_word.
+ */
+unsigned long hbitmap_iter_skip_words(HBitmapIter *hbi);
+
+/**
+ * hbitmap_iter_next:
+ * @hbi: HBitmapIter to operate on.
+ *
+ * Return the next bit that is set in @hbi's associated HBitmap,
+ * or -1 if all remaining bits are zero.
+ */
+static inline int64_t hbitmap_iter_next(HBitmapIter *hbi)
+{
+ unsigned long cur = hbi->cur[HBITMAP_LEVELS - 1];
+ int64_t item;
+
+ if (cur == 0) {
+ cur = hbitmap_iter_skip_words(hbi);
+ if (cur == 0) {
+ return -1;
+ }
+ }
+
+ /* The next call will resume work from the next bit. */
+ hbi->cur[HBITMAP_LEVELS - 1] = cur & (cur - 1);
+ item = ((uint64_t)hbi->pos << BITS_PER_LEVEL) + ffsl(cur) - 1;
+
+ return item << hbi->granularity;
+}
+
+/**
+ * hbitmap_iter_next_word:
+ * @hbi: HBitmapIter to operate on.
+ * @p_cur: Location where to store the next non-zero word.
+ *
+ * Return the index of the next nonzero word that is set in @hbi's
+ * associated HBitmap, and set *p_cur to the content of that word
+ * (bits before the index that was passed to hbitmap_iter_init are
+ * trimmed on the first call). Return -1, and set *p_cur to zero,
+ * if all remaining words are zero.
+ */
+static inline size_t hbitmap_iter_next_word(HBitmapIter *hbi, unsigned long *p_cur)
+{
+ unsigned long cur = hbi->cur[HBITMAP_LEVELS - 1];
+
+ if (cur == 0) {
+ cur = hbitmap_iter_skip_words(hbi);
+ if (cur == 0) {
+ *p_cur = 0;
+ return -1;
+ }
+ }
+
+ /* The next call will resume work from the next word. */
+ hbi->cur[HBITMAP_LEVELS - 1] = 0;
+ *p_cur = cur;
+ return hbi->pos;
+}
+
+
+#endif
diff --git a/tests/Makefile b/tests/Makefile
index b60f0fb..b04ecb4 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -17,6 +17,7 @@ check-unit-y += tests/test-visitor-serialization$(EXESUF)
check-unit-y += tests/test-iov$(EXESUF)
check-unit-y += tests/test-aio$(EXESUF)
check-unit-y += tests/test-thread-pool$(EXESUF)
+check-unit-y += tests/test-hbitmap$(EXESUF)
check-block-$(CONFIG_POSIX) += tests/qemu-iotests-quick.sh
@@ -54,6 +55,7 @@ tests/test-coroutine$(EXESUF): tests/test-coroutine.o $(coroutine-obj-y) $(tools
tests/test-aio$(EXESUF): tests/test-aio.o $(coroutine-obj-y) $(tools-obj-y) $(block-obj-y) libqemustub.a
tests/test-thread-pool$(EXESUF): tests/test-thread-pool.o $(coroutine-obj-y) $(tools-obj-y) $(block-obj-y) libqemustub.a
tests/test-iov$(EXESUF): tests/test-iov.o iov.o
+tests/test-hbitmap$(EXESUF): tests/test-hbitmap.o hbitmap.o $(trace-obj-y)
tests/test-qapi-types.c tests/test-qapi-types.h :\
$(SRC_PATH)/qapi-schema-test.json $(SRC_PATH)/scripts/qapi-types.py
diff --git a/tests/test-hbitmap.c b/tests/test-hbitmap.c
new file mode 100644
index 0000000..b5d76a7
--- /dev/null
+++ b/tests/test-hbitmap.c
@@ -0,0 +1,408 @@
+/*
+ * Hierarchical bitmap unit-tests.
+ *
+ * Copyright (C) 2012 Red Hat Inc.
+ *
+ * Author: Paolo Bonzini <pbonzini@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include <glib.h>
+#include <stdarg.h>
+#include "hbitmap.h"
+
+#define LOG_BITS_PER_LONG (BITS_PER_LONG == 32 ? 5 : 6)
+
+#define L1 BITS_PER_LONG
+#define L2 (BITS_PER_LONG * L1)
+#define L3 (BITS_PER_LONG * L2)
+
+typedef struct TestHBitmapData {
+ HBitmap *hb;
+ unsigned long *bits;
+ size_t size;
+ int granularity;
+} TestHBitmapData;
+
+
+/* Check that the HBitmap and the shadow bitmap contain the same data,
+ * ignoring the same "first" bits.
+ */
+static void hbitmap_test_check(TestHBitmapData *data,
+ uint64_t first)
+{
+ uint64_t count = 0;
+ size_t pos;
+ int bit;
+ HBitmapIter hbi;
+ int64_t i, next;
+
+ hbitmap_iter_init(&hbi, data->hb, first);
+
+ i = first;
+ for (;;) {
+ next = hbitmap_iter_next(&hbi);
+ if (next < 0) {
+ next = data->size;
+ }
+
+ while (i < next) {
+ pos = i >> LOG_BITS_PER_LONG;
+ bit = i & (BITS_PER_LONG - 1);
+ i++;
+ g_assert_cmpint(data->bits[pos] & (1UL << bit), ==, 0);
+ }
+
+ if (next == data->size) {
+ break;
+ }
+
+ pos = i >> LOG_BITS_PER_LONG;
+ bit = i & (BITS_PER_LONG - 1);
+ i++;
+ count++;
+ g_assert_cmpint(data->bits[pos] & (1UL << bit), !=, 0);
+ }
+
+ if (first == 0) {
+ g_assert_cmpint(count << data->granularity, ==, hbitmap_count(data->hb));
+ }
+}
+
+/* This is provided instead of a test setup function so that the sizes
+ are kept in the test functions (and not in main()) */
+static void hbitmap_test_init(TestHBitmapData *data,
+ uint64_t size, int granularity)
+{
+ size_t n;
+ data->hb = hbitmap_alloc(size, granularity);
+
+ n = (size + BITS_PER_LONG - 1) / BITS_PER_LONG;
+ if (n == 0) {
+ n = 1;
+ }
+ data->bits = g_new0(unsigned long, n);
+ data->size = size;
+ data->granularity = granularity;
+ hbitmap_test_check(data, 0);
+}
+
+static void hbitmap_test_teardown(TestHBitmapData *data,
+ const void *unused)
+{
+ if (data->hb) {
+ hbitmap_free(data->hb);
+ data->hb = NULL;
+ }
+ if (data->bits) {
+ g_free(data->bits);
+ data->bits = NULL;
+ }
+}
+
+/* Set a range in the HBitmap and in the shadow "simple" bitmap.
+ * The two bitmaps are then tested against each other.
+ */
+static void hbitmap_test_set(TestHBitmapData *data,
+ uint64_t first, uint64_t count)
+{
+ hbitmap_set(data->hb, first, count);
+ while (count-- != 0) {
+ size_t pos = first >> LOG_BITS_PER_LONG;
+ int bit = first & (BITS_PER_LONG - 1);
+ first++;
+
+ data->bits[pos] |= 1UL << bit;
+ }
+
+ if (data->granularity == 0) {
+ hbitmap_test_check(data, 0);
+ }
+}
+
+/* Reset a range in the HBitmap and in the shadow "simple" bitmap.
+ */
+static void hbitmap_test_reset(TestHBitmapData *data,
+ uint64_t first, uint64_t count)
+{
+ hbitmap_reset(data->hb, first, count);
+ while (count-- != 0) {
+ size_t pos = first >> LOG_BITS_PER_LONG;
+ int bit = first & (BITS_PER_LONG - 1);
+ first++;
+
+ data->bits[pos] &= ~(1UL << bit);
+ }
+
+ if (data->granularity == 0) {
+ hbitmap_test_check(data, 0);
+ }
+}
+
+static void hbitmap_test_check_get(TestHBitmapData *data)
+{
+ uint64_t count = 0;
+ uint64_t i;
+
+ for (i = 0; i < data->size; i++) {
+ size_t pos = i >> LOG_BITS_PER_LONG;
+ int bit = i & (BITS_PER_LONG - 1);
+ unsigned long val = data->bits[pos] & (1UL << bit);
+ count += hbitmap_get(data->hb, i);
+ g_assert_cmpint(hbitmap_get(data->hb, i), ==, val != 0);
+ }
+ g_assert_cmpint(count, ==, hbitmap_count(data->hb));
+}
+
+static void test_hbitmap_zero(TestHBitmapData *data,
+ const void *unused)
+{
+ hbitmap_test_init(data, 0, 0);
+}
+
+static void test_hbitmap_unaligned(TestHBitmapData *data,
+ const void *unused)
+{
+ hbitmap_test_init(data, L3 + 23, 0);
+ hbitmap_test_set(data, 0, 1);
+ hbitmap_test_set(data, L3 + 22, 1);
+}
+
+static void test_hbitmap_iter_empty(TestHBitmapData *data,
+ const void *unused)
+{
+ hbitmap_test_init(data, L1, 0);
+}
+
+static void test_hbitmap_iter_partial(TestHBitmapData *data,
+ const void *unused)
+{
+ hbitmap_test_init(data, L3, 0);
+ hbitmap_test_set(data, 0, L3);
+ hbitmap_test_check(data, 1);
+ hbitmap_test_check(data, L1 - 1);
+ hbitmap_test_check(data, L1);
+ hbitmap_test_check(data, L1 * 2 - 1);
+ hbitmap_test_check(data, L2 - 1);
+ hbitmap_test_check(data, L2);
+ hbitmap_test_check(data, L2 + 1);
+ hbitmap_test_check(data, L2 + L1);
+ hbitmap_test_check(data, L2 + L1 * 2 - 1);
+ hbitmap_test_check(data, L2 * 2 - 1);
+ hbitmap_test_check(data, L2 * 2);
+ hbitmap_test_check(data, L2 * 2 + 1);
+ hbitmap_test_check(data, L2 * 2 + L1);
+ hbitmap_test_check(data, L2 * 2 + L1 * 2 - 1);
+ hbitmap_test_check(data, L3 / 2);
+}
+
+static void test_hbitmap_iter_past(TestHBitmapData *data,
+ const void *unused)
+{
+ hbitmap_test_init(data, L3, 0);
+ hbitmap_test_set(data, 0, L3);
+ hbitmap_test_check(data, L3);
+}
+
+static void test_hbitmap_set_all(TestHBitmapData *data,
+ const void *unused)
+{
+ hbitmap_test_init(data, L3, 0);
+ hbitmap_test_set(data, 0, L3);
+}
+
+static void test_hbitmap_get_all(TestHBitmapData *data,
+ const void *unused)
+{
+ hbitmap_test_init(data, L3, 0);
+ hbitmap_test_set(data, 0, L3);
+ hbitmap_test_check_get(data);
+}
+
+static void test_hbitmap_get_some(TestHBitmapData *data,
+ const void *unused)
+{
+ hbitmap_test_init(data, 2 * L2, 0);
+ hbitmap_test_set(data, 10, 1);
+ hbitmap_test_check_get(data);
+ hbitmap_test_set(data, L1 - 1, 1);
+ hbitmap_test_check_get(data);
+ hbitmap_test_set(data, L1, 1);
+ hbitmap_test_check_get(data);
+ hbitmap_test_set(data, L2 - 1, 1);
+ hbitmap_test_check_get(data);
+ hbitmap_test_set(data, L2, 1);
+ hbitmap_test_check_get(data);
+}
+
+static void test_hbitmap_set_one(TestHBitmapData *data,
+ const void *unused)
+{
+ hbitmap_test_init(data, 2 * L2, 0);
+ hbitmap_test_set(data, 10, 1);
+ hbitmap_test_set(data, L1 - 1, 1);
+ hbitmap_test_set(data, L1, 1);
+ hbitmap_test_set(data, L2 - 1, 1);
+ hbitmap_test_set(data, L2, 1);
+}
+
+static void test_hbitmap_set_two_elem(TestHBitmapData *data,
+ const void *unused)
+{
+ hbitmap_test_init(data, 2 * L2, 0);
+ hbitmap_test_set(data, L1 - 1, 2);
+ hbitmap_test_set(data, L1 * 2 - 1, 4);
+ hbitmap_test_set(data, L1 * 4, L1 + 1);
+ hbitmap_test_set(data, L1 * 8 - 1, L1 + 1);
+ hbitmap_test_set(data, L2 - 1, 2);
+ hbitmap_test_set(data, L2 + L1 - 1, 8);
+ hbitmap_test_set(data, L2 + L1 * 4, L1 + 1);
+ hbitmap_test_set(data, L2 + L1 * 8 - 1, L1 + 1);
+}
+
+static void test_hbitmap_set(TestHBitmapData *data,
+ const void *unused)
+{
+ hbitmap_test_init(data, L3 * 2, 0);
+ hbitmap_test_set(data, L1 - 1, L1 + 2);
+ hbitmap_test_set(data, L1 * 3 - 1, L1 + 2);
+ hbitmap_test_set(data, L1 * 5, L1 * 2 + 1);
+ hbitmap_test_set(data, L1 * 8 - 1, L1 * 2 + 1);
+ hbitmap_test_set(data, L2 - 1, L1 + 2);
+ hbitmap_test_set(data, L2 + L1 * 2 - 1, L1 + 2);
+ hbitmap_test_set(data, L2 + L1 * 4, L1 * 2 + 1);
+ hbitmap_test_set(data, L2 + L1 * 7 - 1, L1 * 2 + 1);
+ hbitmap_test_set(data, L2 * 2 - 1, L3 * 2 - L2 * 2);
+}
+
+static void test_hbitmap_set_twice(TestHBitmapData *data,
+ const void *unused)
+{
+ hbitmap_test_init(data, L1 * 3, 0);
+ hbitmap_test_set(data, 0, L1 * 3);
+ hbitmap_test_set(data, L1, 1);
+}
+
+static void test_hbitmap_set_overlap(TestHBitmapData *data,
+ const void *unused)
+{
+ hbitmap_test_init(data, L3 * 2, 0);
+ hbitmap_test_set(data, L1 - 1, L1 + 2);
+ hbitmap_test_set(data, L1 * 2 - 1, L1 * 2 + 2);
+ hbitmap_test_set(data, 0, L1 * 3);
+ hbitmap_test_set(data, L1 * 8 - 1, L2);
+ hbitmap_test_set(data, L2, L1);
+ hbitmap_test_set(data, L2 - L1 - 1, L1 * 8 + 2);
+ hbitmap_test_set(data, L2, L3 - L2 + 1);
+ hbitmap_test_set(data, L3 - L1, L1 * 3);
+ hbitmap_test_set(data, L3 - 1, 3);
+ hbitmap_test_set(data, L3 - 1, L2);
+}
+
+static void test_hbitmap_reset_empty(TestHBitmapData *data,
+ const void *unused)
+{
+ hbitmap_test_init(data, L3, 0);
+ hbitmap_test_reset(data, 0, L3);
+}
+
+static void test_hbitmap_reset(TestHBitmapData *data,
+ const void *unused)
+{
+ hbitmap_test_init(data, L3 * 2, 0);
+ hbitmap_test_set(data, L1 - 1, L1 + 2);
+ hbitmap_test_reset(data, L1 * 2 - 1, L1 * 2 + 2);
+ hbitmap_test_set(data, 0, L1 * 3);
+ hbitmap_test_reset(data, L1 * 8 - 1, L2);
+ hbitmap_test_set(data, L2, L1);
+ hbitmap_test_reset(data, L2 - L1 - 1, L1 * 8 + 2);
+ hbitmap_test_set(data, L2, L3 - L2 + 1);
+ hbitmap_test_reset(data, L3 - L1, L1 * 3);
+ hbitmap_test_set(data, L3 - 1, 3);
+ hbitmap_test_reset(data, L3 - 1, L2);
+ hbitmap_test_set(data, 0, L3 * 2);
+ hbitmap_test_reset(data, 0, L1);
+ hbitmap_test_reset(data, 0, L2);
+ hbitmap_test_reset(data, L3, L3);
+ hbitmap_test_set(data, L3 / 2, L3);
+}
+
+static void test_hbitmap_granularity(TestHBitmapData *data,
+ const void *unused)
+{
+ /* Note that hbitmap_test_check has to be invoked manually in this test. */
+ hbitmap_test_init(data, L1, 1);
+ hbitmap_test_set(data, 0, 1);
+ g_assert_cmpint(hbitmap_count(data->hb), ==, 2);
+ hbitmap_test_check(data, 0);
+ hbitmap_test_set(data, 2, 1);
+ g_assert_cmpint(hbitmap_count(data->hb), ==, 4);
+ hbitmap_test_check(data, 0);
+ hbitmap_test_set(data, 0, 3);
+ g_assert_cmpint(hbitmap_count(data->hb), ==, 4);
+ hbitmap_test_reset(data, 0, 1);
+ g_assert_cmpint(hbitmap_count(data->hb), ==, 2);
+}
+
+static void test_hbitmap_iter_granularity(TestHBitmapData *data,
+ const void *unused)
+{
+ HBitmapIter hbi;
+
+ /* Note that hbitmap_test_check has to be invoked manually in this test. */
+ hbitmap_test_init(data, 131072 << 7, 7);
+ hbitmap_iter_init(&hbi, data->hb, 0);
+ g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
+
+ hbitmap_test_set(data, ((L2 + L1 + 1) << 7) + 8, 8);
+ hbitmap_iter_init(&hbi, data->hb, 0);
+ g_assert_cmpint(hbitmap_iter_next(&hbi), ==, (L2 + L1 + 1) << 7);
+ g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
+
+ hbitmap_iter_init(&hbi, data->hb, (L2 + L1 + 2) << 7);
+ g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
+
+ hbitmap_test_set(data, (131072 << 7) - 8, 8);
+ hbitmap_iter_init(&hbi, data->hb, 0);
+ g_assert_cmpint(hbitmap_iter_next(&hbi), ==, (L2 + L1 + 1) << 7);
+ g_assert_cmpint(hbitmap_iter_next(&hbi), ==, 131071 << 7);
+ g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
+
+ hbitmap_iter_init(&hbi, data->hb, (L2 + L1 + 2) << 7);
+ g_assert_cmpint(hbitmap_iter_next(&hbi), ==, 131071 << 7);
+ g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
+}
+
+static void hbitmap_test_add(const char *testpath,
+ void (*test_func)(TestHBitmapData *data, const void *user_data))
+{
+ g_test_add(testpath, TestHBitmapData, NULL, NULL, test_func,
+ hbitmap_test_teardown);
+}
+
+int main(int argc, char **argv)
+{
+ g_test_init(&argc, &argv, NULL);
+ hbitmap_test_add("/hbitmap/size/0", test_hbitmap_zero);
+ hbitmap_test_add("/hbitmap/size/unaligned", test_hbitmap_unaligned);
+ hbitmap_test_add("/hbitmap/iter/empty", test_hbitmap_iter_empty);
+ hbitmap_test_add("/hbitmap/iter/past", test_hbitmap_iter_past);
+ hbitmap_test_add("/hbitmap/iter/partial", test_hbitmap_iter_partial);
+ hbitmap_test_add("/hbitmap/iter/granularity", test_hbitmap_iter_granularity);
+ hbitmap_test_add("/hbitmap/get/all", test_hbitmap_get_all);
+ hbitmap_test_add("/hbitmap/get/some", test_hbitmap_get_some);
+ hbitmap_test_add("/hbitmap/set/all", test_hbitmap_set_all);
+ hbitmap_test_add("/hbitmap/set/one", test_hbitmap_set_one);
+ hbitmap_test_add("/hbitmap/set/two-elem", test_hbitmap_set_two_elem);
+ hbitmap_test_add("/hbitmap/set/general", test_hbitmap_set);
+ hbitmap_test_add("/hbitmap/set/twice", test_hbitmap_set_twice);
+ hbitmap_test_add("/hbitmap/set/overlap", test_hbitmap_set_overlap);
+ hbitmap_test_add("/hbitmap/reset/empty", test_hbitmap_reset_empty);
+ hbitmap_test_add("/hbitmap/reset/general", test_hbitmap_reset);
+ hbitmap_test_add("/hbitmap/granularity", test_hbitmap_granularity);
+ g_test_run();
+
+ return 0;
+}
diff --git a/trace-events b/trace-events
index 6c6cbf1..93e8901 100644
--- a/trace-events
+++ b/trace-events
@@ -1022,3 +1022,8 @@ spapr_pci_rtas_ibm_change_msi(unsigned func, unsigned req) "func %u, requested %
spapr_pci_rtas_ibm_query_interrupt_source_number(unsigned ioa, unsigned intr) "queries for #%u, IRQ%u"
spapr_pci_msi_write(uint64_t addr, uint64_t data, uint32_t dt_irq) "@%"PRIx64"<=%"PRIx64" IRQ %u"
spapr_pci_lsi_set(const char *busname, int pin, uint32_t irq) "%s PIN%d IRQ %u"
+
+# hbitmap.c
+hbitmap_iter_skip_words(const void *hb, void *hbi, uint64_t pos, unsigned long cur) "hb %p hbi %p pos %"PRId64" cur 0x%lx"
+hbitmap_reset(void *hb, uint64_t start, uint64_t count, uint64_t sbit, uint64_t ebit) "hb %p items %"PRIu64",%"PRIu64" bits %"PRIu64"..%"PRIu64
+hbitmap_set(void *hb, uint64_t start, uint64_t count, uint64_t sbit, uint64_t ebit) "hb %p items %"PRIu64",%"PRIu64" bits %"PRIu64"..%"PRIu64
--
1.8.0.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [PATCH 03/20] block: implement dirty bitmap using HBitmap
2012-12-12 13:46 [Qemu-devel] [PATCH 00/20] Block device mirroring enhancements, 12-12-12 edition Paolo Bonzini
2012-12-12 13:46 ` [Qemu-devel] [PATCH 01/20] host-utils: add ffsl Paolo Bonzini
2012-12-12 13:46 ` [Qemu-devel] [PATCH 02/20] add hierarchical bitmap data type and test cases Paolo Bonzini
@ 2012-12-12 13:46 ` Paolo Bonzini
2012-12-14 0:27 ` Eric Blake
2012-12-12 13:46 ` [Qemu-devel] [PATCH 04/20] block: make round_to_clusters public Paolo Bonzini
` (17 subsequent siblings)
20 siblings, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2012-12-12 13:46 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, jcody, stefanha
This actually uses the dirty bitmap in the block layer, and converts
mirroring to use an HBitmapIter.
Reviewed-by: Laszlo Ersek <lersek@redhat.com> (except block/mirror.c parts)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
Makefile.objs | 2 +-
block.c | 94 ++++++++++------------------------------------------------
block.h | 6 ++--
block/mirror.c | 12 ++++++--
block_int.h | 4 +--
trace-events | 1 +
6 files changed, 33 insertions(+), 86 deletions(-)
diff --git a/Makefile.objs b/Makefile.objs
index 3c7abca..9dc96c3 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -46,7 +46,7 @@ coroutine-obj-$(CONFIG_WIN32) += coroutine-win32.o
# block-obj-y is code used by both qemu system emulation and qemu-img
block-obj-y = iov.o cache-utils.o qemu-option.o module.o async.o
-block-obj-y += nbd.o block.o blockjob.o aes.o qemu-config.o
+block-obj-y += nbd.o block.o blockjob.o aes.o qemu-config.o hbitmap.o
block-obj-y += thread-pool.o qemu-progress.o qemu-sockets.o uri.o notify.o
block-obj-y += $(coroutine-obj-y) $(qobject-obj-y) $(version-obj-y)
block-obj-$(CONFIG_POSIX) += event_notifier-posix.o aio-posix.o
diff --git a/block.c b/block.c
index c05875f..f033c1e 100644
--- a/block.c
+++ b/block.c
@@ -1273,7 +1273,6 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
bs_dest->iostatus = bs_src->iostatus;
/* dirty bitmap */
- bs_dest->dirty_count = bs_src->dirty_count;
bs_dest->dirty_bitmap = bs_src->dirty_bitmap;
/* job */
@@ -2022,36 +2021,6 @@ int bdrv_read_unthrottled(BlockDriverState *bs, int64_t sector_num,
return ret;
}
-#define BITS_PER_LONG (sizeof(unsigned long) * 8)
-
-static void set_dirty_bitmap(BlockDriverState *bs, int64_t sector_num,
- int nb_sectors, int dirty)
-{
- int64_t start, end;
- unsigned long val, idx, bit;
-
- start = sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK;
- end = (sector_num + nb_sectors - 1) / BDRV_SECTORS_PER_DIRTY_CHUNK;
-
- for (; start <= end; start++) {
- idx = start / BITS_PER_LONG;
- bit = start % BITS_PER_LONG;
- val = bs->dirty_bitmap[idx];
- if (dirty) {
- if (!(val & (1UL << bit))) {
- bs->dirty_count++;
- val |= 1UL << bit;
- }
- } else {
- if (val & (1UL << bit)) {
- bs->dirty_count--;
- val &= ~(1UL << bit);
- }
- }
- bs->dirty_bitmap[idx] = val;
- }
-}
-
/* Return < 0 if error. Important errors are:
-EIO generic I/O error (may happen for all errors)
-ENOMEDIUM No media inserted.
@@ -4254,18 +4223,15 @@ void bdrv_set_dirty_tracking(BlockDriverState *bs, int enable)
{
int64_t bitmap_size;
- bs->dirty_count = 0;
if (enable) {
if (!bs->dirty_bitmap) {
- bitmap_size = (bdrv_getlength(bs) >> BDRV_SECTOR_BITS) +
- BDRV_SECTORS_PER_DIRTY_CHUNK * BITS_PER_LONG - 1;
- bitmap_size /= BDRV_SECTORS_PER_DIRTY_CHUNK * BITS_PER_LONG;
-
- bs->dirty_bitmap = g_new0(unsigned long, bitmap_size);
+ bitmap_size = (bdrv_getlength(bs) >> BDRV_SECTOR_BITS);
+ bs->dirty_bitmap = hbitmap_alloc(bitmap_size,
+ BDRV_LOG_SECTORS_PER_DIRTY_CHUNK);
}
} else {
if (bs->dirty_bitmap) {
- g_free(bs->dirty_bitmap);
+ hbitmap_free(bs->dirty_bitmap);
bs->dirty_bitmap = NULL;
}
}
@@ -4273,67 +4239,37 @@ void bdrv_set_dirty_tracking(BlockDriverState *bs, int enable)
int bdrv_get_dirty(BlockDriverState *bs, int64_t sector)
{
- int64_t chunk = sector / (int64_t)BDRV_SECTORS_PER_DIRTY_CHUNK;
-
- if (bs->dirty_bitmap &&
- (sector << BDRV_SECTOR_BITS) < bdrv_getlength(bs)) {
- return !!(bs->dirty_bitmap[chunk / BITS_PER_LONG] &
- (1UL << (chunk % BITS_PER_LONG)));
+ if (bs->dirty_bitmap) {
+ return hbitmap_get(bs->dirty_bitmap, sector);
} else {
return 0;
}
}
-int64_t bdrv_get_next_dirty(BlockDriverState *bs, int64_t sector)
+void bdrv_dirty_iter_init(BlockDriverState *bs, HBitmapIter *hbi)
{
- int64_t chunk;
- int bit, elem;
-
- /* Avoid an infinite loop. */
- assert(bs->dirty_count > 0);
-
- sector = (sector | (BDRV_SECTORS_PER_DIRTY_CHUNK - 1)) + 1;
- chunk = sector / (int64_t)BDRV_SECTORS_PER_DIRTY_CHUNK;
-
- QEMU_BUILD_BUG_ON(sizeof(bs->dirty_bitmap[0]) * 8 != BITS_PER_LONG);
- elem = chunk / BITS_PER_LONG;
- bit = chunk % BITS_PER_LONG;
- for (;;) {
- if (sector >= bs->total_sectors) {
- sector = 0;
- bit = elem = 0;
- }
- if (bit == 0 && bs->dirty_bitmap[elem] == 0) {
- sector += BDRV_SECTORS_PER_DIRTY_CHUNK * BITS_PER_LONG;
- elem++;
- } else {
- if (bs->dirty_bitmap[elem] & (1UL << bit)) {
- return sector;
- }
- sector += BDRV_SECTORS_PER_DIRTY_CHUNK;
- if (++bit == BITS_PER_LONG) {
- bit = 0;
- elem++;
- }
- }
- }
+ hbitmap_iter_init(hbi, bs->dirty_bitmap, 0);
}
void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
int nr_sectors)
{
- set_dirty_bitmap(bs, cur_sector, nr_sectors, 1);
+ hbitmap_set(bs->dirty_bitmap, cur_sector, nr_sectors);
}
void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector,
int nr_sectors)
{
- set_dirty_bitmap(bs, cur_sector, nr_sectors, 0);
+ hbitmap_reset(bs->dirty_bitmap, cur_sector, nr_sectors);
}
int64_t bdrv_get_dirty_count(BlockDriverState *bs)
{
- return bs->dirty_count;
+ if (bs->dirty_bitmap) {
+ return hbitmap_count(bs->dirty_bitmap) >> BDRV_LOG_SECTORS_PER_DIRTY_CHUNK;
+ } else {
+ return 0;
+ }
}
void bdrv_set_in_use(BlockDriverState *bs, int in_use)
diff --git a/block.h b/block.h
index 722c620..6a84350 100644
--- a/block.h
+++ b/block.h
@@ -350,13 +350,15 @@ int bdrv_img_create(const char *filename, const char *fmt,
void bdrv_set_buffer_alignment(BlockDriverState *bs, int align);
void *qemu_blockalign(BlockDriverState *bs, size_t size);
-#define BDRV_SECTORS_PER_DIRTY_CHUNK 2048
+#define BDRV_SECTORS_PER_DIRTY_CHUNK (1 << BDRV_LOG_SECTORS_PER_DIRTY_CHUNK)
+#define BDRV_LOG_SECTORS_PER_DIRTY_CHUNK 11
+struct HBitmapIter;
void bdrv_set_dirty_tracking(BlockDriverState *bs, int enable);
int bdrv_get_dirty(BlockDriverState *bs, int64_t sector);
void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors);
void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors);
-int64_t bdrv_get_next_dirty(BlockDriverState *bs, int64_t sector);
+void bdrv_dirty_iter_init(BlockDriverState *bs, struct HBitmapIter *hbi);
int64_t bdrv_get_dirty_count(BlockDriverState *bs);
void bdrv_enable_copy_on_read(BlockDriverState *bs);
diff --git a/block/mirror.c b/block/mirror.c
index d6618a4..30bb267 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -36,6 +36,7 @@ typedef struct MirrorBlockJob {
bool synced;
bool should_complete;
int64_t sector_num;
+ HBitmapIter hbi;
uint8_t *buf;
} MirrorBlockJob;
@@ -62,8 +63,15 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
int64_t end;
struct iovec iov;
+ s->sector_num = hbitmap_iter_next(&s->hbi);
+ if (s->sector_num < 0) {
+ bdrv_dirty_iter_init(source, &s->hbi);
+ s->sector_num = hbitmap_iter_next(&s->hbi);
+ trace_mirror_restart_iter(s, bdrv_get_dirty_count(source));
+ assert(s->sector_num >= 0);
+ }
+
end = s->common.len >> BDRV_SECTOR_BITS;
- s->sector_num = bdrv_get_next_dirty(source, s->sector_num);
nb_sectors = MIN(BDRV_SECTORS_PER_DIRTY_CHUNK, end - s->sector_num);
bdrv_reset_dirty(source, s->sector_num, nb_sectors);
@@ -136,7 +144,7 @@ static void coroutine_fn mirror_run(void *opaque)
}
}
- s->sector_num = -1;
+ bdrv_dirty_iter_init(bs, &s->hbi);
for (;;) {
uint64_t delay_ns;
int64_t cnt;
diff --git a/block_int.h b/block_int.h
index 9deedb8..364ab2d 100644
--- a/block_int.h
+++ b/block_int.h
@@ -32,6 +32,7 @@
#include "qapi-types.h"
#include "qerror.h"
#include "monitor.h"
+#include "hbitmap.h"
#define BLOCK_FLAG_ENCRYPT 1
#define BLOCK_FLAG_COMPAT6 4
@@ -269,8 +270,7 @@ struct BlockDriverState {
bool iostatus_enabled;
BlockDeviceIoStatus iostatus;
char device_name[32];
- unsigned long *dirty_bitmap;
- int64_t dirty_count;
+ HBitmap *dirty_bitmap;
int in_use; /* users other than guest access, eg. block migration */
QTAILQ_ENTRY(BlockDriverState) list;
diff --git a/trace-events b/trace-events
index 93e8901..b868ac6 100644
--- a/trace-events
+++ b/trace-events
@@ -79,6 +79,7 @@ commit_start(void *bs, void *base, void *top, void *s, void *co, void *opaque) "
# block/mirror.c
mirror_start(void *bs, void *s, void *co, void *opaque) "bs %p s %p co %p opaque %p"
+mirror_restart_iter(void *s, int64_t cnt) "s %p dirty count %"PRId64
mirror_before_flush(void *s) "s %p"
mirror_before_drain(void *s, int64_t cnt) "s %p dirty count %"PRId64
mirror_before_sleep(void *s, int64_t cnt, int synced) "s %p dirty count %"PRId64" synced %d"
--
1.8.0.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [PATCH 04/20] block: make round_to_clusters public
2012-12-12 13:46 [Qemu-devel] [PATCH 00/20] Block device mirroring enhancements, 12-12-12 edition Paolo Bonzini
` (2 preceding siblings ...)
2012-12-12 13:46 ` [Qemu-devel] [PATCH 03/20] block: implement dirty bitmap using HBitmap Paolo Bonzini
@ 2012-12-12 13:46 ` Paolo Bonzini
2012-12-14 20:13 ` Eric Blake
2012-12-12 13:46 ` [Qemu-devel] [PATCH 05/20] mirror: perform COW if the cluster size is bigger than the granularity Paolo Bonzini
` (16 subsequent siblings)
20 siblings, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2012-12-12 13:46 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, jcody, stefanha
This is needed in the following patch.
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block.c | 16 ++++++++--------
block.h | 4 ++++
2 files changed, 12 insertions(+), 8 deletions(-)
diff --git a/block.c b/block.c
index f033c1e..d93c386 100644
--- a/block.c
+++ b/block.c
@@ -1660,10 +1660,10 @@ static void tracked_request_begin(BdrvTrackedRequest *req,
/**
* Round a region to cluster boundaries
*/
-static void round_to_clusters(BlockDriverState *bs,
- int64_t sector_num, int nb_sectors,
- int64_t *cluster_sector_num,
- int *cluster_nb_sectors)
+void bdrv_round_to_clusters(BlockDriverState *bs,
+ int64_t sector_num, int nb_sectors,
+ int64_t *cluster_sector_num,
+ int *cluster_nb_sectors)
{
BlockDriverInfo bdi;
@@ -1705,8 +1705,8 @@ static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
* CoR read and write operations are atomic and guest writes cannot
* interleave between them.
*/
- round_to_clusters(bs, sector_num, nb_sectors,
- &cluster_sector_num, &cluster_nb_sectors);
+ bdrv_round_to_clusters(bs, sector_num, nb_sectors,
+ &cluster_sector_num, &cluster_nb_sectors);
do {
retry = false;
@@ -2172,8 +2172,8 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BlockDriverState *bs,
/* Cover entire cluster so no additional backing file I/O is required when
* allocating cluster in the image file.
*/
- round_to_clusters(bs, sector_num, nb_sectors,
- &cluster_sector_num, &cluster_nb_sectors);
+ bdrv_round_to_clusters(bs, sector_num, nb_sectors,
+ &cluster_sector_num, &cluster_nb_sectors);
trace_bdrv_co_do_copy_on_readv(bs, sector_num, nb_sectors,
cluster_sector_num, cluster_nb_sectors);
diff --git a/block.h b/block.h
index 6a84350..1418915 100644
--- a/block.h
+++ b/block.h
@@ -309,6 +309,10 @@ int bdrv_get_flags(BlockDriverState *bs);
int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
const uint8_t *buf, int nb_sectors);
int bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi);
+void bdrv_round_to_clusters(BlockDriverState *bs,
+ int64_t sector_num, int nb_sectors,
+ int64_t *cluster_sector_num,
+ int *cluster_nb_sectors);
const char *bdrv_get_encrypted_filename(BlockDriverState *bs);
void bdrv_get_backing_filename(BlockDriverState *bs,
--
1.8.0.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [PATCH 05/20] mirror: perform COW if the cluster size is bigger than the granularity
2012-12-12 13:46 [Qemu-devel] [PATCH 00/20] Block device mirroring enhancements, 12-12-12 edition Paolo Bonzini
` (3 preceding siblings ...)
2012-12-12 13:46 ` [Qemu-devel] [PATCH 04/20] block: make round_to_clusters public Paolo Bonzini
@ 2012-12-12 13:46 ` Paolo Bonzini
2012-12-14 20:21 ` Eric Blake
2012-12-12 13:46 ` [Qemu-devel] [PATCH 06/20] block: return count of dirty sectors, not chunks Paolo Bonzini
` (15 subsequent siblings)
20 siblings, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2012-12-12 13:46 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, jcody, stefanha
When mirroring runs, the backing files for the target may not yet be
ready. However, this means that a copy-on-write operation on the target
would fill the missing sectors with zeros. Copy-on-write only happens
if the granularity of the dirty bitmap is smaller than the cluster size
(and only for clusters that are allocated in the source after the job
has started copying). So far, the granularity was fixed to 1MB; to avoid
the problem we detected the situation and required the backing files to
be available in that case only.
However, we want to lower the granularity for efficiency, so we need
a better solution. The solution is to always copy a whole cluster the
first time it is touched. The code keeps a bitmap of clusters that
have already been allocated by the mirroring job, and only does "manual"
copy-on-write if the chunk being copied is zero in the bitmap.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/mirror.c | 60 +++++++++++++++++++++++++++++++++++++++-------
blockdev.c | 15 +++---------
tests/qemu-iotests/041 | 21 ++++++++++++++++
tests/qemu-iotests/041.out | 4 ++--
trace-events | 1 +
5 files changed, 78 insertions(+), 23 deletions(-)
diff --git a/block/mirror.c b/block/mirror.c
index 30bb267..14ce1a7 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -15,6 +15,7 @@
#include "blockjob.h"
#include "block_int.h"
#include "qemu/ratelimit.h"
+#include "bitmap.h"
enum {
/*
@@ -36,6 +37,8 @@ typedef struct MirrorBlockJob {
bool synced;
bool should_complete;
int64_t sector_num;
+ size_t buf_size;
+ unsigned long *cow_bitmap;
HBitmapIter hbi;
uint8_t *buf;
} MirrorBlockJob;
@@ -60,7 +63,7 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
BlockDriverState *target = s->target;
QEMUIOVector qiov;
int ret, nb_sectors;
- int64_t end;
+ int64_t end, sector_num, cluster_num;
struct iovec iov;
s->sector_num = hbitmap_iter_next(&s->hbi);
@@ -71,22 +74,41 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
assert(s->sector_num >= 0);
}
+ /* If we have no backing file yet in the destination, and the cluster size
+ * is very large, we need to do COW ourselves. The first time a cluster is
+ * copied, copy it entirely.
+ *
+ * Because both BDRV_SECTORS_PER_DIRTY_CHUNK and the cluster size are
+ * powers of two, the number of sectors to copy cannot exceed one cluster.
+ */
+ sector_num = s->sector_num;
+ nb_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
+ cluster_num = sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK;
+ if (s->cow_bitmap && !test_bit(cluster_num, s->cow_bitmap)) {
+ trace_mirror_cow(s, sector_num);
+ bdrv_round_to_clusters(s->target,
+ sector_num, BDRV_SECTORS_PER_DIRTY_CHUNK,
+ §or_num, &nb_sectors);
+ bitmap_set(s->cow_bitmap, sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK,
+ nb_sectors / BDRV_SECTORS_PER_DIRTY_CHUNK);
+ }
+
end = s->common.len >> BDRV_SECTOR_BITS;
- nb_sectors = MIN(BDRV_SECTORS_PER_DIRTY_CHUNK, end - s->sector_num);
- bdrv_reset_dirty(source, s->sector_num, nb_sectors);
+ nb_sectors = MIN(nb_sectors, end - sector_num);
+ bdrv_reset_dirty(source, sector_num, nb_sectors);
/* Copy the dirty cluster. */
iov.iov_base = s->buf;
iov.iov_len = nb_sectors * 512;
qemu_iovec_init_external(&qiov, &iov, 1);
- trace_mirror_one_iteration(s, s->sector_num, nb_sectors);
- ret = bdrv_co_readv(source, s->sector_num, nb_sectors, &qiov);
+ trace_mirror_one_iteration(s, sector_num, nb_sectors);
+ ret = bdrv_co_readv(source, sector_num, nb_sectors, &qiov);
if (ret < 0) {
*p_action = mirror_error_action(s, true, -ret);
goto fail;
}
- ret = bdrv_co_writev(target, s->sector_num, nb_sectors, &qiov);
+ ret = bdrv_co_writev(target, sector_num, nb_sectors, &qiov);
if (ret < 0) {
*p_action = mirror_error_action(s, false, -ret);
s->synced = false;
@@ -96,7 +118,7 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
fail:
/* Try again later. */
- bdrv_set_dirty(source, s->sector_num, nb_sectors);
+ bdrv_set_dirty(source, sector_num, nb_sectors);
return ret;
}
@@ -104,7 +126,9 @@ static void coroutine_fn mirror_run(void *opaque)
{
MirrorBlockJob *s = opaque;
BlockDriverState *bs = s->common.bs;
- int64_t sector_num, end;
+ int64_t sector_num, end, length;
+ BlockDriverInfo bdi;
+ char backing_filename[1024];
int ret = 0;
int n;
@@ -118,8 +142,23 @@ static void coroutine_fn mirror_run(void *opaque)
return;
}
+ /* If we have no backing file yet in the destination, we cannot let
+ * the destination do COW. Instead, we copy sectors around the
+ * dirty data if needed. We need a bitmap to do that.
+ */
+ bdrv_get_backing_filename(s->target, backing_filename,
+ sizeof(backing_filename));
+ if (backing_filename[0] && !s->target->backing_hd) {
+ bdrv_get_info(s->target, &bdi);
+ if (s->buf_size < bdi.cluster_size) {
+ s->buf_size = bdi.cluster_size;
+ length = (bdrv_getlength(bs) + BLOCK_SIZE - 1) / BLOCK_SIZE;
+ s->cow_bitmap = bitmap_new(length);
+ }
+ }
+
end = s->common.len >> BDRV_SECTOR_BITS;
- s->buf = qemu_blockalign(bs, BLOCK_SIZE);
+ s->buf = qemu_blockalign(bs, s->buf_size);
if (s->mode != MIRROR_SYNC_MODE_NONE) {
/* First part, loop on the sectors and initialize the dirty bitmap. */
@@ -234,6 +273,7 @@ static void coroutine_fn mirror_run(void *opaque)
immediate_exit:
g_free(s->buf);
+ g_free(s->cow_bitmap);
bdrv_set_dirty_tracking(bs, false);
bdrv_iostatus_disable(s->target);
if (s->should_complete && ret == 0) {
@@ -320,6 +360,8 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
s->on_target_error = on_target_error;
s->target = target;
s->mode = mode;
+ s->buf_size = BLOCK_SIZE;
+
bdrv_set_dirty_tracking(bs, true);
bdrv_set_enable_write_cache(s->target, true);
bdrv_set_on_error(s->target, on_target_error, on_target_error);
diff --git a/blockdev.c b/blockdev.c
index e73fd6e..f309dba 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1196,7 +1196,6 @@ void qmp_drive_mirror(const char *device, const char *target,
bool has_on_target_error, BlockdevOnError on_target_error,
Error **errp)
{
- BlockDriverInfo bdi;
BlockDriverState *bs;
BlockDriverState *source, *target_bs;
BlockDriver *proto_drv;
@@ -1287,6 +1286,9 @@ void qmp_drive_mirror(const char *device, const char *target,
return;
}
+ /* Mirroring takes care of copy-on-write using the source's backing
+ * file.
+ */
target_bs = bdrv_new("");
ret = bdrv_open(target_bs, target, flags | BDRV_O_NO_BACKING, drv);
@@ -1296,17 +1298,6 @@ void qmp_drive_mirror(const char *device, const char *target,
return;
}
- /* We need a backing file if we will copy parts of a cluster. */
- if (bdrv_get_info(target_bs, &bdi) >= 0 && bdi.cluster_size != 0 &&
- bdi.cluster_size >= BDRV_SECTORS_PER_DIRTY_CHUNK * 512) {
- ret = bdrv_open_backing_file(target_bs);
- if (ret < 0) {
- bdrv_delete(target_bs);
- error_set(errp, QERR_OPEN_FILE_FAILED, target);
- return;
- }
- }
-
mirror_start(bs, target_bs, speed, sync, on_source_error, on_target_error,
block_job_cb, bs, &local_err);
if (local_err != NULL) {
diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index c6eb851..d0e9a7f 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -292,6 +292,27 @@ class TestMirrorNoBacking(ImageMirroringTestCase):
self.assertTrue(self.compare_images(test_img, target_img),
'target image does not match source after mirroring')
+ def test_large_cluster(self):
+ self.assert_no_active_mirrors()
+
+ # qemu-img create fails if the image is not there
+ qemu_img('create', '-f', iotests.imgfmt, '-o', 'size=%d'
+ %(TestMirrorNoBacking.image_len), target_backing_img)
+ qemu_img('create', '-f', iotests.imgfmt, '-o', 'cluster_size=%d,backing_file=%s'
+ % (TestMirrorNoBacking.image_len, target_backing_img), target_img)
+ os.remove(target_backing_img)
+
+ result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+ mode='existing', target=target_img)
+ self.assert_qmp(result, 'return', {})
+
+ self.complete_and_wait()
+ result = self.vm.qmp('query-block')
+ self.assert_qmp(result, 'return[0]/inserted/file', target_img)
+ self.vm.shutdown()
+ self.assertTrue(self.compare_images(test_img, target_img),
+ 'target image does not match source after mirroring')
+
class TestReadErrors(ImageMirroringTestCase):
image_len = 2 * 1024 * 1024 # MB
diff --git a/tests/qemu-iotests/041.out b/tests/qemu-iotests/041.out
index 71009c2..4176bb9 100644
--- a/tests/qemu-iotests/041.out
+++ b/tests/qemu-iotests/041.out
@@ -1,5 +1,5 @@
-..................
+...................
----------------------------------------------------------------------
-Ran 18 tests
+Ran 19 tests
OK
diff --git a/trace-events b/trace-events
index b868ac6..b318b4e 100644
--- a/trace-events
+++ b/trace-events
@@ -84,6 +84,7 @@ mirror_before_flush(void *s) "s %p"
mirror_before_drain(void *s, int64_t cnt) "s %p dirty count %"PRId64
mirror_before_sleep(void *s, int64_t cnt, int synced) "s %p dirty count %"PRId64" synced %d"
mirror_one_iteration(void *s, int64_t sector_num, int nb_sectors) "s %p sector_num %"PRId64" nb_sectors %d"
+mirror_cow(void *s, int64_t sector_num) "s %p sector_num %"PRId64
# blockdev.c
qmp_block_job_cancel(void *job) "job %p"
--
1.8.0.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [PATCH 06/20] block: return count of dirty sectors, not chunks
2012-12-12 13:46 [Qemu-devel] [PATCH 00/20] Block device mirroring enhancements, 12-12-12 edition Paolo Bonzini
` (4 preceding siblings ...)
2012-12-12 13:46 ` [Qemu-devel] [PATCH 05/20] mirror: perform COW if the cluster size is bigger than the granularity Paolo Bonzini
@ 2012-12-12 13:46 ` Paolo Bonzini
2012-12-14 20:49 ` Eric Blake
2012-12-12 13:46 ` [Qemu-devel] [PATCH 07/20] block: allow customizing the granularity of the dirty bitmap Paolo Bonzini
` (14 subsequent siblings)
20 siblings, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2012-12-12 13:46 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, jcody, stefanha
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block-migration.c | 2 +-
block.c | 5 ++---
block/mirror.c | 2 +-
3 files changed, 4 insertions(+), 5 deletions(-)
diff --git a/block-migration.c b/block-migration.c
index 71b9601..06967d8 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -490,7 +490,7 @@ static int64_t get_remaining_dirty(void)
dirty += bdrv_get_dirty_count(bmds->bs);
}
- return dirty * BLOCK_SIZE;
+ return dirty << BDRV_SECTOR_BITS;
}
static int is_stage2_completed(void)
diff --git a/block.c b/block.c
index d93c386..ece2ad5 100644
--- a/block.c
+++ b/block.c
@@ -2819,8 +2819,7 @@ BlockInfo *bdrv_query_info(BlockDriverState *bs)
if (bs->dirty_bitmap) {
info->has_dirty = true;
info->dirty = g_malloc0(sizeof(*info->dirty));
- info->dirty->count = bdrv_get_dirty_count(bs) *
- BDRV_SECTORS_PER_DIRTY_CHUNK * BDRV_SECTOR_SIZE;
+ info->dirty->count = bdrv_get_dirty_count(bs) * BDRV_SECTOR_SIZE;
}
if (bs->drv) {
@@ -4266,7 +4265,7 @@ void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector,
int64_t bdrv_get_dirty_count(BlockDriverState *bs)
{
if (bs->dirty_bitmap) {
- return hbitmap_count(bs->dirty_bitmap) >> BDRV_LOG_SECTORS_PER_DIRTY_CHUNK;
+ return hbitmap_count(bs->dirty_bitmap);
} else {
return 0;
}
diff --git a/block/mirror.c b/block/mirror.c
index 14ce1a7..48c37a7 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -243,7 +243,7 @@ static void coroutine_fn mirror_run(void *opaque)
trace_mirror_before_sleep(s, cnt, s->synced);
if (!s->synced) {
/* Publish progress */
- s->common.offset = end * BDRV_SECTOR_SIZE - cnt * BLOCK_SIZE;
+ s->common.offset = (end - cnt) * BDRV_SECTOR_SIZE;
if (s->common.speed) {
delay_ns = ratelimit_calculate_delay(&s->limit, BDRV_SECTORS_PER_DIRTY_CHUNK);
--
1.8.0.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [PATCH 07/20] block: allow customizing the granularity of the dirty bitmap
2012-12-12 13:46 [Qemu-devel] [PATCH 00/20] Block device mirroring enhancements, 12-12-12 edition Paolo Bonzini
` (5 preceding siblings ...)
2012-12-12 13:46 ` [Qemu-devel] [PATCH 06/20] block: return count of dirty sectors, not chunks Paolo Bonzini
@ 2012-12-12 13:46 ` Paolo Bonzini
2012-12-14 21:27 ` Eric Blake
2012-12-12 13:46 ` [Qemu-devel] [PATCH 08/20] mirror: allow customizing the granularity Paolo Bonzini
` (13 subsequent siblings)
20 siblings, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2012-12-12 13:46 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, jcody, stefanha
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block-migration.c | 5 +++--
block.c | 17 ++++++++++-------
block.h | 5 +----
block/mirror.c | 14 ++++----------
qapi-schema.json | 4 +++-
5 files changed, 21 insertions(+), 24 deletions(-)
diff --git a/block-migration.c b/block-migration.c
index 06967d8..1e1d25e 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -23,7 +23,8 @@
#include "blockdev.h"
#include <assert.h>
-#define BLOCK_SIZE (BDRV_SECTORS_PER_DIRTY_CHUNK << BDRV_SECTOR_BITS)
+#define BLOCK_SIZE (1 << 20)
+#define BDRV_SECTORS_PER_DIRTY_CHUNK (BLOCK_SIZE >> BDRV_SECTOR_BITS)
#define BLK_MIG_FLAG_DEVICE_BLOCK 0x01
#define BLK_MIG_FLAG_EOS 0x02
@@ -264,7 +265,7 @@ static void set_dirty_tracking(int enable)
BlkMigDevState *bmds;
QSIMPLEQ_FOREACH(bmds, &block_mig_state.bmds_list, entry) {
- bdrv_set_dirty_tracking(bmds->bs, enable);
+ bdrv_set_dirty_tracking(bmds->bs, enable ? BLOCK_SIZE : 0);
}
}
diff --git a/block.c b/block.c
index ece2ad5..09f9ca9 100644
--- a/block.c
+++ b/block.c
@@ -2820,6 +2820,8 @@ BlockInfo *bdrv_query_info(BlockDriverState *bs)
info->has_dirty = true;
info->dirty = g_malloc0(sizeof(*info->dirty));
info->dirty->count = bdrv_get_dirty_count(bs) * BDRV_SECTOR_SIZE;
+ info->dirty->granularity =
+ ((int64_t) BDRV_SECTOR_SIZE << hbitmap_granularity(bs->dirty_bitmap));
}
if (bs->drv) {
@@ -4218,16 +4220,17 @@ void *qemu_blockalign(BlockDriverState *bs, size_t size)
return qemu_memalign((bs && bs->buffer_alignment) ? bs->buffer_alignment : 512, size);
}
-void bdrv_set_dirty_tracking(BlockDriverState *bs, int enable)
+void bdrv_set_dirty_tracking(BlockDriverState *bs, int granularity)
{
int64_t bitmap_size;
- if (enable) {
- if (!bs->dirty_bitmap) {
- bitmap_size = (bdrv_getlength(bs) >> BDRV_SECTOR_BITS);
- bs->dirty_bitmap = hbitmap_alloc(bitmap_size,
- BDRV_LOG_SECTORS_PER_DIRTY_CHUNK);
- }
+ assert((granularity & (granularity - 1)) == 0);
+
+ if (granularity) {
+ granularity >>= BDRV_SECTOR_BITS;
+ assert(!bs->dirty_bitmap);
+ bitmap_size = (bdrv_getlength(bs) >> BDRV_SECTOR_BITS);
+ bs->dirty_bitmap = hbitmap_alloc(bitmap_size, ffs(granularity) - 1);
} else {
if (bs->dirty_bitmap) {
hbitmap_free(bs->dirty_bitmap);
diff --git a/block.h b/block.h
index 1418915..b57896a 100644
--- a/block.h
+++ b/block.h
@@ -354,11 +354,8 @@ int bdrv_img_create(const char *filename, const char *fmt,
void bdrv_set_buffer_alignment(BlockDriverState *bs, int align);
void *qemu_blockalign(BlockDriverState *bs, size_t size);
-#define BDRV_SECTORS_PER_DIRTY_CHUNK (1 << BDRV_LOG_SECTORS_PER_DIRTY_CHUNK)
-#define BDRV_LOG_SECTORS_PER_DIRTY_CHUNK 11
-
struct HBitmapIter;
-void bdrv_set_dirty_tracking(BlockDriverState *bs, int enable);
+void bdrv_set_dirty_tracking(BlockDriverState *bs, int granularity);
int bdrv_get_dirty(BlockDriverState *bs, int64_t sector);
void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors);
void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors);
diff --git a/block/mirror.c b/block/mirror.c
index 48c37a7..1b0478b 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -17,14 +17,8 @@
#include "qemu/ratelimit.h"
#include "bitmap.h"
-enum {
- /*
- * Size of data buffer for populating the image file. This should be large
- * enough to process multiple clusters in a single call, so that populating
- * contiguous regions of the image is efficient.
- */
- BLOCK_SIZE = 512 * BDRV_SECTORS_PER_DIRTY_CHUNK, /* in bytes */
-};
+#define BLOCK_SIZE (1 << 20)
+#define BDRV_SECTORS_PER_DIRTY_CHUNK (BLOCK_SIZE >> BDRV_SECTOR_BITS)
#define SLICE_TIME 100000000ULL /* ns */
@@ -274,7 +268,7 @@ static void coroutine_fn mirror_run(void *opaque)
immediate_exit:
g_free(s->buf);
g_free(s->cow_bitmap);
- bdrv_set_dirty_tracking(bs, false);
+ bdrv_set_dirty_tracking(bs, 0);
bdrv_iostatus_disable(s->target);
if (s->should_complete && ret == 0) {
if (bdrv_get_flags(s->target) != bdrv_get_flags(s->common.bs)) {
@@ -362,7 +356,7 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
s->mode = mode;
s->buf_size = BLOCK_SIZE;
- bdrv_set_dirty_tracking(bs, true);
+ bdrv_set_dirty_tracking(bs, BLOCK_SIZE);
bdrv_set_enable_write_cache(s->target, true);
bdrv_set_on_error(s->target, on_target_error, on_target_error);
bdrv_iostatus_enable(s->target);
diff --git a/qapi-schema.json b/qapi-schema.json
index 5dfa052..75aacdd 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -667,10 +667,12 @@
#
# @count: number of dirty bytes according to the dirty bitmap
#
+# @granularity: granularity of the dirty bitmap in bytes
+#
# Since: 1.3
##
{ 'type': 'BlockDirtyInfo',
- 'data': {'count': 'int'} }
+ 'data': {'count': 'int', 'granularity': 'int'} }
##
# @BlockInfo:
--
1.8.0.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [PATCH 08/20] mirror: allow customizing the granularity
2012-12-12 13:46 [Qemu-devel] [PATCH 00/20] Block device mirroring enhancements, 12-12-12 edition Paolo Bonzini
` (6 preceding siblings ...)
2012-12-12 13:46 ` [Qemu-devel] [PATCH 07/20] block: allow customizing the granularity of the dirty bitmap Paolo Bonzini
@ 2012-12-12 13:46 ` Paolo Bonzini
2012-12-14 22:01 ` Eric Blake
2013-01-14 11:28 ` Stefan Hajnoczi
2012-12-12 13:46 ` [Qemu-devel] [PATCH 09/20] mirror: switch mirror_iteration to AIO Paolo Bonzini
` (12 subsequent siblings)
20 siblings, 2 replies; 53+ messages in thread
From: Paolo Bonzini @ 2012-12-12 13:46 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, jcody, stefanha
The desired granularity may be very different depending on the kind of
operation (e.g. continuous replication vs. collapse-to-raw) and whether
the VM is expected to perform lots of I/O while mirroring is in progress.
Allow the user to customize it, while providing a sane default so that
in general there will be no extra allocated space in the target compared
to the source.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/mirror.c | 50 ++++++++++++++++++++++++++++++++------------------
block_int.h | 3 ++-
blockdev.c | 15 ++++++++++++++-
hmp.c | 2 +-
qapi-schema.json | 8 +++++++-
qmp-commands.hx | 8 +++++++-
6 files changed, 63 insertions(+), 23 deletions(-)
diff --git a/block/mirror.c b/block/mirror.c
index 1b0478b..93240eb 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -17,9 +17,6 @@
#include "qemu/ratelimit.h"
#include "bitmap.h"
-#define BLOCK_SIZE (1 << 20)
-#define BDRV_SECTORS_PER_DIRTY_CHUNK (BLOCK_SIZE >> BDRV_SECTOR_BITS)
-
#define SLICE_TIME 100000000ULL /* ns */
typedef struct MirrorBlockJob {
@@ -31,6 +28,7 @@ typedef struct MirrorBlockJob {
bool synced;
bool should_complete;
int64_t sector_num;
+ int64_t granularity;
size_t buf_size;
unsigned long *cow_bitmap;
HBitmapIter hbi;
@@ -56,7 +54,7 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
BlockDriverState *source = s->common.bs;
BlockDriverState *target = s->target;
QEMUIOVector qiov;
- int ret, nb_sectors;
+ int ret, nb_sectors, nb_sectors_chunk;
int64_t end, sector_num, cluster_num;
struct iovec iov;
@@ -72,19 +70,19 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
* is very large, we need to do COW ourselves. The first time a cluster is
* copied, copy it entirely.
*
- * Because both BDRV_SECTORS_PER_DIRTY_CHUNK and the cluster size are
- * powers of two, the number of sectors to copy cannot exceed one cluster.
+ * Because both the granularity and the cluster size are powers of two, the
+ * number of sectors to copy cannot exceed one cluster.
*/
sector_num = s->sector_num;
- nb_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
- cluster_num = sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK;
+ nb_sectors_chunk = nb_sectors = s->granularity >> BDRV_SECTOR_BITS;
+ cluster_num = sector_num / nb_sectors_chunk;
if (s->cow_bitmap && !test_bit(cluster_num, s->cow_bitmap)) {
trace_mirror_cow(s, sector_num);
bdrv_round_to_clusters(s->target,
- sector_num, BDRV_SECTORS_PER_DIRTY_CHUNK,
+ sector_num, nb_sectors_chunk,
§or_num, &nb_sectors);
- bitmap_set(s->cow_bitmap, sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK,
- nb_sectors / BDRV_SECTORS_PER_DIRTY_CHUNK);
+ bitmap_set(s->cow_bitmap, sector_num / nb_sectors_chunk,
+ nb_sectors / nb_sectors_chunk);
}
end = s->common.len >> BDRV_SECTOR_BITS;
@@ -120,7 +118,7 @@ static void coroutine_fn mirror_run(void *opaque)
{
MirrorBlockJob *s = opaque;
BlockDriverState *bs = s->common.bs;
- int64_t sector_num, end, length;
+ int64_t sector_num, end, nb_sectors_chunk, length;
BlockDriverInfo bdi;
char backing_filename[1024];
int ret = 0;
@@ -146,20 +144,21 @@ static void coroutine_fn mirror_run(void *opaque)
bdrv_get_info(s->target, &bdi);
if (s->buf_size < bdi.cluster_size) {
s->buf_size = bdi.cluster_size;
- length = (bdrv_getlength(bs) + BLOCK_SIZE - 1) / BLOCK_SIZE;
+ length = (bdrv_getlength(bs) + s->granularity - 1) / s->granularity;
s->cow_bitmap = bitmap_new(length);
}
}
end = s->common.len >> BDRV_SECTOR_BITS;
s->buf = qemu_blockalign(bs, s->buf_size);
+ nb_sectors_chunk = s->granularity >> BDRV_SECTOR_BITS;
if (s->mode != MIRROR_SYNC_MODE_NONE) {
/* First part, loop on the sectors and initialize the dirty bitmap. */
BlockDriverState *base;
base = s->mode == MIRROR_SYNC_MODE_FULL ? NULL : bs->backing_hd;
for (sector_num = 0; sector_num < end; ) {
- int64_t next = (sector_num | (BDRV_SECTORS_PER_DIRTY_CHUNK - 1)) + 1;
+ int64_t next = (sector_num | (nb_sectors_chunk - 1)) + 1;
ret = bdrv_co_is_allocated_above(bs, base,
sector_num, next - sector_num, &n);
@@ -240,7 +239,7 @@ static void coroutine_fn mirror_run(void *opaque)
s->common.offset = (end - cnt) * BDRV_SECTOR_SIZE;
if (s->common.speed) {
- delay_ns = ratelimit_calculate_delay(&s->limit, BDRV_SECTORS_PER_DIRTY_CHUNK);
+ delay_ns = ratelimit_calculate_delay(&s->limit, nb_sectors_chunk);
} else {
delay_ns = 0;
}
@@ -330,7 +329,7 @@ static BlockJobType mirror_job_type = {
};
void mirror_start(BlockDriverState *bs, BlockDriverState *target,
- int64_t speed, MirrorSyncMode mode,
+ int64_t speed, int64_t granularity, MirrorSyncMode mode,
BlockdevOnError on_source_error,
BlockdevOnError on_target_error,
BlockDriverCompletionFunc *cb,
@@ -338,6 +337,20 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
{
MirrorBlockJob *s;
+ if (granularity == 0) {
+ /* Choose the default granularity based on the target file's cluster
+ * size, clamped between 4k and 64k. */
+ BlockDriverInfo bdi;
+ if (bdrv_get_info(target, &bdi) >= 0 && bdi.cluster_size != 0) {
+ granularity = MAX(4096, bdi.cluster_size);
+ granularity = MIN(65536, granularity);
+ } else {
+ granularity = 65536;
+ }
+ }
+
+ assert((granularity & (granularity - 1)) == 0);
+
if ((on_source_error == BLOCKDEV_ON_ERROR_STOP ||
on_source_error == BLOCKDEV_ON_ERROR_ENOSPC) &&
!bdrv_iostatus_is_enabled(bs)) {
@@ -354,9 +367,10 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
s->on_target_error = on_target_error;
s->target = target;
s->mode = mode;
- s->buf_size = BLOCK_SIZE;
+ s->granularity = granularity;
+ s->buf_size = granularity;
- bdrv_set_dirty_tracking(bs, BLOCK_SIZE);
+ bdrv_set_dirty_tracking(bs, granularity);
bdrv_set_enable_write_cache(s->target, true);
bdrv_set_on_error(s->target, on_target_error, on_target_error);
bdrv_iostatus_enable(s->target);
diff --git a/block_int.h b/block_int.h
index 364ab2d..8d038c8 100644
--- a/block_int.h
+++ b/block_int.h
@@ -338,6 +338,7 @@ void commit_start(BlockDriverState *bs, BlockDriverState *base,
* @bs: Block device to operate on.
* @target: Block device to write to.
* @speed: The maximum speed, in bytes per second, or 0 for unlimited.
+ * @granularity: The chosen granularity for the dirty bitmap.
* @mode: Whether to collapse all images in the chain to the target.
* @on_source_error: The action to take upon error reading from the source.
* @on_target_error: The action to take upon error writing to the target.
@@ -351,7 +352,7 @@ void commit_start(BlockDriverState *bs, BlockDriverState *base,
* @bs will be switched to read from @target.
*/
void mirror_start(BlockDriverState *bs, BlockDriverState *target,
- int64_t speed, MirrorSyncMode mode,
+ int64_t speed, int64_t granularity, MirrorSyncMode mode,
BlockdevOnError on_source_error,
BlockdevOnError on_target_error,
BlockDriverCompletionFunc *cb,
diff --git a/blockdev.c b/blockdev.c
index f309dba..d554c03 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1192,6 +1192,7 @@ void qmp_drive_mirror(const char *device, const char *target,
enum MirrorSyncMode sync,
bool has_mode, enum NewImageMode mode,
bool has_speed, int64_t speed,
+ bool has_granularity, uint32_t granularity,
bool has_on_source_error, BlockdevOnError on_source_error,
bool has_on_target_error, BlockdevOnError on_target_error,
Error **errp)
@@ -1217,6 +1218,17 @@ void qmp_drive_mirror(const char *device, const char *target,
if (!has_mode) {
mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
}
+ if (!has_granularity) {
+ granularity = 0;
+ }
+ if (granularity != 0 && (granularity < 512 || granularity > 1048576 * 64)) {
+ error_set(errp, QERR_INVALID_PARAMETER, device);
+ return;
+ }
+ if (granularity & (granularity - 1)) {
+ error_set(errp, QERR_INVALID_PARAMETER, device);
+ return;
+ }
bs = bdrv_find(device);
if (!bs) {
@@ -1298,7 +1310,8 @@ void qmp_drive_mirror(const char *device, const char *target,
return;
}
- mirror_start(bs, target_bs, speed, sync, on_source_error, on_target_error,
+ mirror_start(bs, target_bs, speed, granularity, sync,
+ on_source_error, on_target_error,
block_job_cb, bs, &local_err);
if (local_err != NULL) {
bdrv_delete(target_bs);
diff --git a/hmp.c b/hmp.c
index 180ba2b..e6d2930 100644
--- a/hmp.c
+++ b/hmp.c
@@ -795,7 +795,7 @@ void hmp_drive_mirror(Monitor *mon, const QDict *qdict)
qmp_drive_mirror(device, filename, !!format, format,
full ? MIRROR_SYNC_MODE_FULL : MIRROR_SYNC_MODE_TOP,
- true, mode, false, 0,
+ true, mode, false, 0, false, 0,
false, 0, false, 0, &errp);
hmp_handle_error(mon, &errp);
}
diff --git a/qapi-schema.json b/qapi-schema.json
index 75aacdd..465984e 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1636,6 +1636,11 @@
# (all the disk, only the sectors allocated in the topmost image, or
# only new I/O).
#
+# @granularity: #optional granularity of the dirty bitmap, default is 64K
+# if the image format doesn't have clusters, 4K if the clusters
+# are smaller than that, else the cluster size. Must be a
+# power of 2 between 512 and 64M.
+#
# @on-source-error: #optional the action to take on an error on the source,
# default 'report'. 'stop' and 'enospc' can only be used
# if the block device supports io-status (see BlockInfo).
@@ -1652,7 +1657,8 @@
{ 'command': 'drive-mirror',
'data': { 'device': 'str', 'target': 'str', '*format': 'str',
'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
- '*speed': 'int', '*on-source-error': 'BlockdevOnError',
+ '*speed': 'int', '*granularity': 'uint32',
+ '*on-source-error': 'BlockdevOnError',
'*on-target-error': 'BlockdevOnError' } }
##
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 5c692d0..d8faad0 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -938,7 +938,8 @@ EQMP
{
.name = "drive-mirror",
.args_type = "sync:s,device:B,target:s,speed:i?,mode:s?,format:s?,"
- "on-source-error:s?,on-target-error:s?",
+ "on-source-error:s?,on-target-error:s?,"
+ "granularity:i?",
.mhandler.cmd_new = qmp_marshal_input_drive_mirror,
},
@@ -962,6 +963,7 @@ Arguments:
file/device (NewImageMode, optional, default 'absolute-paths')
- "speed": maximum speed of the streaming job, in bytes per second
(json-int)
+- "granularity": granularity of the dirty bitmap (json-int, optional)
- "sync": what parts of the disk image should be copied to the destination;
possibilities include "full" for all the disk, "top" for only the sectors
allocated in the topmost image, or "none" to only replicate new I/O
@@ -971,6 +973,10 @@ Arguments:
- "on-target-error": the action to take on an error on the target
(BlockdevOnError, default 'report')
+The default value of the granularity is, if the image format defines
+a cluster size, the cluster size or 4096, whichever is larger. If it
+does not define a cluster size, the default value of the granularity
+is 65536.
Example:
--
1.8.0.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [PATCH 09/20] mirror: switch mirror_iteration to AIO
2012-12-12 13:46 [Qemu-devel] [PATCH 00/20] Block device mirroring enhancements, 12-12-12 edition Paolo Bonzini
` (7 preceding siblings ...)
2012-12-12 13:46 ` [Qemu-devel] [PATCH 08/20] mirror: allow customizing the granularity Paolo Bonzini
@ 2012-12-12 13:46 ` Paolo Bonzini
2012-12-14 22:11 ` Eric Blake
2012-12-12 13:46 ` [Qemu-devel] [PATCH 10/20] mirror: add buf-size argument to drive-mirror Paolo Bonzini
` (11 subsequent siblings)
20 siblings, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2012-12-12 13:46 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, jcody, stefanha
There is really no change in the behavior of the job here, since
there is still a maximum of one in-flight I/O operation between
the source and the target. However, this patch already introduces
the AIO callbacks (which are unmodified in the next patch)
and some of the logic to count in-flight operations and only
complete the job when there is none.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/mirror.c | 155 +++++++++++++++++++++++++++++++++++++++++++--------------
trace-events | 2 +
2 files changed, 119 insertions(+), 38 deletions(-)
diff --git a/block/mirror.c b/block/mirror.c
index 93240eb..f02ffb3 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -33,8 +33,19 @@ typedef struct MirrorBlockJob {
unsigned long *cow_bitmap;
HBitmapIter hbi;
uint8_t *buf;
+
+ int in_flight;
+ int ret;
} MirrorBlockJob;
+typedef struct MirrorOp {
+ MirrorBlockJob *s;
+ QEMUIOVector qiov;
+ struct iovec iov;
+ int64_t sector_num;
+ int nb_sectors;
+} MirrorOp;
+
static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
int error)
{
@@ -48,15 +59,60 @@ static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
}
}
-static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
- BlockErrorAction *p_action)
+static void mirror_iteration_done(MirrorOp *op)
+{
+ MirrorBlockJob *s = op->s;
+
+ s->in_flight--;
+ trace_mirror_iteration_done(s, op->sector_num, op->nb_sectors);
+ g_slice_free(MirrorOp, op);
+ qemu_coroutine_enter(s->common.co, NULL);
+}
+
+static void mirror_write_complete(void *opaque, int ret)
+{
+ MirrorOp *op = opaque;
+ MirrorBlockJob *s = op->s;
+ if (ret < 0) {
+ BlockDriverState *source = s->common.bs;
+ BlockErrorAction action;
+
+ bdrv_set_dirty(source, op->sector_num, op->nb_sectors);
+ action = mirror_error_action(s, false, -ret);
+ if (action == BDRV_ACTION_REPORT && s->ret >= 0) {
+ s->ret = ret;
+ }
+ }
+ mirror_iteration_done(op);
+}
+
+static void mirror_read_complete(void *opaque, int ret)
+{
+ MirrorOp *op = opaque;
+ MirrorBlockJob *s = op->s;
+ if (ret < 0) {
+ BlockDriverState *source = s->common.bs;
+ BlockErrorAction action;
+
+ bdrv_set_dirty(source, op->sector_num, op->nb_sectors);
+ action = mirror_error_action(s, true, -ret);
+ if (action == BDRV_ACTION_REPORT && s->ret >= 0) {
+ s->ret = ret;
+ }
+
+ mirror_iteration_done(op);
+ return;
+ }
+ bdrv_aio_writev(s->target, op->sector_num, &op->qiov, op->nb_sectors,
+ mirror_write_complete, op);
+}
+
+static void coroutine_fn mirror_iteration(MirrorBlockJob *s)
{
BlockDriverState *source = s->common.bs;
- BlockDriverState *target = s->target;
- QEMUIOVector qiov;
- int ret, nb_sectors, nb_sectors_chunk;
+ int nb_sectors, nb_sectors_chunk;
int64_t end, sector_num, cluster_num;
- struct iovec iov;
+ MirrorOp *op;
s->sector_num = hbitmap_iter_next(&s->hbi);
if (s->sector_num < 0) {
@@ -87,31 +143,30 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
end = s->common.len >> BDRV_SECTOR_BITS;
nb_sectors = MIN(nb_sectors, end - sector_num);
+
+ /* Allocate a MirrorOp that is used as an AIO callback. */
+ op = g_slice_new(MirrorOp);
+ op->s = s;
+ op->iov.iov_base = s->buf;
+ op->iov.iov_len = nb_sectors * 512;
+ op->sector_num = sector_num;
+ op->nb_sectors = nb_sectors;
+ qemu_iovec_init_external(&op->qiov, &op->iov, 1);
+
bdrv_reset_dirty(source, sector_num, nb_sectors);
/* Copy the dirty cluster. */
- iov.iov_base = s->buf;
- iov.iov_len = nb_sectors * 512;
- qemu_iovec_init_external(&qiov, &iov, 1);
-
+ s->in_flight++;
trace_mirror_one_iteration(s, sector_num, nb_sectors);
- ret = bdrv_co_readv(source, sector_num, nb_sectors, &qiov);
- if (ret < 0) {
- *p_action = mirror_error_action(s, true, -ret);
- goto fail;
- }
- ret = bdrv_co_writev(target, sector_num, nb_sectors, &qiov);
- if (ret < 0) {
- *p_action = mirror_error_action(s, false, -ret);
- s->synced = false;
- goto fail;
- }
- return 0;
+ bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
+ mirror_read_complete, op);
+}
-fail:
- /* Try again later. */
- bdrv_set_dirty(source, sector_num, nb_sectors);
- return ret;
+static void mirror_drain(MirrorBlockJob *s)
+{
+ while (s->in_flight > 0) {
+ qemu_coroutine_yield();
+ }
}
static void coroutine_fn mirror_run(void *opaque)
@@ -119,6 +174,7 @@ static void coroutine_fn mirror_run(void *opaque)
MirrorBlockJob *s = opaque;
BlockDriverState *bs = s->common.bs;
int64_t sector_num, end, nb_sectors_chunk, length;
+ uint64_t last_pause_ns;
BlockDriverInfo bdi;
char backing_filename[1024];
int ret = 0;
@@ -177,28 +233,43 @@ static void coroutine_fn mirror_run(void *opaque)
}
bdrv_dirty_iter_init(bs, &s->hbi);
+ last_pause_ns = qemu_get_clock_ns(rt_clock);
for (;;) {
uint64_t delay_ns;
int64_t cnt;
bool should_complete;
+ if (s->ret < 0) {
+ ret = s->ret;
+ break;
+ }
+
cnt = bdrv_get_dirty_count(bs);
- if (cnt != 0) {
- BlockErrorAction action = BDRV_ACTION_REPORT;
- ret = mirror_iteration(s, &action);
- if (ret < 0 && action == BDRV_ACTION_REPORT) {
- goto immediate_exit;
+
+ /* Note that even when no rate limit is applied we need to yield
+ * periodically with no pending I/O so that qemu_aio_flush() returns.
+ * We do so every SLICE_TIME milliseconds, or when there is an error,
+ * or when the source is clean, whichever comes first.
+ */
+ if (qemu_get_clock_ns(rt_clock) - last_pause_ns < SLICE_TIME &&
+ s->common.iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
+ if (s->in_flight > 0) {
+ trace_mirror_yield(s, s->in_flight, cnt);
+ qemu_coroutine_yield();
+ continue;
+ } else if (cnt != 0) {
+ mirror_iteration(s);
+ continue;
}
- cnt = bdrv_get_dirty_count(bs);
}
should_complete = false;
- if (cnt == 0) {
+ if (s->in_flight == 0 && cnt == 0) {
trace_mirror_before_flush(s);
ret = bdrv_flush(s->target);
if (ret < 0) {
if (mirror_error_action(s, false, -ret) == BDRV_ACTION_REPORT) {
- goto immediate_exit;
+ break;
}
} else {
/* We're out of the streaming phase. From now on, if the job
@@ -244,15 +315,12 @@ static void coroutine_fn mirror_run(void *opaque)
delay_ns = 0;
}
- /* Note that even when no rate limit is applied we need to yield
- * with no pending I/O here so that qemu_aio_flush() returns.
- */
block_job_sleep_ns(&s->common, rt_clock, delay_ns);
if (block_job_is_cancelled(&s->common)) {
break;
}
} else if (!should_complete) {
- delay_ns = (cnt == 0 ? SLICE_TIME : 0);
+ delay_ns = (s->in_flight == 0 && cnt == 0 ? SLICE_TIME : 0);
block_job_sleep_ns(&s->common, rt_clock, delay_ns);
} else if (cnt == 0) {
/* The two disks are in sync. Exit and report successful
@@ -262,9 +330,20 @@ static void coroutine_fn mirror_run(void *opaque)
s->common.cancelled = false;
break;
}
+ last_pause_ns = qemu_get_clock_ns(rt_clock);
}
immediate_exit:
+ if (s->in_flight > 0) {
+ /* We get here only if something went wrong. Either the job failed,
+ * or it was cancelled prematurely so that we do not guarantee that
+ * the target is a copy of the source.
+ */
+ assert(ret < 0 || (!s->synced && block_job_is_cancelled(&s->common)));
+ mirror_drain(s);
+ }
+
+ assert(s->in_flight == 0);
g_free(s->buf);
g_free(s->cow_bitmap);
bdrv_set_dirty_tracking(bs, 0);
diff --git a/trace-events b/trace-events
index b318b4e..ca50812 100644
--- a/trace-events
+++ b/trace-events
@@ -85,6 +85,8 @@ mirror_before_drain(void *s, int64_t cnt) "s %p dirty count %"PRId64
mirror_before_sleep(void *s, int64_t cnt, int synced) "s %p dirty count %"PRId64" synced %d"
mirror_one_iteration(void *s, int64_t sector_num, int nb_sectors) "s %p sector_num %"PRId64" nb_sectors %d"
mirror_cow(void *s, int64_t sector_num) "s %p sector_num %"PRId64
+mirror_iteration_done(void *s, int64_t sector_num, int nb_sectors) "s %p sector_num %"PRId64" nb_sectors %d"
+mirror_yield(void *s, int64_t cnt, int in_flight) "s %p dirty count %"PRId64" in_flight %d"
# blockdev.c
qmp_block_job_cancel(void *job) "job %p"
--
1.8.0.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [PATCH 10/20] mirror: add buf-size argument to drive-mirror
2012-12-12 13:46 [Qemu-devel] [PATCH 00/20] Block device mirroring enhancements, 12-12-12 edition Paolo Bonzini
` (8 preceding siblings ...)
2012-12-12 13:46 ` [Qemu-devel] [PATCH 09/20] mirror: switch mirror_iteration to AIO Paolo Bonzini
@ 2012-12-12 13:46 ` Paolo Bonzini
2012-12-14 22:22 ` Eric Blake
2013-01-14 11:41 ` Stefan Hajnoczi
2012-12-12 13:46 ` [Qemu-devel] [PATCH 11/20] mirror: support more than one in-flight AIO operation Paolo Bonzini
` (10 subsequent siblings)
20 siblings, 2 replies; 53+ messages in thread
From: Paolo Bonzini @ 2012-12-12 13:46 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, jcody, stefanha
This makes sense when the next commit starts using the extra buffer space
to perform many I/O operations asynchronously.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/mirror.c | 6 +++---
block_int.h | 5 +++--
blockdev.c | 9 ++++++++-
hmp.c | 2 +-
qapi-schema.json | 5 ++++-
qmp-commands.hx | 4 +++-
tests/qemu-iotests/041 | 31 +++++++++++++++++++++++++++++++
7 files changed, 53 insertions(+), 9 deletions(-)
diff --git a/block/mirror.c b/block/mirror.c
index f02ffb3..ed56b86 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -408,8 +408,8 @@ static BlockJobType mirror_job_type = {
};
void mirror_start(BlockDriverState *bs, BlockDriverState *target,
- int64_t speed, int64_t granularity, MirrorSyncMode mode,
- BlockdevOnError on_source_error,
+ int64_t speed, int64_t granularity, int64_t buf_size,
+ MirrorSyncMode mode, BlockdevOnError on_source_error,
BlockdevOnError on_target_error,
BlockDriverCompletionFunc *cb,
void *opaque, Error **errp)
@@ -447,7 +447,7 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
s->target = target;
s->mode = mode;
s->granularity = granularity;
- s->buf_size = granularity;
+ s->buf_size = MAX(buf_size, granularity);
bdrv_set_dirty_tracking(bs, granularity);
bdrv_set_enable_write_cache(s->target, true);
diff --git a/block_int.h b/block_int.h
index 8d038c8..8f54b41 100644
--- a/block_int.h
+++ b/block_int.h
@@ -339,6 +339,7 @@ void commit_start(BlockDriverState *bs, BlockDriverState *base,
* @target: Block device to write to.
* @speed: The maximum speed, in bytes per second, or 0 for unlimited.
* @granularity: The chosen granularity for the dirty bitmap.
+ * @buf_size: The amount of data that can be in flight at one time.
* @mode: Whether to collapse all images in the chain to the target.
* @on_source_error: The action to take upon error reading from the source.
* @on_target_error: The action to take upon error writing to the target.
@@ -352,8 +353,8 @@ void commit_start(BlockDriverState *bs, BlockDriverState *base,
* @bs will be switched to read from @target.
*/
void mirror_start(BlockDriverState *bs, BlockDriverState *target,
- int64_t speed, int64_t granularity, MirrorSyncMode mode,
- BlockdevOnError on_source_error,
+ int64_t speed, int64_t granularity, int64_t buf_size,
+ MirrorSyncMode mode, BlockdevOnError on_source_error,
BlockdevOnError on_target_error,
BlockDriverCompletionFunc *cb,
void *opaque, Error **errp);
diff --git a/blockdev.c b/blockdev.c
index d554c03..dff6c3d 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1187,12 +1187,15 @@ void qmp_block_commit(const char *device,
drive_get_ref(drive_get_by_blockdev(bs));
}
+#define DEFAULT_MIRROR_BUF_SIZE (10 << 20)
+
void qmp_drive_mirror(const char *device, const char *target,
bool has_format, const char *format,
enum MirrorSyncMode sync,
bool has_mode, enum NewImageMode mode,
bool has_speed, int64_t speed,
bool has_granularity, uint32_t granularity,
+ bool has_buf_size, int64_t buf_size,
bool has_on_source_error, BlockdevOnError on_source_error,
bool has_on_target_error, BlockdevOnError on_target_error,
Error **errp)
@@ -1221,6 +1224,10 @@ void qmp_drive_mirror(const char *device, const char *target,
if (!has_granularity) {
granularity = 0;
}
+ if (!has_buf_size) {
+ buf_size = DEFAULT_MIRROR_BUF_SIZE;
+ }
+
if (granularity != 0 && (granularity < 512 || granularity > 1048576 * 64)) {
error_set(errp, QERR_INVALID_PARAMETER, device);
return;
@@ -1310,7 +1317,7 @@ void qmp_drive_mirror(const char *device, const char *target,
return;
}
- mirror_start(bs, target_bs, speed, granularity, sync,
+ mirror_start(bs, target_bs, speed, granularity, buf_size, sync,
on_source_error, on_target_error,
block_job_cb, bs, &local_err);
if (local_err != NULL) {
diff --git a/hmp.c b/hmp.c
index e6d2930..428c563 100644
--- a/hmp.c
+++ b/hmp.c
@@ -795,7 +795,7 @@ void hmp_drive_mirror(Monitor *mon, const QDict *qdict)
qmp_drive_mirror(device, filename, !!format, format,
full ? MIRROR_SYNC_MODE_FULL : MIRROR_SYNC_MODE_TOP,
- true, mode, false, 0, false, 0,
+ true, mode, false, 0, false, 0, false, 0,
false, 0, false, 0, &errp);
hmp_handle_error(mon, &errp);
}
diff --git a/qapi-schema.json b/qapi-schema.json
index 465984e..fb38106 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1641,6 +1641,9 @@
# are smaller than that, else the cluster size. Must be a
# power of 2 between 512 and 64M.
#
+# @buf-size: #optional maximum amount of data in flight from source to
+# target.
+#
# @on-source-error: #optional the action to take on an error on the source,
# default 'report'. 'stop' and 'enospc' can only be used
# if the block device supports io-status (see BlockInfo).
@@ -1658,7 +1661,7 @@
'data': { 'device': 'str', 'target': 'str', '*format': 'str',
'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
'*speed': 'int', '*granularity': 'uint32',
- '*on-source-error': 'BlockdevOnError',
+ '*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
'*on-target-error': 'BlockdevOnError' } }
##
diff --git a/qmp-commands.hx b/qmp-commands.hx
index d8faad0..97c52c9 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -939,7 +939,7 @@ EQMP
.name = "drive-mirror",
.args_type = "sync:s,device:B,target:s,speed:i?,mode:s?,format:s?,"
"on-source-error:s?,on-target-error:s?,"
- "granularity:i?",
+ "buf-size:i?,granularity:i?",
.mhandler.cmd_new = qmp_marshal_input_drive_mirror,
},
@@ -964,6 +964,8 @@ Arguments:
- "speed": maximum speed of the streaming job, in bytes per second
(json-int)
- "granularity": granularity of the dirty bitmap (json-int, optional)
+- "buf_size": maximum amount of data in flight from source to target
+ (json-int, default 10M)
- "sync": what parts of the disk image should be copied to the destination;
possibilities include "full" for all the disk, "top" for only the sectors
allocated in the topmost image, or "none" to only replicate new I/O
diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index d0e9a7f..05ff0f8 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -207,6 +207,37 @@ class TestSingleDrive(ImageMirroringTestCase):
self.assertTrue(self.compare_images(test_img, target_img),
'target image does not match source after mirroring')
+ def test_small_buffer(self):
+ self.assert_no_active_mirrors()
+
+ # A small buffer is rounded up automatically
+ result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+ buf_size=4096, target=target_img)
+ self.assert_qmp(result, 'return', {})
+
+ self.complete_and_wait()
+ result = self.vm.qmp('query-block')
+ self.assert_qmp(result, 'return[0]/inserted/file', target_img)
+ self.vm.shutdown()
+ self.assertTrue(self.compare_images(test_img, target_img),
+ 'target image does not match source after mirroring')
+
+ def test_small_buffer2(self):
+ self.assert_no_active_mirrors()
+
+ qemu_img('create', '-f', iotests.imgfmt, '-o', 'cluster_size=%d,size=%d'
+ % (TestSingleDrive.image_len, TestSingleDrive.image_len), target_img)
+ result = self.vm.qmp('drive-mirror', device='drive0', sync='full',
+ buf_size=65536, mode='existing', target=target_img)
+ self.assert_qmp(result, 'return', {})
+
+ self.complete_and_wait()
+ result = self.vm.qmp('query-block')
+ self.assert_qmp(result, 'return[0]/inserted/file', target_img)
+ self.vm.shutdown()
+ self.assertTrue(self.compare_images(test_img, target_img),
+ 'target image does not match source after mirroring')
+
def test_large_cluster(self):
self.assert_no_active_mirrors()
--
1.8.0.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [PATCH 11/20] mirror: support more than one in-flight AIO operation
2012-12-12 13:46 [Qemu-devel] [PATCH 00/20] Block device mirroring enhancements, 12-12-12 edition Paolo Bonzini
` (9 preceding siblings ...)
2012-12-12 13:46 ` [Qemu-devel] [PATCH 10/20] mirror: add buf-size argument to drive-mirror Paolo Bonzini
@ 2012-12-12 13:46 ` Paolo Bonzini
2012-12-14 22:32 ` Eric Blake
2013-01-14 12:56 ` Stefan Hajnoczi
2012-12-12 13:46 ` [Qemu-devel] [PATCH 12/20] mirror: support arbitrarily-sized iterations Paolo Bonzini
` (9 subsequent siblings)
20 siblings, 2 replies; 53+ messages in thread
From: Paolo Bonzini @ 2012-12-12 13:46 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, jcody, stefanha
With AIO support in place, we can start copying more than one chunk
in parallel. This patch introduces the required infrastructure for
this: the buffer is split into multiple granularity-sized chunks,
and there is a free list to access them.
Because of copy-on-write, a single operation may already require
multiple chunks to be available on the free list.
In addition, two different iterations on the HBitmap may want to
copy the same cluster. We avoid this by keeping a bitmap of in-flight
I/O operations, and blocking until the previous iteration completes.
This should be a pretty rare occurrence, though; as long as there is
no overlap the next iteration can start before the previous one finishes.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/mirror.c | 109 ++++++++++++++++++++++++++++++++++++++++++++++++++-------
trace-events | 4 ++-
2 files changed, 100 insertions(+), 13 deletions(-)
diff --git a/block/mirror.c b/block/mirror.c
index ed56b86..f9caaea 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -17,7 +17,15 @@
#include "qemu/ratelimit.h"
#include "bitmap.h"
-#define SLICE_TIME 100000000ULL /* ns */
+#define SLICE_TIME 100000000ULL /* ns */
+#define MAX_IN_FLIGHT 16
+
+/* The mirroring buffer is a list of granularity-sized chunks.
+ * Free chunks are organized in a list.
+ */
+typedef struct MirrorBuffer {
+ QSIMPLEQ_ENTRY(MirrorBuffer) next;
+} MirrorBuffer;
typedef struct MirrorBlockJob {
BlockJob common;
@@ -33,7 +41,10 @@ typedef struct MirrorBlockJob {
unsigned long *cow_bitmap;
HBitmapIter hbi;
uint8_t *buf;
+ QSIMPLEQ_HEAD(, MirrorBuffer) buf_free;
+ int buf_free_count;
+ unsigned long *in_flight_bitmap;
int in_flight;
int ret;
} MirrorBlockJob;
@@ -41,7 +52,6 @@ typedef struct MirrorBlockJob {
typedef struct MirrorOp {
MirrorBlockJob *s;
QEMUIOVector qiov;
- struct iovec iov;
int64_t sector_num;
int nb_sectors;
} MirrorOp;
@@ -62,8 +72,22 @@ static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
static void mirror_iteration_done(MirrorOp *op)
{
MirrorBlockJob *s = op->s;
+ struct iovec *iov;
+ int64_t cluster_num;
+ int i, nb_chunks;
s->in_flight--;
+ iov = op->qiov.iov;
+ for (i = 0; i < op->qiov.niov; i++) {
+ MirrorBuffer *buf = (MirrorBuffer *) iov[i].iov_base;
+ QSIMPLEQ_INSERT_TAIL(&s->buf_free, buf, next);
+ s->buf_free_count++;
+ }
+
+ cluster_num = op->sector_num / s->granularity;
+ nb_chunks = op->nb_sectors / s->granularity;
+ bitmap_clear(s->in_flight_bitmap, cluster_num, nb_chunks);
+
trace_mirror_iteration_done(s, op->sector_num, op->nb_sectors);
g_slice_free(MirrorOp, op);
qemu_coroutine_enter(s->common.co, NULL);
@@ -110,8 +134,8 @@ static void mirror_read_complete(void *opaque, int ret)
static void coroutine_fn mirror_iteration(MirrorBlockJob *s)
{
BlockDriverState *source = s->common.bs;
- int nb_sectors, nb_sectors_chunk;
- int64_t end, sector_num, cluster_num;
+ int nb_sectors, nb_sectors_chunk, nb_chunks;
+ int64_t end, sector_num, cluster_num, next_sector, hbitmap_next_sector;
MirrorOp *op;
s->sector_num = hbitmap_iter_next(&s->hbi);
@@ -122,6 +146,8 @@ static void coroutine_fn mirror_iteration(MirrorBlockJob *s)
assert(s->sector_num >= 0);
}
+ hbitmap_next_sector = s->sector_num;
+
/* If we have no backing file yet in the destination, and the cluster size
* is very large, we need to do COW ourselves. The first time a cluster is
* copied, copy it entirely.
@@ -137,21 +163,58 @@ static void coroutine_fn mirror_iteration(MirrorBlockJob *s)
bdrv_round_to_clusters(s->target,
sector_num, nb_sectors_chunk,
§or_num, &nb_sectors);
- bitmap_set(s->cow_bitmap, sector_num / nb_sectors_chunk,
- nb_sectors / nb_sectors_chunk);
+
+ /* The rounding may make us copy sectors before the
+ * first dirty one.
+ */
+ cluster_num = sector_num / nb_sectors_chunk;
+ }
+
+ /* Wait for I/O to this cluster (from a previous iteration) to be done. */
+ while (test_bit(cluster_num, s->in_flight_bitmap)) {
+ trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
+ qemu_coroutine_yield();
}
end = s->common.len >> BDRV_SECTOR_BITS;
nb_sectors = MIN(nb_sectors, end - sector_num);
+ nb_chunks = (nb_sectors + nb_sectors_chunk - 1) / nb_sectors_chunk;
+ while (s->buf_free_count < nb_chunks) {
+ trace_mirror_yield_buf_busy(s, nb_chunks, s->in_flight);
+ qemu_coroutine_yield();
+ }
+
+ /* We have enough free space to copy these sectors. */
+ if (s->cow_bitmap) {
+ bitmap_set(s->cow_bitmap, cluster_num, nb_chunks);
+ }
/* Allocate a MirrorOp that is used as an AIO callback. */
op = g_slice_new(MirrorOp);
op->s = s;
- op->iov.iov_base = s->buf;
- op->iov.iov_len = nb_sectors * 512;
op->sector_num = sector_num;
op->nb_sectors = nb_sectors;
- qemu_iovec_init_external(&op->qiov, &op->iov, 1);
+
+ /* Now make a QEMUIOVector taking enough granularity-sized chunks
+ * from s->buf_free.
+ */
+ qemu_iovec_init(&op->qiov, nb_chunks);
+ next_sector = sector_num;
+ while (nb_chunks-- > 0) {
+ MirrorBuffer *buf = QSIMPLEQ_FIRST(&s->buf_free);
+ QSIMPLEQ_REMOVE_HEAD(&s->buf_free, next);
+ s->buf_free_count--;
+ qemu_iovec_add(&op->qiov, buf, s->granularity);
+
+ /* Advance the HBitmapIter in parallel, so that we do not examine
+ * the same sector twice.
+ */
+ if (next_sector > hbitmap_next_sector && bdrv_get_dirty(source, next_sector)) {
+ hbitmap_next_sector = hbitmap_iter_next(&s->hbi);
+ }
+
+ next_sector += nb_sectors_chunk;
+ }
bdrv_reset_dirty(source, sector_num, nb_sectors);
@@ -162,6 +225,23 @@ static void coroutine_fn mirror_iteration(MirrorBlockJob *s)
mirror_read_complete, op);
}
+static void mirror_free_init(MirrorBlockJob *s)
+{
+ int granularity = s->granularity;
+ size_t buf_size = s->buf_size;
+ uint8_t *buf = s->buf;
+
+ assert(s->buf_free_count == 0);
+ QSIMPLEQ_INIT(&s->buf_free);
+ while (buf_size != 0) {
+ MirrorBuffer *cur = (MirrorBuffer *)buf;
+ QSIMPLEQ_INSERT_TAIL(&s->buf_free, cur, next);
+ s->buf_free_count++;
+ buf_size -= granularity;
+ buf += granularity;
+ }
+}
+
static void mirror_drain(MirrorBlockJob *s)
{
while (s->in_flight > 0) {
@@ -190,6 +270,9 @@ static void coroutine_fn mirror_run(void *opaque)
return;
}
+ length = (bdrv_getlength(bs) + s->granularity - 1) / s->granularity;
+ s->in_flight_bitmap = bitmap_new(length);
+
/* If we have no backing file yet in the destination, we cannot let
* the destination do COW. Instead, we copy sectors around the
* dirty data if needed. We need a bitmap to do that.
@@ -200,7 +283,6 @@ static void coroutine_fn mirror_run(void *opaque)
bdrv_get_info(s->target, &bdi);
if (s->buf_size < bdi.cluster_size) {
s->buf_size = bdi.cluster_size;
- length = (bdrv_getlength(bs) + s->granularity - 1) / s->granularity;
s->cow_bitmap = bitmap_new(length);
}
}
@@ -208,6 +290,7 @@ static void coroutine_fn mirror_run(void *opaque)
end = s->common.len >> BDRV_SECTOR_BITS;
s->buf = qemu_blockalign(bs, s->buf_size);
nb_sectors_chunk = s->granularity >> BDRV_SECTOR_BITS;
+ mirror_free_init(s);
if (s->mode != MIRROR_SYNC_MODE_NONE) {
/* First part, loop on the sectors and initialize the dirty bitmap. */
@@ -253,8 +336,9 @@ static void coroutine_fn mirror_run(void *opaque)
*/
if (qemu_get_clock_ns(rt_clock) - last_pause_ns < SLICE_TIME &&
s->common.iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
- if (s->in_flight > 0) {
- trace_mirror_yield(s, s->in_flight, cnt);
+ if (s->in_flight == MAX_IN_FLIGHT || s->buf_free_count == 0 ||
+ (cnt == 0 && s->in_flight > 0)) {
+ trace_mirror_yield(s, s->in_flight, s->buf_free_count, cnt);
qemu_coroutine_yield();
continue;
} else if (cnt != 0) {
@@ -346,6 +430,7 @@ immediate_exit:
assert(s->in_flight == 0);
g_free(s->buf);
g_free(s->cow_bitmap);
+ g_free(s->in_flight_bitmap);
bdrv_set_dirty_tracking(bs, 0);
bdrv_iostatus_disable(s->target);
if (s->should_complete && ret == 0) {
diff --git a/trace-events b/trace-events
index ca50812..42068dd 100644
--- a/trace-events
+++ b/trace-events
@@ -86,7 +86,9 @@ mirror_before_sleep(void *s, int64_t cnt, int synced) "s %p dirty count %"PRId64
mirror_one_iteration(void *s, int64_t sector_num, int nb_sectors) "s %p sector_num %"PRId64" nb_sectors %d"
mirror_cow(void *s, int64_t sector_num) "s %p sector_num %"PRId64
mirror_iteration_done(void *s, int64_t sector_num, int nb_sectors) "s %p sector_num %"PRId64" nb_sectors %d"
-mirror_yield(void *s, int64_t cnt, int in_flight) "s %p dirty count %"PRId64" in_flight %d"
+mirror_yield(void *s, int64_t cnt, int buf_free_count, int in_flight) "s %p dirty count %"PRId64" free buffers %d in_flight %d"
+mirror_yield_in_flight(void *s, int64_t sector_num, int in_flight) "s %p sector_num %"PRId64" in_flight %d"
+mirror_yield_buf_busy(void *s, int nb_chunks, int in_flight) "s %p requested chunks %d in_flight %d"
# blockdev.c
qmp_block_job_cancel(void *job) "job %p"
--
1.8.0.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [PATCH 12/20] mirror: support arbitrarily-sized iterations
2012-12-12 13:46 [Qemu-devel] [PATCH 00/20] Block device mirroring enhancements, 12-12-12 edition Paolo Bonzini
` (10 preceding siblings ...)
2012-12-12 13:46 ` [Qemu-devel] [PATCH 11/20] mirror: support more than one in-flight AIO operation Paolo Bonzini
@ 2012-12-12 13:46 ` Paolo Bonzini
2012-12-14 22:39 ` Eric Blake
2012-12-12 13:46 ` [Qemu-devel] [PATCH 13/20] oslib: add a wrapper for mmap/munmap Paolo Bonzini
` (8 subsequent siblings)
20 siblings, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2012-12-12 13:46 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, jcody, stefanha
Yet another optimization is to extend the mirroring iteration to include more
adjacent dirty blocks. This limits the number of I/O operations and makes
mirroring efficient even with a small granularity. Most of the infrastructure
is already in place; we only need to put a loop around the computation of
the origin and sector count of the iteration.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/mirror.c | 100 +++++++++++++++++++++++++++++++++++++++------------------
trace-events | 1 +
2 files changed, 69 insertions(+), 32 deletions(-)
diff --git a/block/mirror.c b/block/mirror.c
index f9caaea..23f87d2 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -135,7 +135,7 @@ static void coroutine_fn mirror_iteration(MirrorBlockJob *s)
{
BlockDriverState *source = s->common.bs;
int nb_sectors, nb_sectors_chunk, nb_chunks;
- int64_t end, sector_num, cluster_num, next_sector, hbitmap_next_sector;
+ int64_t end, sector_num, next_cluster, next_sector, hbitmap_next_sector;
MirrorOp *op;
s->sector_num = hbitmap_iter_next(&s->hbi);
@@ -147,47 +147,83 @@ static void coroutine_fn mirror_iteration(MirrorBlockJob *s)
}
hbitmap_next_sector = s->sector_num;
+ sector_num = s->sector_num;
+ nb_sectors_chunk = s->granularity >> BDRV_SECTOR_BITS;
+ end = s->common.len >> BDRV_SECTOR_BITS;
- /* If we have no backing file yet in the destination, and the cluster size
- * is very large, we need to do COW ourselves. The first time a cluster is
- * copied, copy it entirely.
+ /* Extend the QEMUIOVector to include all adjacent blocks that will
+ * be copied in this operation.
+ *
+ * We have to do this if we have no backing file yet in the destination,
+ * and the cluster size is very large. Then we need to do COW ourselves.
+ * The first time a cluster is copied, copy it entirely. Note that,
+ * because both the granularity and the cluster size are powers of two,
+ * the number of sectors to copy cannot exceed one cluster.
*
- * Because both the granularity and the cluster size are powers of two, the
- * number of sectors to copy cannot exceed one cluster.
+ * We also want to extend the QEMUIOVector to include more adjacent
+ * dirty blocks if possible, to limit the number of I/O operations and
+ * run efficiently even with a small granularity.
*/
- sector_num = s->sector_num;
- nb_sectors_chunk = nb_sectors = s->granularity >> BDRV_SECTOR_BITS;
- cluster_num = sector_num / nb_sectors_chunk;
- if (s->cow_bitmap && !test_bit(cluster_num, s->cow_bitmap)) {
- trace_mirror_cow(s, sector_num);
- bdrv_round_to_clusters(s->target,
- sector_num, nb_sectors_chunk,
- §or_num, &nb_sectors);
-
- /* The rounding may make us copy sectors before the
- * first dirty one.
- */
- cluster_num = sector_num / nb_sectors_chunk;
- }
+ nb_chunks = 0;
+ nb_sectors = 0;
+ next_sector = sector_num;
+ next_cluster = sector_num / nb_sectors_chunk;
/* Wait for I/O to this cluster (from a previous iteration) to be done. */
- while (test_bit(cluster_num, s->in_flight_bitmap)) {
+ while (test_bit(next_cluster, s->in_flight_bitmap)) {
trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
qemu_coroutine_yield();
}
- end = s->common.len >> BDRV_SECTOR_BITS;
- nb_sectors = MIN(nb_sectors, end - sector_num);
- nb_chunks = (nb_sectors + nb_sectors_chunk - 1) / nb_sectors_chunk;
- while (s->buf_free_count < nb_chunks) {
- trace_mirror_yield_buf_busy(s, nb_chunks, s->in_flight);
- qemu_coroutine_yield();
- }
+ do {
+ int added_sectors, added_chunks;
- /* We have enough free space to copy these sectors. */
- if (s->cow_bitmap) {
- bitmap_set(s->cow_bitmap, cluster_num, nb_chunks);
- }
+ if (!bdrv_get_dirty(source, next_sector) ||
+ test_bit(next_cluster, s->in_flight_bitmap)) {
+ assert(nb_sectors > 0);
+ break;
+ }
+
+ added_sectors = nb_sectors_chunk;
+ if (s->cow_bitmap && !test_bit(next_cluster, s->cow_bitmap)) {
+ bdrv_round_to_clusters(s->target,
+ next_sector, added_sectors,
+ &next_sector, &added_sectors);
+
+ /* On the first iteration, the rounding may make us copy
+ * sectors before the first dirty one.
+ */
+ if (next_sector < sector_num) {
+ assert(nb_sectors == 0);
+ sector_num = next_sector;
+ next_cluster = next_sector / nb_sectors_chunk;
+ }
+ }
+
+ added_sectors = MIN(added_sectors, end - (sector_num + nb_sectors));
+ added_chunks = (added_sectors + nb_sectors_chunk - 1) / nb_sectors_chunk;
+
+ /* When doing COW, it may happen that there is not enough space for
+ * a full cluster. Wait if that is the case.
+ */
+ while (nb_chunks == 0 && s->buf_free_count < added_chunks) {
+ trace_mirror_yield_buf_busy(s, nb_chunks, s->in_flight);
+ qemu_coroutine_yield();
+ }
+ if (s->buf_free_count < nb_chunks + added_chunks) {
+ trace_mirror_break_buf_busy(s, nb_chunks, s->in_flight);
+ break;
+ }
+
+ /* We have enough free space to copy these sectors. */
+ if (s->cow_bitmap) {
+ bitmap_set(s->cow_bitmap, next_cluster, added_chunks);
+ }
+ nb_sectors += added_sectors;
+ nb_chunks += added_chunks;
+ next_sector += added_sectors;
+ next_cluster += added_chunks;
+ } while (next_sector < end);
/* Allocate a MirrorOp that is used as an AIO callback. */
op = g_slice_new(MirrorOp);
diff --git a/trace-events b/trace-events
index 42068dd..3632efc 100644
--- a/trace-events
+++ b/trace-events
@@ -89,6 +89,7 @@ mirror_iteration_done(void *s, int64_t sector_num, int nb_sectors) "s %p sector_
mirror_yield(void *s, int64_t cnt, int buf_free_count, int in_flight) "s %p dirty count %"PRId64" free buffers %d in_flight %d"
mirror_yield_in_flight(void *s, int64_t sector_num, int in_flight) "s %p sector_num %"PRId64" in_flight %d"
mirror_yield_buf_busy(void *s, int nb_chunks, int in_flight) "s %p requested chunks %d in_flight %d"
+mirror_break_buf_busy(void *s, int nb_chunks, int in_flight) "s %p requested chunks %d in_flight %d"
# blockdev.c
qmp_block_job_cancel(void *job) "job %p"
--
1.8.0.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [PATCH 13/20] oslib: add a wrapper for mmap/munmap
2012-12-12 13:46 [Qemu-devel] [PATCH 00/20] Block device mirroring enhancements, 12-12-12 edition Paolo Bonzini
` (11 preceding siblings ...)
2012-12-12 13:46 ` [Qemu-devel] [PATCH 12/20] mirror: support arbitrarily-sized iterations Paolo Bonzini
@ 2012-12-12 13:46 ` Paolo Bonzini
2012-12-14 22:54 ` Eric Blake
2012-12-12 13:46 ` [Qemu-devel] [PATCH 14/20] hbitmap: add hbitmap_alloc_with_data and hbitmap_required_size Paolo Bonzini
` (7 subsequent siblings)
20 siblings, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2012-12-12 13:46 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, jcody, stefanha
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
osdep.h | 10 ++++++++++
oslib-posix.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
oslib-win32.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 116 insertions(+)
diff --git a/osdep.h b/osdep.h
index 87d3b9c..fb7f72c 100644
--- a/osdep.h
+++ b/osdep.h
@@ -134,6 +134,16 @@ void qemu_vfree(void *ptr);
#endif
+typedef struct QEMUMmapArea {
+ int fd;
+ void *mem;
+ size_t size;
+} QEMUMmapArea;
+
+int qemu_mmap_alloc(QEMUMmapArea *mm, const char *path, size_t size);
+int qemu_mmap_flush(QEMUMmapArea *mm);
+void qemu_mmap_free(QEMUMmapArea *mm);
+
int qemu_madvise(void *addr, size_t len, int advice);
int qemu_open(const char *name, int flags, ...);
diff --git a/oslib-posix.c b/oslib-posix.c
index 9db9c3d..f230f61 100644
--- a/oslib-posix.c
+++ b/oslib-posix.c
@@ -61,6 +61,7 @@ static int running_on_valgrind = -1;
#ifdef CONFIG_LINUX
#include <sys/syscall.h>
#endif
+#include <sys/mman.h>
int qemu_get_thread_id(void)
{
@@ -226,3 +227,49 @@ int qemu_utimens(const char *path, const struct timespec *times)
return utimes(path, &tv[0]);
}
+
+int qemu_mmap_alloc(QEMUMmapArea *mm, const char *path, size_t size)
+{
+ int fd = -1;
+ char *mem = NULL;
+ int save_errno;
+
+ fd = qemu_open(path, O_RDWR | O_CREAT, 0666);
+ if (fd < 0) {
+ goto fail;
+ }
+
+ if (ftruncate(fd, size) < 0) {
+ goto fail;
+ }
+
+ mem = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
+ if (!mem) {
+ goto fail;
+ }
+
+ mm->fd = fd;
+ mm->mem = mem;
+ mm->size = size;
+ return 0;
+
+fail:
+ save_errno = errno;
+ if (fd >= 0) {
+ close(fd);
+ unlink(path);
+ }
+ return -save_errno;
+}
+
+int qemu_mmap_flush(QEMUMmapArea *mm)
+{
+ int rc = msync(mm->mem, mm->size, MS_SYNC);
+ return rc < 0 ? -errno : 0;
+}
+
+void qemu_mmap_free(QEMUMmapArea *mm)
+{
+ munmap(mm->mem, mm->size);
+ close(mm->fd);
+}
diff --git a/oslib-win32.c b/oslib-win32.c
index 51b33e8..3885a31 100644
--- a/oslib-win32.c
+++ b/oslib-win32.c
@@ -150,3 +150,62 @@ int qemu_get_thread_id(void)
{
return GetCurrentThreadId();
}
+
+int qemu_mmap_alloc(QEMUMmapArea *mm, const char *path, size_t size)
+{
+ int fd = -1;
+ char *mem = NULL;
+
+ HANDLE hFile, hMapping;
+ int save_errno;
+
+ fd = qemu_open(path, O_RDWR | O_CREAT, 0666);
+ if (fd < 0) {
+ goto fail;
+ }
+
+ hFile = (HANDLE)_get_osfhandle(fd);
+ if (ftruncate(fd, size) < 0) {
+ goto fail;
+ }
+
+ hMapping = CreateFileMapping(hFile, NULL, PAGE_EXECUTE_READWRITE,
+ 0, 0, NULL);
+ if (!hMapping) {
+ goto fail;
+ }
+
+ mem = MapViewOfFileEx(hMapping, FILE_MAP_WRITE, 0, 0, size, NULL);
+ CloseHandle(hMapping);
+ if (mem) {
+ goto fail;
+ }
+
+ mm->fd = fd;
+ mm->mem = mem;
+ mm->size = size;
+ return 0;
+
+fail:
+ save_errno = errno;
+ if (fd >= 0) {
+ close(fd);
+ unlink(path);
+ }
+ return -save_errno;
+}
+
+int qemu_mmap_flush(QEMUMmapArea *mm)
+{
+ int rc;
+
+ FlushViewOfFile(mm->mem, mm->size);
+ rc = qemu_fdatasync(mm->fd);
+ return rc < 0 ? -errno : 0;
+}
+
+void qemu_mmap_free(QEMUMmapArea *mm)
+{
+ UnmapViewOfFile(mm->mem);
+ close(mm->fd);
+}
--
1.8.0.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [PATCH 14/20] hbitmap: add hbitmap_alloc_with_data and hbitmap_required_size
2012-12-12 13:46 [Qemu-devel] [PATCH 00/20] Block device mirroring enhancements, 12-12-12 edition Paolo Bonzini
` (12 preceding siblings ...)
2012-12-12 13:46 ` [Qemu-devel] [PATCH 13/20] oslib: add a wrapper for mmap/munmap Paolo Bonzini
@ 2012-12-12 13:46 ` Paolo Bonzini
2012-12-17 17:14 ` Eric Blake
2012-12-12 13:46 ` [Qemu-devel] [PATCH 15/20] hbitmap: add hbitmap_copy Paolo Bonzini
` (6 subsequent siblings)
20 siblings, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2012-12-12 13:46 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, jcody, stefanha
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
hbitmap.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++----------
hbitmap.h | 25 +++++++++++++++++++++++++
2 files changed, 72 insertions(+), 10 deletions(-)
diff --git a/hbitmap.c b/hbitmap.c
index ae59f39..aa4586f 100644
--- a/hbitmap.c
+++ b/hbitmap.c
@@ -81,6 +81,9 @@ struct HBitmap {
*/
int granularity;
+ /* True if the last level should be freed by hbitmap_free. */
+ bool last_level_allocated;
+
/* A number of progressively less coarse bitmaps (i.e. level 0 is the
* coarsest). Each bit in level N represents a word in level N+1 that
* has a set bit, except the last level where each bit represents the
@@ -367,34 +370,68 @@ bool hbitmap_get(const HBitmap *hb, uint64_t item)
void hbitmap_free(HBitmap *hb)
{
- unsigned i;
- for (i = HBITMAP_LEVELS; i-- > 0; ) {
+ unsigned i = HBITMAP_LEVELS;
+
+ if (!hb->last_level_allocated) {
+ i--;
+ }
+
+ while (i-- > 0) {
g_free(hb->levels[i]);
}
g_free(hb);
}
-HBitmap *hbitmap_alloc(uint64_t size, int granularity)
+static uint64_t hbitmap_round_size(uint64_t size, int granularity)
{
- HBitmap *hb = g_malloc0(sizeof(struct HBitmap));
- unsigned i;
-
assert(granularity >= 0 && granularity < 64);
+
+ if (!size) {
+ size = 1;
+ }
+
size = (size + (1ULL << granularity) - 1) >> granularity;
assert(size <= ((uint64_t)1 << HBITMAP_LOG_MAX_SIZE));
+ return size;
+}
- hb->size = size;
+size_t hbitmap_required_size(uint64_t size, int granularity)
+{
+ size_t longs;
+
+ size = hbitmap_round_size(size, granularity);
+ longs = (size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL;
+ return longs * sizeof(unsigned long);
+}
+
+HBitmap *hbitmap_alloc_with_data(uint64_t size, int granularity, void *data)
+{
+ HBitmap *hb = g_malloc0(sizeof(struct HBitmap));
+ unsigned i;
+
+ hb->size = hbitmap_round_size(size, granularity);
hb->granularity = granularity;
+ hb->last_level_allocated = (data == NULL);
+
for (i = HBITMAP_LEVELS; i-- > 0; ) {
- size = MAX((size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
- hb->levels[i] = g_malloc0(size * sizeof(unsigned long));
+ if (data == NULL) {
+ data = g_malloc0(hbitmap_required_size(size, granularity));
+ }
+ hb->levels[i] = data;
+ data = NULL;
+ granularity += BITS_PER_LEVEL;
}
/* We necessarily have free bits in level 0 due to the definition
* of HBITMAP_LEVELS, so use one for a sentinel. This speeds up
* hbitmap_iter_skip_words.
*/
- assert(size == 1);
+ assert(hbitmap_required_size(size, granularity) == sizeof(unsigned long));
hb->levels[0][0] |= 1UL << (BITS_PER_LONG - 1);
return hb;
}
+
+HBitmap *hbitmap_alloc(uint64_t size, int granularity)
+{
+ return hbitmap_alloc_with_data(size, granularity, NULL);
+}
diff --git a/hbitmap.h b/hbitmap.h
index 7ddfb66..35ee51a 100644
--- a/hbitmap.h
+++ b/hbitmap.h
@@ -64,6 +64,31 @@ struct HBitmapIter {
HBitmap *hbitmap_alloc(uint64_t size, int granularity);
/**
+ * hbitmap_required_size:
+ * @size: Number of bits in the bitmap.
+ * @granularity: Granularity of the bitmap.
+ *
+ * Return the number of bytes that are needed for a bitmap with the
+ * given size and granularity.
+ *
+ * A block of this size can then be passed to @hbitmap_alloc_with_data.
+ */
+size_t hbitmap_required_size(uint64_t size, int granularity);
+
+/**
+ * hbitmap_alloc_with_data:
+ * @size: Number of bits in the bitmap.
+ * @granularity: Granularity of the bitmap.
+ * @data: Pointer to a data block that will be used for the bottom level
+ * of the HBitmap.
+ *
+ * Allocate a new HBitmap, using a client-provided data block for the
+ * actual bitmap and allocating memory only for the compressed levels.
+ * If @data is NULL, this function is equivalent to @hbitmap_alloc.
+ */
+HBitmap *hbitmap_alloc_with_data(uint64_t size, int granularity, void *data);
+
+/**
* hbitmap_empty:
* @hb: HBitmap to operate on.
*
--
1.8.0.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [PATCH 15/20] hbitmap: add hbitmap_copy
2012-12-12 13:46 [Qemu-devel] [PATCH 00/20] Block device mirroring enhancements, 12-12-12 edition Paolo Bonzini
` (13 preceding siblings ...)
2012-12-12 13:46 ` [Qemu-devel] [PATCH 14/20] hbitmap: add hbitmap_alloc_with_data and hbitmap_required_size Paolo Bonzini
@ 2012-12-12 13:46 ` Paolo Bonzini
2012-12-17 18:25 ` Eric Blake
2012-12-12 13:46 ` [Qemu-devel] [PATCH 16/20] block: split bdrv_enable_dirty_tracking and bdrv_disable_dirty_tracking Paolo Bonzini
` (5 subsequent siblings)
20 siblings, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2012-12-12 13:46 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, jcody, stefanha
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
hbitmap.c | 48 +++++++++++++++++++++++++++++++++++++
hbitmap.h | 11 +++++++++
tests/test-hbitmap.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 126 insertions(+)
diff --git a/hbitmap.c b/hbitmap.c
index aa4586f..99c749f 100644
--- a/hbitmap.c
+++ b/hbitmap.c
@@ -368,6 +368,54 @@ bool hbitmap_get(const HBitmap *hb, uint64_t item)
return (hb->levels[HBITMAP_LEVELS - 1][pos >> BITS_PER_LEVEL] & bit) != 0;
}
+void hbitmap_copy(HBitmap *dst, HBitmap *src)
+{
+ HBitmapIter dst_iter, src_iter;
+ size_t dst_pos, src_pos, old_dst_pos;
+ unsigned long dst_cur, src_cur;
+
+ assert(dst->granularity == src->granularity);
+ assert(dst->size == src->size);
+
+ hbitmap_iter_init(&dst_iter, dst, 0);
+ hbitmap_iter_init(&src_iter, src, 0);
+
+ dst_pos = hbitmap_iter_next_word(&dst_iter, &dst_cur);
+ src_pos = hbitmap_iter_next_word(&src_iter, &src_cur);
+
+ while (dst_pos != -1) {
+ if (dst_pos < src_pos) {
+ /* Clear all words up to src_pos. Do it one by one,
+ * on the assumption that dst is sparse */
+ old_dst_pos = dst_pos;
+ dst_pos = hbitmap_iter_next_word(&dst_iter, &dst_cur);
+ dst->levels[HBITMAP_LEVELS - 1][old_dst_pos] = 0;
+ hb_reset_between(dst, HBITMAP_LEVELS - 2, old_dst_pos, old_dst_pos);
+ continue;
+ }
+
+ /* Copy one word from source to destination. Update the
+ * upper levels if the corresponding word in dst was zero.
+ */
+ dst->levels[HBITMAP_LEVELS - 1][src_pos] = src_cur;
+ if (dst_pos > src_pos) {
+ hb_set_between(dst, HBITMAP_LEVELS - 2, src_pos, src_pos);
+ } else {
+ dst_pos = hbitmap_iter_next_word(&dst_iter, &dst_cur);
+ }
+ src_pos = hbitmap_iter_next_word(&src_iter, &src_cur);
+ }
+
+ /* Copy everything past the last 'one' in dst. */
+ while (src_pos != -1) {
+ dst->levels[HBITMAP_LEVELS - 1][src_pos] = src_cur;
+ hb_set_between(dst, HBITMAP_LEVELS - 2, src_pos, src_pos);
+ src_pos = hbitmap_iter_next_word(&src_iter, &src_cur);
+ }
+
+ dst->count = src->count;
+}
+
void hbitmap_free(HBitmap *hb)
{
unsigned i = HBITMAP_LEVELS;
diff --git a/hbitmap.h b/hbitmap.h
index 35ee51a..690e252 100644
--- a/hbitmap.h
+++ b/hbitmap.h
@@ -229,4 +229,15 @@ static inline size_t hbitmap_iter_next_word(HBitmapIter *hbi, unsigned long *p_c
}
+/**
+ * hbitmap_copy:
+ * @dest: HBitmap to copy to.
+ * @src: HBitmap to copy from.
+ *
+ * Copy from one HBitmap to another, efficiently skipping zero
+ * areas in both the source and the destination. The two bitmaps
+ * must have the same size and granularity.
+ */
+void hbitmap_copy(HBitmap *dest, HBitmap *src);
+
#endif
diff --git a/tests/test-hbitmap.c b/tests/test-hbitmap.c
index b5d76a7..d6a93c2 100644
--- a/tests/test-hbitmap.c
+++ b/tests/test-hbitmap.c
@@ -24,6 +24,7 @@ typedef struct TestHBitmapData {
unsigned long *bits;
size_t size;
int granularity;
+ int id;
} TestHBitmapData;
@@ -329,6 +330,52 @@ static void test_hbitmap_reset(TestHBitmapData *data,
hbitmap_test_set(data, L3 / 2, L3);
}
+static void test_hbitmap_copy(TestHBitmapData *data,
+ const void *p_id)
+{
+ int id = *(int *)p_id;
+ HBitmap *aux;
+
+ hbitmap_test_init(data, L3, 0);
+ aux = hbitmap_alloc(L3, 0);
+
+ /* Add some bits to data. */
+ if (id < 1 || id > 7) {
+ hbitmap_test_set(data, 2, 2);
+ }
+ if (id < 2 || id > 6) {
+ hbitmap_test_set(data, L1 * 3, L1 * 2);
+ }
+ if (id < 3 || id > 5) {
+ hbitmap_test_set(data, L1 * 7, 4);
+ }
+ if (id < 4) {
+ hbitmap_test_set(data, L1 * 10, 4);
+ }
+
+ /* Add some bits to the copy destination. */
+ if (id < 5) {
+ hbitmap_set(aux, 0, 1);
+ }
+ if (id < 6) {
+ hbitmap_set(aux, L1 * 4, 2);
+ }
+ if (id < 7) {
+ hbitmap_set(aux, L1 * 7, 4);
+ }
+
+ hbitmap_copy(aux, data->hb);
+ hbitmap_test_check(data, 0);
+
+ /* Swap the two, the new data bitmap must still be the same as the
+ * check vector.
+ */
+ hbitmap_free(data->hb);
+ data->hb = aux;
+
+ hbitmap_test_check(data, 0);
+}
+
static void test_hbitmap_granularity(TestHBitmapData *data,
const void *unused)
{
@@ -375,6 +422,17 @@ static void test_hbitmap_iter_granularity(TestHBitmapData *data,
g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
}
+static void hbitmap_test_add_with_id(const char *testpath,
+ void (*test_func)(TestHBitmapData *data, const void *user_data),
+ int id)
+{
+ int *p_id = g_malloc0(sizeof(int));
+ *p_id = id;
+
+ g_test_add(testpath, TestHBitmapData, p_id, NULL, test_func,
+ hbitmap_test_teardown);
+}
+
static void hbitmap_test_add(const char *testpath,
void (*test_func)(TestHBitmapData *data, const void *user_data))
{
@@ -402,6 +460,15 @@ int main(int argc, char **argv)
hbitmap_test_add("/hbitmap/reset/empty", test_hbitmap_reset_empty);
hbitmap_test_add("/hbitmap/reset/general", test_hbitmap_reset);
hbitmap_test_add("/hbitmap/granularity", test_hbitmap_granularity);
+ hbitmap_test_add_with_id("/hbitmap/copy/0", test_hbitmap_copy, 0);
+ hbitmap_test_add_with_id("/hbitmap/copy/1", test_hbitmap_copy, 1);
+ hbitmap_test_add_with_id("/hbitmap/copy/2", test_hbitmap_copy, 2);
+ hbitmap_test_add_with_id("/hbitmap/copy/3", test_hbitmap_copy, 3);
+ hbitmap_test_add_with_id("/hbitmap/copy/4", test_hbitmap_copy, 4);
+ hbitmap_test_add_with_id("/hbitmap/copy/5", test_hbitmap_copy, 5);
+ hbitmap_test_add_with_id("/hbitmap/copy/6", test_hbitmap_copy, 6);
+ hbitmap_test_add_with_id("/hbitmap/copy/7", test_hbitmap_copy, 7);
+ hbitmap_test_add_with_id("/hbitmap/copy/8", test_hbitmap_copy, 8);
g_test_run();
return 0;
--
1.8.0.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [PATCH 16/20] block: split bdrv_enable_dirty_tracking and bdrv_disable_dirty_tracking
2012-12-12 13:46 [Qemu-devel] [PATCH 00/20] Block device mirroring enhancements, 12-12-12 edition Paolo Bonzini
` (14 preceding siblings ...)
2012-12-12 13:46 ` [Qemu-devel] [PATCH 15/20] hbitmap: add hbitmap_copy Paolo Bonzini
@ 2012-12-12 13:46 ` Paolo Bonzini
2012-12-20 18:26 ` Eric Blake
2012-12-12 13:46 ` [Qemu-devel] [PATCH 17/20] block: support a persistent dirty bitmap Paolo Bonzini
` (4 subsequent siblings)
20 siblings, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2012-12-12 13:46 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, jcody, stefanha
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block-migration.c | 17 +++--------------
block.c | 22 +++++++++++-----------
block.h | 3 ++-
block/mirror.c | 4 ++--
4 files changed, 18 insertions(+), 28 deletions(-)
diff --git a/block-migration.c b/block-migration.c
index 1e1d25e..bf995f5 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -260,15 +260,6 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
return (bmds->cur_sector >= total_sectors);
}
-static void set_dirty_tracking(int enable)
-{
- BlkMigDevState *bmds;
-
- QSIMPLEQ_FOREACH(bmds, &block_mig_state.bmds_list, entry) {
- bdrv_set_dirty_tracking(bmds->bs, enable ? BLOCK_SIZE : 0);
- }
-}
-
static void init_blk_migration_it(void *opaque, BlockDriverState *bs)
{
BlkMigDevState *bmds;
@@ -289,6 +280,7 @@ static void init_blk_migration_it(void *opaque, BlockDriverState *bs)
alloc_aio_bitmap(bmds);
drive_get_ref(drive_get_by_blockdev(bs));
bdrv_set_in_use(bs, 1);
+ bdrv_enable_dirty_tracking(bs, BLOCK_SIZE);
block_mig_state.total_sector_sum += sectors;
@@ -527,10 +519,9 @@ static void blk_mig_cleanup(void)
bdrv_drain_all();
- set_dirty_tracking(0);
-
while ((bmds = QSIMPLEQ_FIRST(&block_mig_state.bmds_list)) != NULL) {
QSIMPLEQ_REMOVE_HEAD(&block_mig_state.bmds_list, entry);
+ bdrv_disable_dirty_tracking(bmds->bs);
bdrv_set_in_use(bmds->bs, 0);
drive_put_ref(drive_get_by_blockdev(bmds->bs));
g_free(bmds->aio_bitmap);
@@ -556,10 +547,8 @@ static int block_save_setup(QEMUFile *f, void *opaque)
DPRINTF("Enter save live setup submitted %d transferred %d\n",
block_mig_state.submitted, block_mig_state.transferred);
- init_blk_migration(f);
-
/* start track dirty blocks */
- set_dirty_tracking(1);
+ init_blk_migration(f);
ret = flush_blks(f);
if (ret) {
diff --git a/block.c b/block.c
index 09f9ca9..66e30aa 100644
--- a/block.c
+++ b/block.c
@@ -4220,22 +4220,22 @@ void *qemu_blockalign(BlockDriverState *bs, size_t size)
return qemu_memalign((bs && bs->buffer_alignment) ? bs->buffer_alignment : 512, size);
}
-void bdrv_set_dirty_tracking(BlockDriverState *bs, int granularity)
+void bdrv_enable_dirty_tracking(BlockDriverState *bs, int granularity)
{
int64_t bitmap_size;
assert((granularity & (granularity - 1)) == 0);
+ granularity >>= BDRV_SECTOR_BITS;
+ assert(!bs->dirty_bitmap);
+ bitmap_size = (bdrv_getlength(bs) >> BDRV_SECTOR_BITS);
+ bs->dirty_bitmap = hbitmap_alloc(bitmap_size, ffs(granularity) - 1);
+}
- if (granularity) {
- granularity >>= BDRV_SECTOR_BITS;
- assert(!bs->dirty_bitmap);
- bitmap_size = (bdrv_getlength(bs) >> BDRV_SECTOR_BITS);
- bs->dirty_bitmap = hbitmap_alloc(bitmap_size, ffs(granularity) - 1);
- } else {
- if (bs->dirty_bitmap) {
- hbitmap_free(bs->dirty_bitmap);
- bs->dirty_bitmap = NULL;
- }
+void bdrv_disable_dirty_tracking(BlockDriverState *bs)
+{
+ if (bs->dirty_bitmap) {
+ hbitmap_free(bs->dirty_bitmap);
+ bs->dirty_bitmap = NULL;
}
}
diff --git a/block.h b/block.h
index b57896a..78a39fb 100644
--- a/block.h
+++ b/block.h
@@ -355,7 +355,8 @@ void bdrv_set_buffer_alignment(BlockDriverState *bs, int align);
void *qemu_blockalign(BlockDriverState *bs, size_t size);
struct HBitmapIter;
-void bdrv_set_dirty_tracking(BlockDriverState *bs, int granularity);
+void bdrv_enable_dirty_tracking(BlockDriverState *bs, int granularity);
+void bdrv_disable_dirty_tracking(BlockDriverState *bs);
int bdrv_get_dirty(BlockDriverState *bs, int64_t sector);
void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors);
void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors);
diff --git a/block/mirror.c b/block/mirror.c
index 23f87d2..99c5bd1 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -467,7 +467,7 @@ immediate_exit:
g_free(s->buf);
g_free(s->cow_bitmap);
g_free(s->in_flight_bitmap);
- bdrv_set_dirty_tracking(bs, 0);
+ bdrv_disable_dirty_tracking(bs);
bdrv_iostatus_disable(s->target);
if (s->should_complete && ret == 0) {
if (bdrv_get_flags(s->target) != bdrv_get_flags(s->common.bs)) {
@@ -570,7 +570,7 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
s->granularity = granularity;
s->buf_size = MAX(buf_size, granularity);
- bdrv_set_dirty_tracking(bs, granularity);
+ bdrv_enable_dirty_tracking(bs, granularity);
bdrv_set_enable_write_cache(s->target, true);
bdrv_set_on_error(s->target, on_target_error, on_target_error);
bdrv_iostatus_enable(s->target);
--
1.8.0.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [PATCH 17/20] block: support a persistent dirty bitmap
2012-12-12 13:46 [Qemu-devel] [PATCH 00/20] Block device mirroring enhancements, 12-12-12 edition Paolo Bonzini
` (15 preceding siblings ...)
2012-12-12 13:46 ` [Qemu-devel] [PATCH 16/20] block: split bdrv_enable_dirty_tracking and bdrv_disable_dirty_tracking Paolo Bonzini
@ 2012-12-12 13:46 ` Paolo Bonzini
2012-12-20 23:03 ` Eric Blake
2012-12-12 13:46 ` [Qemu-devel] [PATCH 18/20] mirror: add support for " Paolo Bonzini
` (3 subsequent siblings)
20 siblings, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2012-12-12 13:46 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, jcody, stefanha
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block.c | 133 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
block.h | 5 +++
block_int.h | 5 +++
3 files changed, 141 insertions(+), 2 deletions(-)
diff --git a/block.c b/block.c
index 66e30aa..d993d0b 100644
--- a/block.c
+++ b/block.c
@@ -1274,6 +1274,7 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
/* dirty bitmap */
bs_dest->dirty_bitmap = bs_src->dirty_bitmap;
+ bs_dest->dirty_usage = bs_src->dirty_usage;
/* job */
bs_dest->in_use = bs_src->in_use;
@@ -3973,6 +3974,16 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
}
}
+ /* The persistent bitmap is flushed synchronously; we may want to
+ * move this to a thread pool.
+ */
+ if (bs->persistent_bitmap) {
+ ret = qemu_mmap_flush(&bs->persistent_mmap);
+ if (ret < 0) {
+ return ret;
+ }
+ }
+
/* But don't actually force it to the disk with cache=unsafe */
if (bs->open_flags & BDRV_O_NO_FLUSH) {
goto flush_parent;
@@ -4223,9 +4234,17 @@ void *qemu_blockalign(BlockDriverState *bs, size_t size)
void bdrv_enable_dirty_tracking(BlockDriverState *bs, int granularity)
{
int64_t bitmap_size;
+ int granularity_bits;
assert((granularity & (granularity - 1)) == 0);
granularity >>= BDRV_SECTOR_BITS;
+ granularity_bits = ffs(granularity) - 1;
+ bs->dirty_usage++;
+ if (bs->dirty_usage != 1) {
+ assert(granularity_bits == hbitmap_granularity(bs->dirty_bitmap));
+ return;
+ }
+
assert(!bs->dirty_bitmap);
bitmap_size = (bdrv_getlength(bs) >> BDRV_SECTOR_BITS);
bs->dirty_bitmap = hbitmap_alloc(bitmap_size, ffs(granularity) - 1);
@@ -4233,9 +4252,96 @@ void bdrv_enable_dirty_tracking(BlockDriverState *bs, int granularity)
void bdrv_disable_dirty_tracking(BlockDriverState *bs)
{
+ if (bs->dirty_usage == 0) {
+ return;
+ }
+
+ bs->dirty_usage--;
+ if (bs->dirty_usage != 0) {
+ return;
+ }
+
+ hbitmap_free(bs->dirty_bitmap);
+ bs->dirty_bitmap = NULL;
+ bdrv_disable_persistent_dirty_tracking(bs);
+}
+
+void bdrv_enable_persistent_dirty_tracking(BlockDriverState *bs, const char *file,
+ Error **errp)
+{
+ int rc;
+ int granularity_bits;
+ int64_t bitmap_size;
+ size_t file_size;
+ QEMUMmapArea mm;
+ HBitmap *new_bitmap;
+ bool load;
+
+ assert(bs->dirty_usage > 0);
+
+ granularity_bits = hbitmap_granularity(bs->dirty_bitmap);
+ bitmap_size = (bdrv_getlength(bs) >> BDRV_SECTOR_BITS);
+ file_size = hbitmap_required_size(bitmap_size, granularity_bits);
+
+ load = access(file, R_OK) >= 0;
+ if (bdrv_in_use(bs) && load) {
+ error_set(errp, QERR_DEVICE_IN_USE, bdrv_get_device_name(bs));
+ return;
+ }
+
+ /*
+ * Do not touch fields in BS until we're sure we can complete
+ * successfully.
+ */
+ rc = qemu_mmap_alloc(&mm, file, file_size);
+ if (rc < 0) {
+ error_set(errp, QERR_OPEN_FILE_FAILED, file);
+ return;
+ }
+
+ new_bitmap = hbitmap_alloc_with_data(bitmap_size, granularity_bits, mm.mem);
+ if (!load) {
+ hbitmap_copy(new_bitmap, bs->dirty_bitmap);
+ rc = qemu_mmap_flush(&mm);
+ if (rc < 0) {
+ error_set(errp, QERR_OPEN_FILE_FAILED, file);
+ hbitmap_free(new_bitmap);
+ qemu_mmap_free(&mm);
+ return;
+ }
+ }
+
+ /*
+ * Now we can complete the switch from the old to the new bitmap.
+ */
+ if (bs->persistent_bitmap) {
+ bdrv_disable_persistent_dirty_tracking(bs);
+ }
+
+ bs->persistent_mmap = mm;
+ bs->persistent_bitmap = new_bitmap;
+ if (load) {
+ hbitmap_copy(bs->dirty_bitmap, new_bitmap);
+ }
+}
+
+void bdrv_disable_persistent_dirty_tracking(BlockDriverState *bs)
+{
+ if (!bs->persistent_bitmap) {
+ return;
+ }
+
+ hbitmap_free(bs->persistent_bitmap);
+ bs->persistent_bitmap = NULL;
+ qemu_mmap_free(&bs->persistent_mmap);
+}
+
+int bdrv_get_dirty_tracking_granularity(BlockDriverState *bs)
+{
if (bs->dirty_bitmap) {
- hbitmap_free(bs->dirty_bitmap);
- bs->dirty_bitmap = NULL;
+ return BDRV_SECTOR_SIZE << hbitmap_granularity(bs->dirty_bitmap);
+ } else {
+ return 0;
}
}
@@ -4257,6 +4363,9 @@ void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
int nr_sectors)
{
hbitmap_set(bs->dirty_bitmap, cur_sector, nr_sectors);
+ if (bs->persistent_bitmap) {
+ hbitmap_set(bs->persistent_bitmap, cur_sector, nr_sectors);
+ }
}
void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector,
@@ -4265,6 +4374,26 @@ void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector,
hbitmap_reset(bs->dirty_bitmap, cur_sector, nr_sectors);
}
+int bdrv_flush_dirty_tracking(BlockDriverState *bs, bool set)
+{
+ /* Because of the way bdrv_reset_dirty and bdrv_set_dirty work,
+ * bs->persistent_bitmap will always be a subset of bs->dirty_bitmap,
+ * and this hbitmap_copy will only remove bits rather than add them.
+ *
+ * Because no bits are added to the persistent bitmap, the on-disk
+ * bitmap is still a conservative approximation of the in-memory
+ * bitmap; we do not need to flush to disk, the OS will take care of
+ * writeback in due time.
+ */
+ if (bs->persistent_bitmap) {
+ hbitmap_copy(bs->persistent_bitmap, bs->dirty_bitmap);
+ if (set) {
+ return qemu_mmap_flush(&bs->persistent_mmap);
+ }
+ }
+ return 0;
+}
+
int64_t bdrv_get_dirty_count(BlockDriverState *bs)
{
if (bs->dirty_bitmap) {
diff --git a/block.h b/block.h
index 78a39fb..1eb4817 100644
--- a/block.h
+++ b/block.h
@@ -363,6 +363,11 @@ void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors);
void bdrv_dirty_iter_init(BlockDriverState *bs, struct HBitmapIter *hbi);
int64_t bdrv_get_dirty_count(BlockDriverState *bs);
+int bdrv_get_dirty_tracking_granularity(BlockDriverState *bs);
+void bdrv_enable_persistent_dirty_tracking(BlockDriverState *bs, const char *file, Error **errp);
+void bdrv_disable_persistent_dirty_tracking(BlockDriverState *bs);
+int bdrv_flush_dirty_tracking(BlockDriverState *bs, bool set);
+
void bdrv_enable_copy_on_read(BlockDriverState *bs);
void bdrv_disable_copy_on_read(BlockDriverState *bs);
diff --git a/block_int.h b/block_int.h
index 8f54b41..ff96351 100644
--- a/block_int.h
+++ b/block_int.h
@@ -270,7 +270,12 @@ struct BlockDriverState {
bool iostatus_enabled;
BlockDeviceIoStatus iostatus;
char device_name[32];
+
HBitmap *dirty_bitmap;
+ int dirty_usage;
+ QEMUMmapArea persistent_mmap;
+ HBitmap *persistent_bitmap;
+
int in_use; /* users other than guest access, eg. block migration */
QTAILQ_ENTRY(BlockDriverState) list;
--
1.8.0.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [PATCH 18/20] mirror: add support for persistent dirty bitmap
2012-12-12 13:46 [Qemu-devel] [PATCH 00/20] Block device mirroring enhancements, 12-12-12 edition Paolo Bonzini
` (16 preceding siblings ...)
2012-12-12 13:46 ` [Qemu-devel] [PATCH 17/20] block: support a persistent dirty bitmap Paolo Bonzini
@ 2012-12-12 13:46 ` Paolo Bonzini
2012-12-20 23:49 ` Eric Blake
2012-12-12 13:46 ` [Qemu-devel] [PATCH 19/20] block: choose the default dirty bitmap granularity in bdrv_enable_dirty_tracking Paolo Bonzini
` (2 subsequent siblings)
20 siblings, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2012-12-12 13:46 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, jcody, stefanha
A persistent dirty bitmap lets mirroring proceed with no waste of work
after QEMU exits (either offline, or online when the VM restarts).
It is also useful in order to communicate to management whether the
switch to the destination was completed or not.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/mirror.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/block/mirror.c b/block/mirror.c
index 99c5bd1..3d37396 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -351,6 +351,18 @@ static void coroutine_fn mirror_run(void *opaque)
}
}
+ /*
+ * Ensure the set bits have reached the persistent dirty bitmap before
+ * moving on. This covers the case where no I/O happens between the
+ * beginning of mirroring and block-job-complete, but there were indeed
+ * some dirty sectors. In this case, the persistent dirty bitmap could end
+ * up staying all-zeroes all the time!
+ */
+ ret = bdrv_flush_dirty_tracking(bs, true);
+ if (ret < 0) {
+ goto immediate_exit;
+ }
+
bdrv_dirty_iter_init(bs, &s->hbi);
last_pause_ns = qemu_get_clock_ns(rt_clock);
for (;;) {
@@ -392,6 +404,9 @@ static void coroutine_fn mirror_run(void *opaque)
break;
}
} else {
+ ret = bdrv_flush_dirty_tracking(bs, false);
+ assert(ret >= 0);
+
/* We're out of the streaming phase. From now on, if the job
* is cancelled we will actually complete all pending I/O and
* report completion. This way, block-job-cancel will leave
--
1.8.0.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [PATCH 19/20] block: choose the default dirty bitmap granularity in bdrv_enable_dirty_tracking
2012-12-12 13:46 [Qemu-devel] [PATCH 00/20] Block device mirroring enhancements, 12-12-12 edition Paolo Bonzini
` (17 preceding siblings ...)
2012-12-12 13:46 ` [Qemu-devel] [PATCH 18/20] mirror: add support for " Paolo Bonzini
@ 2012-12-12 13:46 ` Paolo Bonzini
2012-12-20 23:53 ` Eric Blake
2012-12-12 13:46 ` [Qemu-devel] [PATCH 20/20] monitor: add commands to start/stop dirty bitmap Paolo Bonzini
2013-01-14 13:02 ` [Qemu-devel] [PATCH 00/20] Block device mirroring enhancements, 12-12-12 edition Stefan Hajnoczi
20 siblings, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2012-12-12 13:46 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, jcody, stefanha
This is useful because we want the same logic in the monitor commands
that enable the persistent dirty bitmap.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block-migration.c | 2 +-
block.c | 46 +++++++++++++++++++++++++++++++++++++---------
block.h | 3 ++-
block/mirror.c | 30 ++++++++++++++----------------
blockdev.c | 11 +----------
5 files changed, 55 insertions(+), 37 deletions(-)
diff --git a/block-migration.c b/block-migration.c
index bf995f5..948b56b 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -280,7 +280,7 @@ static void init_blk_migration_it(void *opaque, BlockDriverState *bs)
alloc_aio_bitmap(bmds);
drive_get_ref(drive_get_by_blockdev(bs));
bdrv_set_in_use(bs, 1);
- bdrv_enable_dirty_tracking(bs, BLOCK_SIZE);
+ bdrv_enable_dirty_tracking(bs, BLOCK_SIZE, NULL);
block_mig_state.total_sector_sum += sectors;
diff --git a/block.c b/block.c
index d993d0b..2ea91ff 100644
--- a/block.c
+++ b/block.c
@@ -4231,23 +4231,51 @@ void *qemu_blockalign(BlockDriverState *bs, size_t size)
return qemu_memalign((bs && bs->buffer_alignment) ? bs->buffer_alignment : 512, size);
}
-void bdrv_enable_dirty_tracking(BlockDriverState *bs, int granularity)
+void bdrv_enable_dirty_tracking(BlockDriverState *bs, int granularity,
+ Error **errp)
{
int64_t bitmap_size;
int granularity_bits;
- assert((granularity & (granularity - 1)) == 0);
- granularity >>= BDRV_SECTOR_BITS;
- granularity_bits = ffs(granularity) - 1;
- bs->dirty_usage++;
- if (bs->dirty_usage != 1) {
- assert(granularity_bits == hbitmap_granularity(bs->dirty_bitmap));
+ if (granularity == -1) {
+ if (bs->dirty_usage != 0) {
+ granularity = bdrv_get_dirty_tracking_granularity(bs);
+ } else {
+ /* Choose the default granularity based on the block device's cluster
+ * size, clamped between 4k and 64k. */
+ BlockDriverInfo bdi;
+ if (bdrv_get_info(bs, &bdi) >= 0 && bdi.cluster_size != 0) {
+ granularity = MAX(4096, bdi.cluster_size);
+ granularity = MIN(65536, granularity);
+ } else {
+ granularity = 65536;
+ }
+ }
+ }
+
+ if (granularity < 512 || granularity > 1048576 * 64) {
+ error_set(errp, QERR_INVALID_PARAMETER, bdrv_get_device_name(bs));
+ return;
+ }
+ if (granularity & (granularity - 1)) {
+ error_set(errp, QERR_INVALID_PARAMETER, bdrv_get_device_name(bs));
return;
}
- assert(!bs->dirty_bitmap);
+ granularity_bits = ffs(granularity) - 1 - BDRV_SECTOR_BITS;
+ if (bs->dirty_usage != 0) {
+ if (granularity_bits == hbitmap_granularity(bs->dirty_bitmap)) {
+ bs->dirty_usage++;
+ } else {
+ error_set(errp, QERR_INVALID_PARAMETER, bdrv_get_device_name(bs));
+ }
+ return;
+ }
+
+ bs->dirty_usage++;
+ assert(!bs->dirty_bitmap && !bs->persistent_bitmap);
bitmap_size = (bdrv_getlength(bs) >> BDRV_SECTOR_BITS);
- bs->dirty_bitmap = hbitmap_alloc(bitmap_size, ffs(granularity) - 1);
+ bs->dirty_bitmap = hbitmap_alloc(bitmap_size, granularity_bits);
}
void bdrv_disable_dirty_tracking(BlockDriverState *bs)
diff --git a/block.h b/block.h
index 1eb4817..fee6bb5 100644
--- a/block.h
+++ b/block.h
@@ -355,7 +355,8 @@ void bdrv_set_buffer_alignment(BlockDriverState *bs, int align);
void *qemu_blockalign(BlockDriverState *bs, size_t size);
struct HBitmapIter;
-void bdrv_enable_dirty_tracking(BlockDriverState *bs, int granularity);
+void bdrv_enable_dirty_tracking(BlockDriverState *bs, int granularity,
+ Error **errp);
void bdrv_disable_dirty_tracking(BlockDriverState *bs);
int bdrv_get_dirty(BlockDriverState *bs, int64_t sector);
void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors);
diff --git a/block/mirror.c b/block/mirror.c
index 3d37396..cc47c8d 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -550,22 +550,9 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
BlockDriverCompletionFunc *cb,
void *opaque, Error **errp)
{
+ Error *local_err = NULL;
MirrorBlockJob *s;
- if (granularity == 0) {
- /* Choose the default granularity based on the target file's cluster
- * size, clamped between 4k and 64k. */
- BlockDriverInfo bdi;
- if (bdrv_get_info(target, &bdi) >= 0 && bdi.cluster_size != 0) {
- granularity = MAX(4096, bdi.cluster_size);
- granularity = MIN(65536, granularity);
- } else {
- granularity = 65536;
- }
- }
-
- assert((granularity & (granularity - 1)) == 0);
-
if ((on_source_error == BLOCKDEV_ON_ERROR_STOP ||
on_source_error == BLOCKDEV_ON_ERROR_ENOSPC) &&
!bdrv_iostatus_is_enabled(bs)) {
@@ -573,23 +560,34 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
return;
}
+ bdrv_enable_dirty_tracking(bs, granularity, &local_err);
+ if (local_err) {
+ error_propagate(errp, local_err);
+ return;
+ }
+
s = block_job_create(&mirror_job_type, bs, speed, cb, opaque, errp);
if (!s) {
- return;
+ goto fail;
}
s->on_source_error = on_source_error;
s->on_target_error = on_target_error;
s->target = target;
s->mode = mode;
+
+ granularity = bdrv_get_dirty_tracking_granularity(bs);
s->granularity = granularity;
s->buf_size = MAX(buf_size, granularity);
- bdrv_enable_dirty_tracking(bs, granularity);
bdrv_set_enable_write_cache(s->target, true);
bdrv_set_on_error(s->target, on_target_error, on_target_error);
bdrv_iostatus_enable(s->target);
s->common.co = qemu_coroutine_create(mirror_run);
trace_mirror_start(bs, s, s->common.co, opaque);
qemu_coroutine_enter(s->common.co, s);
+ return;
+
+fail:
+ bdrv_disable_dirty_tracking(bs);
}
diff --git a/blockdev.c b/blockdev.c
index dff6c3d..37e6743 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1222,21 +1222,12 @@ void qmp_drive_mirror(const char *device, const char *target,
mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
}
if (!has_granularity) {
- granularity = 0;
+ granularity = -1;
}
if (!has_buf_size) {
buf_size = DEFAULT_MIRROR_BUF_SIZE;
}
- if (granularity != 0 && (granularity < 512 || granularity > 1048576 * 64)) {
- error_set(errp, QERR_INVALID_PARAMETER, device);
- return;
- }
- if (granularity & (granularity - 1)) {
- error_set(errp, QERR_INVALID_PARAMETER, device);
- return;
- }
-
bs = bdrv_find(device);
if (!bs) {
error_set(errp, QERR_DEVICE_NOT_FOUND, device);
--
1.8.0.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [Qemu-devel] [PATCH 20/20] monitor: add commands to start/stop dirty bitmap
2012-12-12 13:46 [Qemu-devel] [PATCH 00/20] Block device mirroring enhancements, 12-12-12 edition Paolo Bonzini
` (18 preceding siblings ...)
2012-12-12 13:46 ` [Qemu-devel] [PATCH 19/20] block: choose the default dirty bitmap granularity in bdrv_enable_dirty_tracking Paolo Bonzini
@ 2012-12-12 13:46 ` Paolo Bonzini
2012-12-21 18:30 ` Eric Blake
2013-01-14 13:02 ` [Qemu-devel] [PATCH 00/20] Block device mirroring enhancements, 12-12-12 edition Stefan Hajnoczi
20 siblings, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2012-12-12 13:46 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, jcody, stefanha
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
blockdev.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++
blockdev.h | 1 +
hmp-commands.hx | 39 ++++++++++++++++++++++++++++++++++
hmp.c | 27 +++++++++++++++++++++++
hmp.h | 2 ++
qapi-schema.json | 56 ++++++++++++++++++++++++++++++++++++++++++++++++
qmp-commands.hx | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
7 files changed, 245 insertions(+)
diff --git a/blockdev.c b/blockdev.c
index 37e6743..96e1b4e 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1323,6 +1323,61 @@ void qmp_drive_mirror(const char *device, const char *target,
drive_get_ref(drive_get_by_blockdev(bs));
}
+void qmp_blockdev_dirty_enable(const char *device, const char *file,
+ bool has_granularity, uint32_t granularity,
+ Error **errp)
+{
+ BlockDriverState *bs;
+ DriveInfo *drv;
+ Error *local_err = NULL;
+
+ bs = bdrv_find(device);
+ if (!bs) {
+ error_set(errp, QERR_DEVICE_NOT_FOUND, device);
+ return;
+ }
+
+ drv = drive_get_by_blockdev(bs);
+ bdrv_enable_dirty_tracking(bs, granularity, &local_err);
+ if (local_err) {
+ error_propagate(errp, local_err);
+ return;
+ }
+
+ /* Release previous usage of dirty bitmap. */
+ if (drv->dirty_use) {
+ bdrv_disable_persistent_dirty_tracking(bs);
+ bdrv_disable_dirty_tracking(bs);
+ }
+ drv->dirty_use = true;
+
+ bdrv_enable_persistent_dirty_tracking(bs, file, errp);
+}
+
+void qmp_blockdev_dirty_disable(const char *device, bool has_force, bool force, Error **errp)
+{
+ BlockDriverState *bs = bdrv_find(device);
+ DriveInfo *drv;
+
+ if (!bs) {
+ error_set(errp, QERR_DEVICE_NOT_FOUND, device);
+ return;
+ }
+
+ drv = drive_get_by_blockdev(bs);
+ if (!drv->dirty_use) {
+ error_setg(errp, "dirty tracking not enabled on device '%s'", device);
+ return;
+ }
+
+ if (has_force && force) {
+ bdrv_disable_persistent_dirty_tracking(bs);
+ }
+
+ bdrv_disable_dirty_tracking(bs);
+ drv->dirty_use = false;
+}
+
static BlockJob *find_block_job(const char *device)
{
BlockDriverState *bs;
diff --git a/blockdev.h b/blockdev.h
index 5f27b64..deb5bbf 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -38,6 +38,7 @@ struct DriveInfo {
const char *serial;
QTAILQ_ENTRY(DriveInfo) next;
int refcount;
+ int dirty_use;
};
DriveInfo *drive_get(BlockInterfaceType type, int bus, int unit);
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 010b8c9..349cd0d 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1025,6 +1025,45 @@ using the specified target.
ETEXI
{
+ .name = "dirty_enable",
+ .args_type = "device:B,file:s,granularity:i?",
+ .params = "device file [granularity]",
+ "Defaults to MB if no size suffix is specified, ie. B/K/M/G/T",
+ .help = "initiates tracking of\n\t\t\t"
+ "dirty blocks for a block device.",
+ .mhandler.cmd = hmp_dirty_enable,
+ },
+STEXI
+@item dirty_enable
+@findex dirty_enable
+Start tracking dirty blocks for a block device. Dirty blocks will
+be written to an on-disk file, with one bit per block and an arbitrary
+granularity.
+
+If the dirty bitmap is already active, or used by something else (for
+example @command{drive_mirror}), the granularity argument must be absent
+or equal to the active granularity. The granularity must be a power-of-two
+comprised between 4,096 and 67,108,864.
+ETEXI
+
+ {
+ .name = "dirty_disable",
+ .args_type = "force:-f,device:B",
+ .params = "[-f] device",
+ .help = "prepares to disable tracking\n\t\t\t"
+ "dirty blocks of a block device.",
+ .mhandler.cmd = hmp_dirty_disable,
+ },
+STEXI
+@item dirty_disable
+@findex dirty_disable
+Prepare QEMU to stop tracking dirty blocks for a block device. The
+actual end of dirty tracking could be delayed while the dirty bitmap
+is in use by another command such as @command{drive_mirror}, unless
+the @option{-f} option is used.
+ETEXI
+
+ {
.name = "drive_add",
.args_type = "pci_addr:s,opts:s",
.params = "[[<domain>:]<bus>:]<slot>\n"
diff --git a/hmp.c b/hmp.c
index 428c563..e3d2c47 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1335,3 +1335,30 @@ void hmp_nbd_server_stop(Monitor *mon, const QDict *qdict)
qmp_nbd_server_stop(&errp);
hmp_handle_error(mon, &errp);
}
+
+void hmp_dirty_enable(Monitor *mon, const QDict *qdict)
+{
+ const char *device = qdict_get_str(qdict, "device");
+ const char *file = qdict_get_str(qdict, "file");
+ bool has_granularity = qdict_haskey(qdict, "granularity");
+ int granularity = -1;
+ Error *errp = NULL;
+
+ if (has_granularity) {
+ granularity = qdict_get_int(qdict, "granularity");
+ }
+
+ qmp_blockdev_dirty_enable(device, file,
+ has_granularity, granularity, &errp);
+ hmp_handle_error(mon, &errp);
+}
+
+void hmp_dirty_disable(Monitor *mon, const QDict *qdict)
+{
+ const char *device = qdict_get_str(qdict, "device");
+ bool force = qdict_get_try_bool(qdict, "force", false);
+ Error *errp = NULL;
+
+ qmp_blockdev_dirty_disable(device, true, force, &errp);
+ hmp_handle_error(mon, &errp);
+}
diff --git a/hmp.h b/hmp.h
index 0ab03be..7408e45 100644
--- a/hmp.h
+++ b/hmp.h
@@ -80,5 +80,7 @@ void hmp_screen_dump(Monitor *mon, const QDict *qdict);
void hmp_nbd_server_start(Monitor *mon, const QDict *qdict);
void hmp_nbd_server_add(Monitor *mon, const QDict *qdict);
void hmp_nbd_server_stop(Monitor *mon, const QDict *qdict);
+void hmp_dirty_enable(Monitor *mon, const QDict *qdict);
+void hmp_dirty_disable(Monitor *mon, const QDict *qdict);
#endif
diff --git a/qapi-schema.json b/qapi-schema.json
index fb38106..cdf3772 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3028,3 +3028,59 @@
# Since: 1.3.0
##
{ 'command': 'nbd-server-stop' }
+
+##
+# @blockdev-dirty-enable:
+#
+# Start tracking dirty blocks for a block device. Dirty blocks will
+# be written to an on-disk file, with one bit per block and an arbitrary
+# granularity. If the file already exists, the dirty bitmap is loaded
+# from the file.
+#
+# It is not an error to use this command if the dirty bitmap is already
+# active; the dirty blocks will simply be written to a new file from this
+# point on.
+#
+# If the dirty bitmap is already active, or used by something else (for
+# example drive-mirror), the granularity argument must be absent or equal
+# to the active granularity. Also, in this case the file must not exist.
+#
+# @device: the name of the device to track dirty blocks of
+#
+# @filename: the name of the file to write the bitmap to
+#
+# @granularity: #optional granularity of the dirty bitmap, default is 64K
+# if the image format doesn't have clusters, 4K if the clusters
+# are smaller than that, else the cluster size. Must be a
+# power of 2 between 512 and 64M.
+#
+# Returns: Nothing on success
+#
+# Since: 1.3
+##
+{ 'command': 'blockdev-dirty-enable',
+ 'data': {'device': 'str', 'filename': 'str', '*granularity': 'uint32' } }
+
+##
+# @blockdev-dirty-disable:
+#
+# Stop tracking dirty blocks for a block device. Dirty blocks will
+# be written to an on-disk file, with one bit per block and an arbitrary
+# granularity.
+#
+# If the dirty bitmap is already active, or used by something else (for
+# example blockdev-drive-mirror), the granularity argument must be absent
+# or equal to the active granularity.
+#
+# @device: the name of the device to track dirty blocks of
+#
+# @force: #optional true to immediately stop writing to the dirty
+# bitmap file; false to do so only when the last user of the
+# dirty bitmap stops using it (default false)
+#
+# Returns: Nothing on success
+#
+# Since: 1.3
+##
+{ 'command': 'blockdev-dirty-disable',
+ 'data': {'device': 'str', '*force': 'bool'} }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 97c52c9..e0fde11 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -2589,6 +2589,71 @@ EQMP
},
{
+ .name = "blockdev-dirty-enable",
+ .args_type = "device:s,file:s,granularity:i?",
+ .mhandler.cmd_new = qmp_marshal_input_blockdev_dirty_enable,
+ },
+SQMP
+blockdev-dirty-enable
+---------------------
+
+Start tracking dirty blocks for a block device. Dirty blocks will
+be written to an on-disk file, with one bit per block and an arbitrary
+granularity. If the file already exists, the dirty bitmap is loaded
+from the file.
+
+It is not an error to use this command if the dirty bitmap is already
+active; the dirty blocks will simply be written to a new file from this
+point on.
+
+If the dirty bitmap is already active, or used by something else (for
+example blockdev-drive-mirror), the granularity argument must be absent
+or equal to the active granularity. Also, in this case the file must
+not exist.
+
+Arguments:
+
+- device: device name (json-string)
+- file: path to the dirty tracking file (json-string)
+- granularity: granularity of the dirty bitmap (json-int, optional,
+ must be a power of two between 512 and 64M.
+
+The default value of the granularity is, if the image format defines
+a cluster size, the cluster size or 4096, whichever is larger. If it
+does not define a cluster size, the default value of the granularity
+is 65536.
+
+Example:
+
+-> { "execute": "blockdev-dirty-enable", "arguments": {
+ "device": "ide0-hd0", "path": "/var/lib/libvirt/dirty/image.dbmp" } }
+<- { "return": {} }
+EQMP
+
+ {
+ .name = "blockdev-dirty-disable",
+ .args_type = "device:s,force:b",
+ .mhandler.cmd_new = qmp_marshal_input_blockdev_dirty_disable,
+ },
+SQMP
+blockdev-dirty-disable
+----------------------
+
+Arguments:
+
+- device: device name (json-string)
+- force: true to immediately stop writing to the dirty bitmap file;
+ false to do so only when the last user of the dirty bitmap stops using
+ it (json-boolean).
+
+Example:
+
+-> { "execute": "blockdev-dirty-disable", "arguments": {
+ "device": "ide0-hd0", "force": false } }
+<- { "return": {} }
+EQMP
+
+ {
.name = "query-block-jobs",
.args_type = "",
.mhandler.cmd_new = qmp_marshal_input_query_block_jobs,
--
1.8.0.1
^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 01/20] host-utils: add ffsl
2012-12-12 13:46 ` [Qemu-devel] [PATCH 01/20] host-utils: add ffsl Paolo Bonzini
@ 2012-12-12 23:41 ` Eric Blake
0 siblings, 0 replies; 53+ messages in thread
From: Eric Blake @ 2012-12-12 23:41 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha
[-- Attachment #1: Type: text/plain, Size: 968 bytes --]
On 12/12/2012 06:46 AM, Paolo Bonzini wrote:
> We can provide fast versions based on the other functions defined
> by host-utils.h. Some care is required on glibc, which provides
> ffsl already.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> host-utils.h | 26 ++++++++++++++++++++++++++
> 1 file changed, 26 insertions(+)
Reviewed-by: Eric Blake <eblake@redhat.com>
>
> diff --git a/host-utils.h b/host-utils.h
> index 821db93..231d580 100644
> --- a/host-utils.h
> +++ b/host-utils.h
> @@ -24,6 +24,7 @@
> */
>
> #include "compiler.h" /* QEMU_GNUC_PREREQ */
> +#include <string.h> /* ffsl */
I never really understood why ffs() is in <strings.h> but ffsl() is in
<string.h>, especially when neither function deals with strings (a
<bitops.h> header would have made more sense), but such is life.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 02/20] add hierarchical bitmap data type and test cases
2012-12-12 13:46 ` [Qemu-devel] [PATCH 02/20] add hierarchical bitmap data type and test cases Paolo Bonzini
@ 2012-12-14 0:04 ` Eric Blake
2013-01-11 18:27 ` Stefan Hajnoczi
1 sibling, 0 replies; 53+ messages in thread
From: Eric Blake @ 2012-12-14 0:04 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha
[-- Attachment #1: Type: text/plain, Size: 1206 bytes --]
On 12/12/2012 06:46 AM, Paolo Bonzini wrote:
> HBitmaps provides an array of bits. The bits are stored as usual in an
> array of unsigned longs, but HBitmap is also optimized to provide fast
> iteration over set bits; going from one bit to the next is O(logB n)
> worst case, with B = sizeof(long) * CHAR_BIT: the result is low enough
> that the number of levels is in fact fixed.
>
>
> Reviewed-by: Laszlo Ersek <lersek@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> hbitmap.c | 400 ++++++++++++++++++++++++++++++++++++++++++++++++++
> hbitmap.h | 207 ++++++++++++++++++++++++++
> tests/Makefile | 2 +
> tests/test-hbitmap.c | 408 +++++++++++++++++++++++++++++++++++++++++++++++++++
> trace-events | 5 +
> 5 files changed, 1022 insertions(+)
> create mode 100644 hbitmap.c
> create mode 100644 hbitmap.h
> create mode 100644 tests/test-hbitmap.c
I've looked through this several times, but did another once-over, and
you can feel free to add:
Reviewed-by: Eric Blake <eblake@redhat.com>
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 03/20] block: implement dirty bitmap using HBitmap
2012-12-12 13:46 ` [Qemu-devel] [PATCH 03/20] block: implement dirty bitmap using HBitmap Paolo Bonzini
@ 2012-12-14 0:27 ` Eric Blake
0 siblings, 0 replies; 53+ messages in thread
From: Eric Blake @ 2012-12-14 0:27 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha
[-- Attachment #1: Type: text/plain, Size: 737 bytes --]
On 12/12/2012 06:46 AM, Paolo Bonzini wrote:
> This actually uses the dirty bitmap in the block layer, and converts
> mirroring to use an HBitmapIter.
>
> Reviewed-by: Laszlo Ersek <lersek@redhat.com> (except block/mirror.c parts)
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> Makefile.objs | 2 +-
> block.c | 94 ++++++++++------------------------------------------------
> block.h | 6 ++--
> block/mirror.c | 12 ++++++--
> block_int.h | 4 +--
> trace-events | 1 +
> 6 files changed, 33 insertions(+), 86 deletions(-)
Reviewed-by: Eric Blake <eblake@redhat.com>
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 04/20] block: make round_to_clusters public
2012-12-12 13:46 ` [Qemu-devel] [PATCH 04/20] block: make round_to_clusters public Paolo Bonzini
@ 2012-12-14 20:13 ` Eric Blake
0 siblings, 0 replies; 53+ messages in thread
From: Eric Blake @ 2012-12-14 20:13 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha
[-- Attachment #1: Type: text/plain, Size: 470 bytes --]
On 12/12/2012 06:46 AM, Paolo Bonzini wrote:
> This is needed in the following patch.
>
> Reviewed-by: Laszlo Ersek <lersek@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> block.c | 16 ++++++++--------
> block.h | 4 ++++
> 2 files changed, 12 insertions(+), 8 deletions(-)
Reviewed-by: Eric Blake <eblake@redhat.com>
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 05/20] mirror: perform COW if the cluster size is bigger than the granularity
2012-12-12 13:46 ` [Qemu-devel] [PATCH 05/20] mirror: perform COW if the cluster size is bigger than the granularity Paolo Bonzini
@ 2012-12-14 20:21 ` Eric Blake
0 siblings, 0 replies; 53+ messages in thread
From: Eric Blake @ 2012-12-14 20:21 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha
[-- Attachment #1: Type: text/plain, Size: 1508 bytes --]
On 12/12/2012 06:46 AM, Paolo Bonzini wrote:
> When mirroring runs, the backing files for the target may not yet be
> ready. However, this means that a copy-on-write operation on the target
> would fill the missing sectors with zeros. Copy-on-write only happens
> if the granularity of the dirty bitmap is smaller than the cluster size
> (and only for clusters that are allocated in the source after the job
> has started copying). So far, the granularity was fixed to 1MB; to avoid
> the problem we detected the situation and required the backing files to
> be available in that case only.
>
> However, we want to lower the granularity for efficiency, so we need
> a better solution. The solution is to always copy a whole cluster the
> first time it is touched. The code keeps a bitmap of clusters that
> have already been allocated by the mirroring job, and only does "manual"
> copy-on-write if the chunk being copied is zero in the bitmap.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> block/mirror.c | 60 +++++++++++++++++++++++++++++++++++++++-------
> blockdev.c | 15 +++---------
> tests/qemu-iotests/041 | 21 ++++++++++++++++
> tests/qemu-iotests/041.out | 4 ++--
> trace-events | 1 +
> 5 files changed, 78 insertions(+), 23 deletions(-)
Reviewed-by: Eric Blake <eblake@redhat.com>
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 06/20] block: return count of dirty sectors, not chunks
2012-12-12 13:46 ` [Qemu-devel] [PATCH 06/20] block: return count of dirty sectors, not chunks Paolo Bonzini
@ 2012-12-14 20:49 ` Eric Blake
0 siblings, 0 replies; 53+ messages in thread
From: Eric Blake @ 2012-12-14 20:49 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha
[-- Attachment #1: Type: text/plain, Size: 571 bytes --]
On 12/12/2012 06:46 AM, Paolo Bonzini wrote:
> Reviewed-by: Laszlo Ersek <lersek@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> block-migration.c | 2 +-
> block.c | 5 ++---
> block/mirror.c | 2 +-
> 3 files changed, 4 insertions(+), 5 deletions(-)
Reviewed-by: Eric Blake <eblake@redhat.com>
although the commit message could usefully add a line stating WHY
sectors is a more useful number than chunks
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 07/20] block: allow customizing the granularity of the dirty bitmap
2012-12-12 13:46 ` [Qemu-devel] [PATCH 07/20] block: allow customizing the granularity of the dirty bitmap Paolo Bonzini
@ 2012-12-14 21:27 ` Eric Blake
2012-12-15 9:11 ` Paolo Bonzini
0 siblings, 1 reply; 53+ messages in thread
From: Eric Blake @ 2012-12-14 21:27 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha
[-- Attachment #1: Type: text/plain, Size: 1891 bytes --]
On 12/12/2012 06:46 AM, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> block-migration.c | 5 +++--
> block.c | 17 ++++++++++-------
> block.h | 5 +----
> block/mirror.c | 14 ++++----------
> qapi-schema.json | 4 +++-
> 5 files changed, 21 insertions(+), 24 deletions(-)
>
> @@ -4218,16 +4220,17 @@ void *qemu_blockalign(BlockDriverState *bs, size_t size)
> return qemu_memalign((bs && bs->buffer_alignment) ? bs->buffer_alignment : 512, size);
> }
>
> -void bdrv_set_dirty_tracking(BlockDriverState *bs, int enable)
> +void bdrv_set_dirty_tracking(BlockDriverState *bs, int granularity)
> {
> int64_t bitmap_size;
>
> - if (enable) {
> - if (!bs->dirty_bitmap) {
> - bitmap_size = (bdrv_getlength(bs) >> BDRV_SECTOR_BITS);
> - bs->dirty_bitmap = hbitmap_alloc(bitmap_size,
> - BDRV_LOG_SECTORS_PER_DIRTY_CHUNK);
> - }
> + assert((granularity & (granularity - 1)) == 0);
> +
> + if (granularity) {
> + granularity >>= BDRV_SECTOR_BITS;
Given that granularity is specified in bytes, does it make sense for a
user to want a granularity of 4G? I guess another way of wording this
question is: Are you sure 'int granularity' is the right type?
> +++ b/qapi-schema.json
> @@ -667,10 +667,12 @@
> #
> # @count: number of dirty bytes according to the dirty bitmap
> #
> +# @granularity: granularity of the dirty bitmap in bytes
Mention that this field was added in 1.4.
> +#
> # Since: 1.3
> ##
> { 'type': 'BlockDirtyInfo',
> - 'data': {'count': 'int'} }
> + 'data': {'count': 'int', 'granularity': 'int'} }
>
> ##
> # @BlockInfo:
>
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 08/20] mirror: allow customizing the granularity
2012-12-12 13:46 ` [Qemu-devel] [PATCH 08/20] mirror: allow customizing the granularity Paolo Bonzini
@ 2012-12-14 22:01 ` Eric Blake
2013-01-14 11:28 ` Stefan Hajnoczi
1 sibling, 0 replies; 53+ messages in thread
From: Eric Blake @ 2012-12-14 22:01 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha
[-- Attachment #1: Type: text/plain, Size: 4142 bytes --]
On 12/12/2012 06:46 AM, Paolo Bonzini wrote:
> The desired granularity may be very different depending on the kind of
> operation (e.g. continuous replication vs. collapse-to-raw) and whether
> the VM is expected to perform lots of I/O while mirroring is in progress.
>
> Allow the user to customize it, while providing a sane default so that
> in general there will be no extra allocated space in the target compared
> to the source.
Might be worth mentioning that this configuration still has to be done
prior to starting the job (ie. it's a one-time initialization, not
something that is run-time tweakable).
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> block/mirror.c | 50 ++++++++++++++++++++++++++++++++------------------
> block_int.h | 3 ++-
> blockdev.c | 15 ++++++++++++++-
> hmp.c | 2 +-
> qapi-schema.json | 8 +++++++-
> qmp-commands.hx | 8 +++++++-
> 6 files changed, 63 insertions(+), 23 deletions(-)
> @@ -330,7 +329,7 @@ static BlockJobType mirror_job_type = {
> };
>
> void mirror_start(BlockDriverState *bs, BlockDriverState *target,
> - int64_t speed, MirrorSyncMode mode,
> + int64_t speed, int64_t granularity, MirrorSyncMode mode,
> BlockdevOnError on_source_error,
> BlockdevOnError on_target_error,
> BlockDriverCompletionFunc *cb,
> @@ -338,6 +337,20 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
> {
> MirrorBlockJob *s;
>
> + if (granularity == 0) {
> + /* Choose the default granularity based on the target file's cluster
> + * size, clamped between 4k and 64k. */
> + BlockDriverInfo bdi;
> + if (bdrv_get_info(target, &bdi) >= 0 && bdi.cluster_size != 0) {
> + granularity = MAX(4096, bdi.cluster_size);
> + granularity = MIN(65536, granularity);
You clamp granularity to sane bounds when defaulting...
> + } else {
> + granularity = 65536;
> + }
> + }
> +
> + assert((granularity & (granularity - 1)) == 0);
...but don't do any checking other than power-of-two on user input. Can
the user request a granularity that makes no sense, such as something
less than 512 or huge like 1G? Or for that matter, is it even a problem
if the user requests a granularity larger than the target bdi.cluster_size?
> @@ -1217,6 +1218,17 @@ void qmp_drive_mirror(const char *device, const char *target,
> if (!has_mode) {
> mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
> }
> + if (!has_granularity) {
> + granularity = 0;
> + }
> + if (granularity != 0 && (granularity < 512 || granularity > 1048576 * 64)) {
That answers part of my question - you clamp between 512 and 64M, which
is a much wider range than the defaults you end up choosing.
> +++ b/qapi-schema.json
> @@ -1636,6 +1636,11 @@
> # (all the disk, only the sectors allocated in the topmost image, or
> # only new I/O).
> #
> +# @granularity: #optional granularity of the dirty bitmap, default is 64K
> +# if the image format doesn't have clusters, 4K if the clusters
> +# are smaller than that, else the cluster size. Must be a
> +# power of 2 between 512 and 64M.
Maybe mention that this attribute was added in 1.4. (Hmm, now I have to
decide how to expose this attribute via libvirt.)
> @@ -971,6 +973,10 @@ Arguments:
> - "on-target-error": the action to take on an error on the target
> (BlockdevOnError, default 'report')
>
> +The default value of the granularity is, if the image format defines
> +a cluster size, the cluster size or 4096, whichever is larger. If it
> +does not define a cluster size, the default value of the granularity
> +is 65536.
That doesn't quite cover the fact that you clamp granularity to 64k if
the image format has a cluster size larger than 64k.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 09/20] mirror: switch mirror_iteration to AIO
2012-12-12 13:46 ` [Qemu-devel] [PATCH 09/20] mirror: switch mirror_iteration to AIO Paolo Bonzini
@ 2012-12-14 22:11 ` Eric Blake
2012-12-15 9:09 ` Paolo Bonzini
0 siblings, 1 reply; 53+ messages in thread
From: Eric Blake @ 2012-12-14 22:11 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha
[-- Attachment #1: Type: text/plain, Size: 1367 bytes --]
On 12/12/2012 06:46 AM, Paolo Bonzini wrote:
> There is really no change in the behavior of the job here, since
> there is still a maximum of one in-flight I/O operation between
> the source and the target. However, this patch already introduces
> the AIO callbacks (which are unmodified in the next patch)
> and some of the logic to count in-flight operations and only
> complete the job when there is none.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> block/mirror.c | 155 +++++++++++++++++++++++++++++++++++++++++++--------------
> trace-events | 2 +
> 2 files changed, 119 insertions(+), 38 deletions(-)
>
> @@ -87,31 +143,30 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
>
> end = s->common.len >> BDRV_SECTOR_BITS;
> nb_sectors = MIN(nb_sectors, end - sector_num);
> +
> + /* Allocate a MirrorOp that is used as an AIO callback. */
> + op = g_slice_new(MirrorOp);
> + op->s = s;
> + op->iov.iov_base = s->buf;
> + op->iov.iov_len = nb_sectors * 512;
Why two spaces?
I'm not an expert in this area of code, so my review is weak; but I
didn't spot anything obviously wrong, so feel free to add my:
Reviewed-by: Eric Blake <eblake@redhat.com>
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 10/20] mirror: add buf-size argument to drive-mirror
2012-12-12 13:46 ` [Qemu-devel] [PATCH 10/20] mirror: add buf-size argument to drive-mirror Paolo Bonzini
@ 2012-12-14 22:22 ` Eric Blake
2012-12-15 9:09 ` Paolo Bonzini
2013-01-14 11:41 ` Stefan Hajnoczi
1 sibling, 1 reply; 53+ messages in thread
From: Eric Blake @ 2012-12-14 22:22 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha
[-- Attachment #1: Type: text/plain, Size: 1912 bytes --]
On 12/12/2012 06:46 AM, Paolo Bonzini wrote:
> This makes sense when the next commit starts using the extra buffer space
> to perform many I/O operations asynchronously.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> block/mirror.c | 6 +++---
> block_int.h | 5 +++--
> blockdev.c | 9 ++++++++-
> hmp.c | 2 +-
> qapi-schema.json | 5 ++++-
> qmp-commands.hx | 4 +++-
> tests/qemu-iotests/041 | 31 +++++++++++++++++++++++++++++++
> 7 files changed, 53 insertions(+), 9 deletions(-)
>
> @@ -447,7 +447,7 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
> s->target = target;
> s->mode = mode;
> s->granularity = granularity;
> - s->buf_size = granularity;
> + s->buf_size = MAX(buf_size, granularity);
So you silently clamp the buffer size if the user gives something larger
than granularity.
> +++ b/qapi-schema.json
> @@ -1641,6 +1641,9 @@
> # are smaller than that, else the cluster size. Must be a
> # power of 2 between 512 and 64M.
> #
> +# @buf-size: #optional maximum amount of data in flight from source to
> +# target.
Mention that it was added in 1.4. Also, is it worth mentioning
reasonable bounds (such as granularity), and whether it must be a power
of two?
> - "granularity": granularity of the dirty bitmap (json-int, optional)
> +- "buf_size": maximum amount of data in flight from source to target
> + (json-int, default 10M)
Oh, so unlike granularity, this one does NOT have to be a power of 2.
But why is the default 10M if you clamp it to granularity, which
defaults to 64k? (and again, something else I have to figure out how to
expose from libvirt)
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 11/20] mirror: support more than one in-flight AIO operation
2012-12-12 13:46 ` [Qemu-devel] [PATCH 11/20] mirror: support more than one in-flight AIO operation Paolo Bonzini
@ 2012-12-14 22:32 ` Eric Blake
2013-01-14 12:56 ` Stefan Hajnoczi
1 sibling, 0 replies; 53+ messages in thread
From: Eric Blake @ 2012-12-14 22:32 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha
[-- Attachment #1: Type: text/plain, Size: 1235 bytes --]
On 12/12/2012 06:46 AM, Paolo Bonzini wrote:
> With AIO support in place, we can start copying more than one chunk
> in parallel. This patch introduces the required infrastructure for
> this: the buffer is split into multiple granularity-sized chunks,
> and there is a free list to access them.
>
> Because of copy-on-write, a single operation may already require
> multiple chunks to be available on the free list.
>
> In addition, two different iterations on the HBitmap may want to
> copy the same cluster. We avoid this by keeping a bitmap of in-flight
> I/O operations, and blocking until the previous iteration completes.
> This should be a pretty rare occurrence, though; as long as there is
> no overlap the next iteration can start before the previous one finishes.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> block/mirror.c | 109 ++++++++++++++++++++++++++++++++++++++++++++++++++-------
> trace-events | 4 ++-
> 2 files changed, 100 insertions(+), 13 deletions(-)
Again, my review is weak, but feel free to add:
Reviewed-by: Eric Blake <eblake@redhat.com>
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 12/20] mirror: support arbitrarily-sized iterations
2012-12-12 13:46 ` [Qemu-devel] [PATCH 12/20] mirror: support arbitrarily-sized iterations Paolo Bonzini
@ 2012-12-14 22:39 ` Eric Blake
0 siblings, 0 replies; 53+ messages in thread
From: Eric Blake @ 2012-12-14 22:39 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha
[-- Attachment #1: Type: text/plain, Size: 801 bytes --]
On 12/12/2012 06:46 AM, Paolo Bonzini wrote:
> Yet another optimization is to extend the mirroring iteration to include more
> adjacent dirty blocks. This limits the number of I/O operations and makes
> mirroring efficient even with a small granularity. Most of the infrastructure
> is already in place; we only need to put a loop around the computation of
> the origin and sector count of the iteration.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> block/mirror.c | 100 +++++++++++++++++++++++++++++++++++++++------------------
> trace-events | 1 +
> 2 files changed, 69 insertions(+), 32 deletions(-)
Reviewed-by: Eric Blake <eblake@redhat.com>
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 13/20] oslib: add a wrapper for mmap/munmap
2012-12-12 13:46 ` [Qemu-devel] [PATCH 13/20] oslib: add a wrapper for mmap/munmap Paolo Bonzini
@ 2012-12-14 22:54 ` Eric Blake
2012-12-15 9:06 ` Paolo Bonzini
0 siblings, 1 reply; 53+ messages in thread
From: Eric Blake @ 2012-12-14 22:54 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha
[-- Attachment #1: Type: text/plain, Size: 2524 bytes --]
On 12/12/2012 06:46 AM, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> osdep.h | 10 ++++++++++
> oslib-posix.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
> oslib-win32.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 116 insertions(+)
>
> +int qemu_mmap_alloc(QEMUMmapArea *mm, const char *path, size_t size)
> +{
> + int fd = -1;
> + char *mem = NULL;
> + int save_errno;
> +
> + fd = qemu_open(path, O_RDWR | O_CREAT, 0666);
Do you want O_EXCL and/or O_TRUNC here as well?
> + if (fd < 0) {
> + goto fail;
> + }
> +
> + if (ftruncate(fd, size) < 0) {
Or are you okay with using this to read existing contents and for the
size to possibly discard the tail of a file? (Hmm, thinking of how you
plan on using persistent bitmaps, it sounds like you WANT to open
pre-existing files; but then the caller has to be careful to pass in the
right size).
Would it be any better to alter this wrapper function to allow the user
to choose between creating a new file (size is relevant, and use O_EXCL)
vs. re-reading an existing file (no ftruncate performed, and the mmap
size is picked up from fstat())?
> + goto fail;
> + }
> +
> + mem = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
> + if (!mem) {
> + goto fail;
> + }
> +
> + mm->fd = fd;
> + mm->mem = mem;
> + mm->size = size;
> + return 0;
> +
> +fail:
> + save_errno = errno;
> + if (fd >= 0) {
> + close(fd);
> + unlink(path);
You're blindly unlinking here; sounds like you either want O_EXCL on the
open, or need to skip the unlink() if the file was pre-existing. No
error checking that unlink() succeeded?
> + }
> + return -save_errno;
> +}
> +
> +int qemu_mmap_flush(QEMUMmapArea *mm)
> +{
> + int rc = msync(mm->mem, mm->size, MS_SYNC);
> + return rc < 0 ? -errno : 0;
> +}
> +
> +void qemu_mmap_free(QEMUMmapArea *mm)
> +{
> + munmap(mm->mem, mm->size);
> + close(mm->fd);
> +}
You unlink()d the file on failure during the initial map, but nowhere
else. Is that intentional?
> diff --git a/oslib-win32.c b/oslib-win32.c
My review gets weaker here... However, it looks like you have a
matching implementation based on the wrapper interface you defined.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 13/20] oslib: add a wrapper for mmap/munmap
2012-12-14 22:54 ` Eric Blake
@ 2012-12-15 9:06 ` Paolo Bonzini
0 siblings, 0 replies; 53+ messages in thread
From: Paolo Bonzini @ 2012-12-15 9:06 UTC (permalink / raw)
To: Eric Blake; +Cc: kwolf, jcody, qemu-devel, stefanha
----- Messaggio originale -----
> Da: "Eric Blake" <eblake@redhat.com>
> A: "Paolo Bonzini" <pbonzini@redhat.com>
> Cc: qemu-devel@nongnu.org, kwolf@redhat.com, stefanha@redhat.com, jcody@redhat.com
> Inviato: Venerdì, 14 dicembre 2012 23:54:31
> Oggetto: Re: [PATCH 13/20] oslib: add a wrapper for mmap/munmap
>
> On 12/12/2012 06:46 AM, Paolo Bonzini wrote:
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> > osdep.h | 10 ++++++++++
> > oslib-posix.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
> > oslib-win32.c | 59
> > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 116 insertions(+)
> >
>
> > +int qemu_mmap_alloc(QEMUMmapArea *mm, const char *path, size_t
> > size)
> > +{
> > + int fd = -1;
> > + char *mem = NULL;
> > + int save_errno;
> > +
> > + fd = qemu_open(path, O_RDWR | O_CREAT, 0666);
>
> Do you want O_EXCL and/or O_TRUNC here as well?
> Or are you okay with using this to read existing contents and for the
> size to possibly discard the tail of a file? (Hmm, thinking of how
> you plan on using persistent bitmaps, it sounds like you WANT to open
> pre-existing files; but then the caller has to be careful to pass in
> the right size).
Yes. The caller will copy the transient bitmap to the persistent one
if it is creating a new file, and vice versa when loading.
> You're blindly unlinking here
Right, the unlink should be in the caller.
> ; sounds like you either want O_EXCL on the
> open, or need to skip the unlink() if the file was pre-existing. No
> error checking that unlink() succeeded?
No error checking because the mmap error is more important to pass up.
> You unlink()d the file on failure during the initial map, but nowhere
> else. Is that intentional?
Yes.
Paolo
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 10/20] mirror: add buf-size argument to drive-mirror
2012-12-14 22:22 ` Eric Blake
@ 2012-12-15 9:09 ` Paolo Bonzini
0 siblings, 0 replies; 53+ messages in thread
From: Paolo Bonzini @ 2012-12-15 9:09 UTC (permalink / raw)
To: Eric Blake; +Cc: kwolf, jcody, qemu-devel, stefanha
> On 12/12/2012 06:46 AM, Paolo Bonzini wrote:
> > This makes sense when the next commit starts using the extra buffer
> > space
> > to perform many I/O operations asynchronously.
> >
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> > block/mirror.c | 6 +++---
> > block_int.h | 5 +++--
> > blockdev.c | 9 ++++++++-
> > hmp.c | 2 +-
> > qapi-schema.json | 5 ++++-
> > qmp-commands.hx | 4 +++-
> > tests/qemu-iotests/041 | 31 +++++++++++++++++++++++++++++++
> > 7 files changed, 53 insertions(+), 9 deletions(-)
> >
>
> > @@ -447,7 +447,7 @@ void mirror_start(BlockDriverState *bs,
> > BlockDriverState *target,
> > s->target = target;
> > s->mode = mode;
> > s->granularity = granularity;
> > - s->buf_size = granularity;
> > + s->buf_size = MAX(buf_size, granularity);
>
> So you silently clamp the buffer size if the user gives something
> larger than granularity.
I silently make it large if it gives something *smaller*. The job
always has to copy one granularity-sized chunk.
> > +++ b/qapi-schema.json
> > @@ -1641,6 +1641,9 @@
> > # are smaller than that, else the cluster size.
> > Must be a
> > # power of 2 between 512 and 64M.
> > #
> > +# @buf-size: #optional maximum amount of data in flight from
> > source to
> > +# target.
>
> Mention that it was added in 1.4. Also, is it worth mentioning
> reasonable bounds (such as granularity), and whether it must be a
> power of two?
It is arbitrary, though making it too big makes little sense.
> > - "granularity": granularity of the dirty bitmap (json-int,
> > optional)
> > +- "buf_size": maximum amount of data in flight from source to
> > target
> > + (json-int, default 10M)
>
> Oh, so unlike granularity, this one does NOT have to be a power of 2.
> But why is the default 10M if you clamp it to granularity, which
> defaults to 64k?
I don't clamp it, I make it _at least_ as big as the granularity.
Paolo
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 09/20] mirror: switch mirror_iteration to AIO
2012-12-14 22:11 ` Eric Blake
@ 2012-12-15 9:09 ` Paolo Bonzini
2012-12-15 13:05 ` Eric Blake
0 siblings, 1 reply; 53+ messages in thread
From: Paolo Bonzini @ 2012-12-15 9:09 UTC (permalink / raw)
To: Eric Blake; +Cc: kwolf, jcody, qemu-devel, stefanha
----- Messaggio originale -----
> Da: "Eric Blake" <eblake@redhat.com>
> A: "Paolo Bonzini" <pbonzini@redhat.com>
> Cc: qemu-devel@nongnu.org, kwolf@redhat.com, stefanha@redhat.com, jcody@redhat.com
> Inviato: Venerdì, 14 dicembre 2012 23:11:02
> Oggetto: Re: [PATCH 09/20] mirror: switch mirror_iteration to AIO
>
> On 12/12/2012 06:46 AM, Paolo Bonzini wrote:
> > There is really no change in the behavior of the job here, since
> > there is still a maximum of one in-flight I/O operation between
> > the source and the target. However, this patch already introduces
> > the AIO callbacks (which are unmodified in the next patch)
> > and some of the logic to count in-flight operations and only
> > complete the job when there is none.
> >
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> > block/mirror.c | 155
> > +++++++++++++++++++++++++++++++++++++++++++--------------
> > trace-events | 2 +
> > 2 files changed, 119 insertions(+), 38 deletions(-)
> >
>
> > @@ -87,31 +143,30 @@ static int coroutine_fn
> > mirror_iteration(MirrorBlockJob *s,
> >
> > end = s->common.len >> BDRV_SECTOR_BITS;
> > nb_sectors = MIN(nb_sectors, end - sector_num);
> > +
> > + /* Allocate a MirrorOp that is used as an AIO callback. */
> > + op = g_slice_new(MirrorOp);
> > + op->s = s;
> > + op->iov.iov_base = s->buf;
> > + op->iov.iov_len = nb_sectors * 512;
>
> Why two spaces?
To align the equal signs. :)
Paolo
> I'm not an expert in this area of code, so my review is weak; but I
> didn't spot anything obviously wrong, so feel free to add my:
>
> Reviewed-by: Eric Blake <eblake@redhat.com>
>
> --
> Eric Blake eblake redhat com +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 07/20] block: allow customizing the granularity of the dirty bitmap
2012-12-14 21:27 ` Eric Blake
@ 2012-12-15 9:11 ` Paolo Bonzini
0 siblings, 0 replies; 53+ messages in thread
From: Paolo Bonzini @ 2012-12-15 9:11 UTC (permalink / raw)
To: Eric Blake; +Cc: kwolf, jcody, qemu-devel, stefanha
----- Messaggio originale -----
> Da: "Eric Blake" <eblake@redhat.com>
> A: "Paolo Bonzini" <pbonzini@redhat.com>
> Cc: qemu-devel@nongnu.org, kwolf@redhat.com, stefanha@redhat.com, jcody@redhat.com
> Inviato: Venerdì, 14 dicembre 2012 22:27:04
> Oggetto: Re: [PATCH 07/20] block: allow customizing the granularity of the dirty bitmap
>
> On 12/12/2012 06:46 AM, Paolo Bonzini wrote:
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> > block-migration.c | 5 +++--
> > block.c | 17 ++++++++++-------
> > block.h | 5 +----
> > block/mirror.c | 14 ++++----------
> > qapi-schema.json | 4 +++-
> > 5 files changed, 21 insertions(+), 24 deletions(-)
> >
>
> > @@ -4218,16 +4220,17 @@ void *qemu_blockalign(BlockDriverState *bs,
> > size_t size)
> > return qemu_memalign((bs && bs->buffer_alignment) ?
> > bs->buffer_alignment : 512, size);
> > }
> >
> > -void bdrv_set_dirty_tracking(BlockDriverState *bs, int enable)
> > +void bdrv_set_dirty_tracking(BlockDriverState *bs, int
> > granularity)
> > {
> > int64_t bitmap_size;
> >
> > - if (enable) {
> > - if (!bs->dirty_bitmap) {
> > - bitmap_size = (bdrv_getlength(bs) >>
> > BDRV_SECTOR_BITS);
> > - bs->dirty_bitmap = hbitmap_alloc(bitmap_size,
> > -
> > BDRV_LOG_SECTORS_PER_DIRTY_CHUNK);
> > - }
> > + assert((granularity & (granularity - 1)) == 0);
> > +
> > + if (granularity) {
> > + granularity >>= BDRV_SECTOR_BITS;
>
> Given that granularity is specified in bytes, does it make sense for
> a user to want a granularity of 4G?
No, the point of the HBitmap is to allow small granularities. Anything
bigger than 1 or 2 MB makes little sense.
Paolo
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 09/20] mirror: switch mirror_iteration to AIO
2012-12-15 9:09 ` Paolo Bonzini
@ 2012-12-15 13:05 ` Eric Blake
0 siblings, 0 replies; 53+ messages in thread
From: Eric Blake @ 2012-12-15 13:05 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha
[-- Attachment #1: Type: text/plain, Size: 727 bytes --]
On 12/15/2012 02:09 AM, Paolo Bonzini wrote:
>>> + /* Allocate a MirrorOp that is used as an AIO callback. */
>>> + op = g_slice_new(MirrorOp);
>>> + op->s = s;
>>> + op->iov.iov_base = s->buf;
>>> + op->iov.iov_len = nb_sectors * 512;
>>
>> Why two spaces?
>
> To align the equal signs. :)
Then I didn't quote enough context. Why the two spaces here, when the
very next line didn't align equal signs?
>
> + op->s = s;
> + op->iov.iov_base = s->buf;
> + op->iov.iov_len = nb_sectors * 512;
> + op->sector_num = sector_num;
> + op->nb_sectors = nb_sectors;
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 14/20] hbitmap: add hbitmap_alloc_with_data and hbitmap_required_size
2012-12-12 13:46 ` [Qemu-devel] [PATCH 14/20] hbitmap: add hbitmap_alloc_with_data and hbitmap_required_size Paolo Bonzini
@ 2012-12-17 17:14 ` Eric Blake
2012-12-17 17:18 ` Paolo Bonzini
0 siblings, 1 reply; 53+ messages in thread
From: Eric Blake @ 2012-12-17 17:14 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha
[-- Attachment #1: Type: text/plain, Size: 2452 bytes --]
On 12/12/2012 06:46 AM, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> hbitmap.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++----------
> hbitmap.h | 25 +++++++++++++++++++++++++
> 2 files changed, 72 insertions(+), 10 deletions(-)
When using hbitmap_alloc_with_data, must the user pass in all 0 data, or
do you do a one-time slow pass over the passed-in data to populate the
remaining layers of the hbitmap to allow faster traversal later?
> +HBitmap *hbitmap_alloc_with_data(uint64_t size, int granularity, void *data)
> +{
> + HBitmap *hb = g_malloc0(sizeof(struct HBitmap));
> + unsigned i;
> +
> + hb->size = hbitmap_round_size(size, granularity);
> hb->granularity = granularity;
> + hb->last_level_allocated = (data == NULL);
> +
> for (i = HBITMAP_LEVELS; i-- > 0; ) {
> - size = MAX((size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
> - hb->levels[i] = g_malloc0(size * sizeof(unsigned long));
> + if (data == NULL) {
> + data = g_malloc0(hbitmap_required_size(size, granularity));
> + }
> + hb->levels[i] = data;
> + data = NULL;
> + granularity += BITS_PER_LEVEL;
> }
>
> /* We necessarily have free bits in level 0 due to the definition
> * of HBITMAP_LEVELS, so use one for a sentinel. This speeds up
> * hbitmap_iter_skip_words.
> */
> - assert(size == 1);
> + assert(hbitmap_required_size(size, granularity) == sizeof(unsigned long));
> hb->levels[0][0] |= 1UL << (BITS_PER_LONG - 1);
> return hb;
> }
Based on the implementation, you aren't inspecting the contents of data.
> +/**
> + * hbitmap_alloc_with_data:
> + * @size: Number of bits in the bitmap.
> + * @granularity: Granularity of the bitmap.
> + * @data: Pointer to a data block that will be used for the bottom level
> + * of the HBitmap.
> + *
> + * Allocate a new HBitmap, using a client-provided data block for the
> + * actual bitmap and allocating memory only for the compressed levels.
> + * If @data is NULL, this function is equivalent to @hbitmap_alloc.
> + */
> +HBitmap *hbitmap_alloc_with_data(uint64_t size, int granularity, void *data);
But this documentation didn't mention that the caller must pass in all 0s.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 14/20] hbitmap: add hbitmap_alloc_with_data and hbitmap_required_size
2012-12-17 17:14 ` Eric Blake
@ 2012-12-17 17:18 ` Paolo Bonzini
0 siblings, 0 replies; 53+ messages in thread
From: Paolo Bonzini @ 2012-12-17 17:18 UTC (permalink / raw)
To: Eric Blake; +Cc: kwolf, jcody, qemu-devel, stefanha
Il 17/12/2012 18:14, Eric Blake ha scritto:
> On 12/12/2012 06:46 AM, Paolo Bonzini wrote:
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>> hbitmap.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++----------
>> hbitmap.h | 25 +++++++++++++++++++++++++
>> 2 files changed, 72 insertions(+), 10 deletions(-)
>
> When using hbitmap_alloc_with_data, must the user pass in all 0 data, or
> do you do a one-time slow pass over the passed-in data to populate the
> remaining layers of the hbitmap to allow faster traversal later?
Right, I should add such a pass.
Paolo
>> +HBitmap *hbitmap_alloc_with_data(uint64_t size, int granularity, void *data)
>> +{
>> + HBitmap *hb = g_malloc0(sizeof(struct HBitmap));
>> + unsigned i;
>> +
>> + hb->size = hbitmap_round_size(size, granularity);
>> hb->granularity = granularity;
>> + hb->last_level_allocated = (data == NULL);
>> +
>> for (i = HBITMAP_LEVELS; i-- > 0; ) {
>> - size = MAX((size + BITS_PER_LONG - 1) >> BITS_PER_LEVEL, 1);
>> - hb->levels[i] = g_malloc0(size * sizeof(unsigned long));
>> + if (data == NULL) {
>> + data = g_malloc0(hbitmap_required_size(size, granularity));
>> + }
>> + hb->levels[i] = data;
>> + data = NULL;
>> + granularity += BITS_PER_LEVEL;
>> }
>>
>> /* We necessarily have free bits in level 0 due to the definition
>> * of HBITMAP_LEVELS, so use one for a sentinel. This speeds up
>> * hbitmap_iter_skip_words.
>> */
>> - assert(size == 1);
>> + assert(hbitmap_required_size(size, granularity) == sizeof(unsigned long));
>> hb->levels[0][0] |= 1UL << (BITS_PER_LONG - 1);
>> return hb;
>> }
>
> Based on the implementation, you aren't inspecting the contents of data.
>
>> +/**
>> + * hbitmap_alloc_with_data:
>> + * @size: Number of bits in the bitmap.
>> + * @granularity: Granularity of the bitmap.
>> + * @data: Pointer to a data block that will be used for the bottom level
>> + * of the HBitmap.
>> + *
>> + * Allocate a new HBitmap, using a client-provided data block for the
>> + * actual bitmap and allocating memory only for the compressed levels.
>> + * If @data is NULL, this function is equivalent to @hbitmap_alloc.
>> + */
>> +HBitmap *hbitmap_alloc_with_data(uint64_t size, int granularity, void *data);
>
> But this documentation didn't mention that the caller must pass in all 0s.
>
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 15/20] hbitmap: add hbitmap_copy
2012-12-12 13:46 ` [Qemu-devel] [PATCH 15/20] hbitmap: add hbitmap_copy Paolo Bonzini
@ 2012-12-17 18:25 ` Eric Blake
0 siblings, 0 replies; 53+ messages in thread
From: Eric Blake @ 2012-12-17 18:25 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha
[-- Attachment #1: Type: text/plain, Size: 495 bytes --]
On 12/12/2012 06:46 AM, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> hbitmap.c | 48 +++++++++++++++++++++++++++++++++++++
> hbitmap.h | 11 +++++++++
> tests/test-hbitmap.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 126 insertions(+)
Reviewed-by: Eric Blake <eblake@redhat.com>
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 16/20] block: split bdrv_enable_dirty_tracking and bdrv_disable_dirty_tracking
2012-12-12 13:46 ` [Qemu-devel] [PATCH 16/20] block: split bdrv_enable_dirty_tracking and bdrv_disable_dirty_tracking Paolo Bonzini
@ 2012-12-20 18:26 ` Eric Blake
0 siblings, 0 replies; 53+ messages in thread
From: Eric Blake @ 2012-12-20 18:26 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha
[-- Attachment #1: Type: text/plain, Size: 478 bytes --]
On 12/12/2012 06:46 AM, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> block-migration.c | 17 +++--------------
> block.c | 22 +++++++++++-----------
> block.h | 3 ++-
> block/mirror.c | 4 ++--
> 4 files changed, 18 insertions(+), 28 deletions(-)
Reviewed-by: Eric Blake <eblake@redhat.com>
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 17/20] block: support a persistent dirty bitmap
2012-12-12 13:46 ` [Qemu-devel] [PATCH 17/20] block: support a persistent dirty bitmap Paolo Bonzini
@ 2012-12-20 23:03 ` Eric Blake
0 siblings, 0 replies; 53+ messages in thread
From: Eric Blake @ 2012-12-20 23:03 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha
[-- Attachment #1: Type: text/plain, Size: 1961 bytes --]
On 12/12/2012 06:46 AM, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> block.c | 133 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> block.h | 5 +++
> block_int.h | 5 +++
> 3 files changed, 141 insertions(+), 2 deletions(-)
>
> +void bdrv_enable_persistent_dirty_tracking(BlockDriverState *bs, const char *file,
> + Error **errp)
> +{
> + int rc;
> + int granularity_bits;
> + int64_t bitmap_size;
> + size_t file_size;
> + QEMUMmapArea mm;
> + HBitmap *new_bitmap;
> + bool load;
> +
> + assert(bs->dirty_usage > 0);
> +
> + granularity_bits = hbitmap_granularity(bs->dirty_bitmap);
> + bitmap_size = (bdrv_getlength(bs) >> BDRV_SECTOR_BITS);
> + file_size = hbitmap_required_size(bitmap_size, granularity_bits);
> +
> + load = access(file, R_OK) >= 0;
> + if (bdrv_in_use(bs) && load) {
> + error_set(errp, QERR_DEVICE_IN_USE, bdrv_get_device_name(bs));
> + return;
> + }
> +
> + /*
> + * Do not touch fields in BS until we're sure we can complete
> + * successfully.
> + */
> + rc = qemu_mmap_alloc(&mm, file, file_size);
> + if (rc < 0) {
> + error_set(errp, QERR_OPEN_FILE_FAILED, file);
> + return;
> + }
> +
> + new_bitmap = hbitmap_alloc_with_data(bitmap_size, granularity_bits, mm.mem);
> + if (!load) {
> + hbitmap_copy(new_bitmap, bs->dirty_bitmap);
Here's where my comment earlier in the series, about
hbitmap_alloc_with_data repopulating the upper layers of the hbitmap on
reused data, becomes essential. But since that fix is earlier in the
series, and I didn't see anything else suspicious in this patch, you can
add:
Reviewed-by: Eric Blake <eblake@redhat.com>
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 18/20] mirror: add support for persistent dirty bitmap
2012-12-12 13:46 ` [Qemu-devel] [PATCH 18/20] mirror: add support for " Paolo Bonzini
@ 2012-12-20 23:49 ` Eric Blake
0 siblings, 0 replies; 53+ messages in thread
From: Eric Blake @ 2012-12-20 23:49 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha
[-- Attachment #1: Type: text/plain, Size: 645 bytes --]
On 12/12/2012 06:46 AM, Paolo Bonzini wrote:
> A persistent dirty bitmap lets mirroring proceed with no waste of work
> after QEMU exits (either offline, or online when the VM restarts).
> It is also useful in order to communicate to management whether the
> switch to the destination was completed or not.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> block/mirror.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
Sure tiny for what it adds :)
Reviewed-by: Eric Blake <eblake@redhat.com>
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 19/20] block: choose the default dirty bitmap granularity in bdrv_enable_dirty_tracking
2012-12-12 13:46 ` [Qemu-devel] [PATCH 19/20] block: choose the default dirty bitmap granularity in bdrv_enable_dirty_tracking Paolo Bonzini
@ 2012-12-20 23:53 ` Eric Blake
0 siblings, 0 replies; 53+ messages in thread
From: Eric Blake @ 2012-12-20 23:53 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha
[-- Attachment #1: Type: text/plain, Size: 677 bytes --]
On 12/12/2012 06:46 AM, Paolo Bonzini wrote:
> This is useful because we want the same logic in the monitor commands
> that enable the persistent dirty bitmap.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> block-migration.c | 2 +-
> block.c | 46 +++++++++++++++++++++++++++++++++++++---------
> block.h | 3 ++-
> block/mirror.c | 30 ++++++++++++++----------------
> blockdev.c | 11 +----------
> 5 files changed, 55 insertions(+), 37 deletions(-)
>
Reviewed-by: Eric Blake <eblake@redhat.com>
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 20/20] monitor: add commands to start/stop dirty bitmap
2012-12-12 13:46 ` [Qemu-devel] [PATCH 20/20] monitor: add commands to start/stop dirty bitmap Paolo Bonzini
@ 2012-12-21 18:30 ` Eric Blake
0 siblings, 0 replies; 53+ messages in thread
From: Eric Blake @ 2012-12-21 18:30 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha
[-- Attachment #1: Type: text/plain, Size: 3109 bytes --]
On 12/12/2012 06:46 AM, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> blockdev.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++
> blockdev.h | 1 +
> hmp-commands.hx | 39 ++++++++++++++++++++++++++++++++++
> hmp.c | 27 +++++++++++++++++++++++
> hmp.h | 2 ++
> qapi-schema.json | 56 ++++++++++++++++++++++++++++++++++++++++++++++++
> qmp-commands.hx | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 7 files changed, 245 insertions(+)
>
A bit light on the commit message. What is the proper sequence of QMP
commands for using this feature in relation to driver-mirror and such?
For example, let's say I want to start a mirror, then shutdown qemu,
then start a new qemu -S; what is the right sequence for when to call
this to re-initialize the dirty bitmap and resume the mirror at the same
point, all prior to issuing 'cont' to the new qemu instance? Once I
know that, then I can fix libvirt to allow disk-copy across qemu
restarts (right now, libvirt limits disk-copy to transient domains,
since those don't have to worry about restarts).
> +If the dirty bitmap is already active, or used by something else (for
> +example @command{drive_mirror}), the granularity argument must be absent
> +or equal to the active granularity. The granularity must be a power-of-two
> +comprised between 4,096 and 67,108,864.
4k to 64M here...
> +++ b/qapi-schema.json
> @@ -3028,3 +3028,59 @@
> # Since: 1.3.0
> ##
> { 'command': 'nbd-server-stop' }
> +
> +##
> +# @blockdev-dirty-enable:
> +#
> +# @granularity: #optional granularity of the dirty bitmap, default is 64K
> +# if the image format doesn't have clusters, 4K if the clusters
> +# are smaller than that, else the cluster size. Must be a
> +# power of 2 between 512 and 64M.
but 512 here. Which is right?
> +#
> +# Returns: Nothing on success
> +#
> +# Since: 1.3
1.4, now
> +##
> +{ 'command': 'blockdev-dirty-enable',
> + 'data': {'device': 'str', 'filename': 'str', '*granularity': 'uint32' } }
> +
> +##
> +# @blockdev-dirty-disable:
> +#
> +# Stop tracking dirty blocks for a block device. Dirty blocks will
> +# be written to an on-disk file, with one bit per block and an arbitrary
> +# granularity.
> +#
> +# If the dirty bitmap is already active, or used by something else (for
> +# example blockdev-drive-mirror), the granularity argument must be absent
You named it 'drive-mirror', not 'blockdev-drive-mirror'.
> +# or equal to the active granularity.
> +#
> +# @device: the name of the device to track dirty blocks of
> +#
> +# @force: #optional true to immediately stop writing to the dirty
> +# bitmap file; false to do so only when the last user of the
> +# dirty bitmap stops using it (default false)
> +#
> +# Returns: Nothing on success
> +#
> +# Since: 1.3
Again, 1.4.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 02/20] add hierarchical bitmap data type and test cases
2012-12-12 13:46 ` [Qemu-devel] [PATCH 02/20] add hierarchical bitmap data type and test cases Paolo Bonzini
2012-12-14 0:04 ` Eric Blake
@ 2013-01-11 18:27 ` Stefan Hajnoczi
1 sibling, 0 replies; 53+ messages in thread
From: Stefan Hajnoczi @ 2013-01-11 18:27 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha
On Wed, Dec 12, 2012 at 02:46:21PM +0100, Paolo Bonzini wrote:
> diff --git a/hbitmap.c b/hbitmap.c
> new file mode 100644
> index 0000000..ae59f39
> --- /dev/null
> +++ b/hbitmap.c
> @@ -0,0 +1,400 @@
> +/*
> + * Hierarchical Bitmap Data Type
> + *
> + * Copyright Red Hat, Inc., 2012
> + *
> + * Author: Paolo Bonzini <pbonzini@redhat.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> + * later. See the COPYING file in the top-level directory.
> + */
> +
> +#include "osdep.h"
> +#include "hbitmap.h"
> +#include "host-utils.h"
> +#include "trace.h"
> +#include <string.h>
> +#include <glib.h>
> +#include <assert.h>
System headers before application headers, please.
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 08/20] mirror: allow customizing the granularity
2012-12-12 13:46 ` [Qemu-devel] [PATCH 08/20] mirror: allow customizing the granularity Paolo Bonzini
2012-12-14 22:01 ` Eric Blake
@ 2013-01-14 11:28 ` Stefan Hajnoczi
1 sibling, 0 replies; 53+ messages in thread
From: Stefan Hajnoczi @ 2013-01-14 11:28 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha
On Wed, Dec 12, 2012 at 02:46:27PM +0100, Paolo Bonzini wrote:
> @@ -962,6 +963,7 @@ Arguments:
> file/device (NewImageMode, optional, default 'absolute-paths')
> - "speed": maximum speed of the streaming job, in bytes per second
> (json-int)
> +- "granularity": granularity of the dirty bitmap (json-int, optional)
"in bytes" would be nice although the text you added below implies this.
> - "sync": what parts of the disk image should be copied to the destination;
> possibilities include "full" for all the disk, "top" for only the sectors
> allocated in the topmost image, or "none" to only replicate new I/O
> @@ -971,6 +973,10 @@ Arguments:
> - "on-target-error": the action to take on an error on the target
> (BlockdevOnError, default 'report')
>
> +The default value of the granularity is, if the image format defines
> +a cluster size, the cluster size or 4096, whichever is larger. If it
> +does not define a cluster size, the default value of the granularity
> +is 65536.
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 10/20] mirror: add buf-size argument to drive-mirror
2012-12-12 13:46 ` [Qemu-devel] [PATCH 10/20] mirror: add buf-size argument to drive-mirror Paolo Bonzini
2012-12-14 22:22 ` Eric Blake
@ 2013-01-14 11:41 ` Stefan Hajnoczi
1 sibling, 0 replies; 53+ messages in thread
From: Stefan Hajnoczi @ 2013-01-14 11:41 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha
On Wed, Dec 12, 2012 at 02:46:29PM +0100, Paolo Bonzini wrote:
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index d8faad0..97c52c9 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -939,7 +939,7 @@ EQMP
> .name = "drive-mirror",
> .args_type = "sync:s,device:B,target:s,speed:i?,mode:s?,format:s?,"
> "on-source-error:s?,on-target-error:s?,"
> - "granularity:i?",
> + "buf-size:i?,granularity:i?",
granularity was also added in this patch series, so this doesn't matter,
but in general it's nicer to append optional arguments on the end. This
way, command option ordering doesn't change arbitrarily - that could
confuse users.
Stefan
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 11/20] mirror: support more than one in-flight AIO operation
2012-12-12 13:46 ` [Qemu-devel] [PATCH 11/20] mirror: support more than one in-flight AIO operation Paolo Bonzini
2012-12-14 22:32 ` Eric Blake
@ 2013-01-14 12:56 ` Stefan Hajnoczi
2013-01-14 13:28 ` Paolo Bonzini
1 sibling, 1 reply; 53+ messages in thread
From: Stefan Hajnoczi @ 2013-01-14 12:56 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha
On Wed, Dec 12, 2012 at 02:46:30PM +0100, Paolo Bonzini wrote:
> @@ -137,21 +163,58 @@ static void coroutine_fn mirror_iteration(MirrorBlockJob *s)
> bdrv_round_to_clusters(s->target,
> sector_num, nb_sectors_chunk,
> §or_num, &nb_sectors);
> - bitmap_set(s->cow_bitmap, sector_num / nb_sectors_chunk,
> - nb_sectors / nb_sectors_chunk);
> +
> + /* The rounding may make us copy sectors before the
> + * first dirty one.
> + */
> + cluster_num = sector_num / nb_sectors_chunk;
> + }
> +
> + /* Wait for I/O to this cluster (from a previous iteration) to be done. */
> + while (test_bit(cluster_num, s->in_flight_bitmap)) {
> + trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
> + qemu_coroutine_yield();
> }
in_flight_bitmap is never set in this patch. Either you'll set it in a
later patch or this is a bug?
Why can we get away with testing only cluster_num and not all bits up to
and including cluster_num + nb_chunks? Is there an assumption that
requests are issued in ascending order?
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 00/20] Block device mirroring enhancements, 12-12-12 edition
2012-12-12 13:46 [Qemu-devel] [PATCH 00/20] Block device mirroring enhancements, 12-12-12 edition Paolo Bonzini
` (19 preceding siblings ...)
2012-12-12 13:46 ` [Qemu-devel] [PATCH 20/20] monitor: add commands to start/stop dirty bitmap Paolo Bonzini
@ 2013-01-14 13:02 ` Stefan Hajnoczi
20 siblings, 0 replies; 53+ messages in thread
From: Stefan Hajnoczi @ 2013-01-14 13:02 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kwolf, jcody, qemu-devel, stefanha
On Wed, Dec 12, 2012 at 02:46:19PM +0100, Paolo Bonzini wrote:
> Here are my block mirroring patches for QEMU 1.4, please review.
>
> Patches 1-12 implement improvements and optimizations to the code
> that went in 1.3.
I had one question which may be a bug: I can't see anything that sets
s->in_flight_bitmap in Patches 1-12. What am I missing.
Other comments were mainly cleanups.
If you don't see this until tomorrow I'll ping you on IRC tomorrow
morning. Patches 1-12 are clean and I'd like to merge them for QEMU
1.4.
> Patches 13-20 (lacking tests for now) implement a persistent dirty
> bitmap. The bitmap can be used to run storage migration jobs across
> multiple executions of the VM or, in the future, offline via qemu-img.
Please split Patches 13-20 into a separate "persistent dirty bitmap"
series.
Stefan
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [Qemu-devel] [PATCH 11/20] mirror: support more than one in-flight AIO operation
2013-01-14 12:56 ` Stefan Hajnoczi
@ 2013-01-14 13:28 ` Paolo Bonzini
0 siblings, 0 replies; 53+ messages in thread
From: Paolo Bonzini @ 2013-01-14 13:28 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kwolf, jcody, qemu-devel, stefanha
Il 14/01/2013 13:56, Stefan Hajnoczi ha scritto:
>> > + /* Wait for I/O to this cluster (from a previous iteration) to be done. */
>> > + while (test_bit(cluster_num, s->in_flight_bitmap)) {
>> > + trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
>> > + qemu_coroutine_yield();
>> > }
> in_flight_bitmap is never set in this patch. Either you'll set it in a
> later patch or this is a bug?
Yes, this must be a rebase bug. Good catch. I'll send an updated
series tomorrow.
> Why can we get away with testing only cluster_num and not all bits up to
> and including cluster_num + nb_chunks? Is there an assumption that
> requests are issued in ascending order?
As of this patch, cluster_num is either a single granularity-sized
cluster or it is rounded to include a full image cluster. If it is
rounded, any other concurrent operation will do the same rounding and
set the bit corresponding to cluster_num.
In the next patch, the same applies when each granularity-sized cluster
is added to the current operation (either alone, or rounded to include a
full image cluster).
I'll add a comment.
Paolo
^ permalink raw reply [flat|nested] 53+ messages in thread
end of thread, other threads:[~2013-01-14 13:29 UTC | newest]
Thread overview: 53+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-12-12 13:46 [Qemu-devel] [PATCH 00/20] Block device mirroring enhancements, 12-12-12 edition Paolo Bonzini
2012-12-12 13:46 ` [Qemu-devel] [PATCH 01/20] host-utils: add ffsl Paolo Bonzini
2012-12-12 23:41 ` Eric Blake
2012-12-12 13:46 ` [Qemu-devel] [PATCH 02/20] add hierarchical bitmap data type and test cases Paolo Bonzini
2012-12-14 0:04 ` Eric Blake
2013-01-11 18:27 ` Stefan Hajnoczi
2012-12-12 13:46 ` [Qemu-devel] [PATCH 03/20] block: implement dirty bitmap using HBitmap Paolo Bonzini
2012-12-14 0:27 ` Eric Blake
2012-12-12 13:46 ` [Qemu-devel] [PATCH 04/20] block: make round_to_clusters public Paolo Bonzini
2012-12-14 20:13 ` Eric Blake
2012-12-12 13:46 ` [Qemu-devel] [PATCH 05/20] mirror: perform COW if the cluster size is bigger than the granularity Paolo Bonzini
2012-12-14 20:21 ` Eric Blake
2012-12-12 13:46 ` [Qemu-devel] [PATCH 06/20] block: return count of dirty sectors, not chunks Paolo Bonzini
2012-12-14 20:49 ` Eric Blake
2012-12-12 13:46 ` [Qemu-devel] [PATCH 07/20] block: allow customizing the granularity of the dirty bitmap Paolo Bonzini
2012-12-14 21:27 ` Eric Blake
2012-12-15 9:11 ` Paolo Bonzini
2012-12-12 13:46 ` [Qemu-devel] [PATCH 08/20] mirror: allow customizing the granularity Paolo Bonzini
2012-12-14 22:01 ` Eric Blake
2013-01-14 11:28 ` Stefan Hajnoczi
2012-12-12 13:46 ` [Qemu-devel] [PATCH 09/20] mirror: switch mirror_iteration to AIO Paolo Bonzini
2012-12-14 22:11 ` Eric Blake
2012-12-15 9:09 ` Paolo Bonzini
2012-12-15 13:05 ` Eric Blake
2012-12-12 13:46 ` [Qemu-devel] [PATCH 10/20] mirror: add buf-size argument to drive-mirror Paolo Bonzini
2012-12-14 22:22 ` Eric Blake
2012-12-15 9:09 ` Paolo Bonzini
2013-01-14 11:41 ` Stefan Hajnoczi
2012-12-12 13:46 ` [Qemu-devel] [PATCH 11/20] mirror: support more than one in-flight AIO operation Paolo Bonzini
2012-12-14 22:32 ` Eric Blake
2013-01-14 12:56 ` Stefan Hajnoczi
2013-01-14 13:28 ` Paolo Bonzini
2012-12-12 13:46 ` [Qemu-devel] [PATCH 12/20] mirror: support arbitrarily-sized iterations Paolo Bonzini
2012-12-14 22:39 ` Eric Blake
2012-12-12 13:46 ` [Qemu-devel] [PATCH 13/20] oslib: add a wrapper for mmap/munmap Paolo Bonzini
2012-12-14 22:54 ` Eric Blake
2012-12-15 9:06 ` Paolo Bonzini
2012-12-12 13:46 ` [Qemu-devel] [PATCH 14/20] hbitmap: add hbitmap_alloc_with_data and hbitmap_required_size Paolo Bonzini
2012-12-17 17:14 ` Eric Blake
2012-12-17 17:18 ` Paolo Bonzini
2012-12-12 13:46 ` [Qemu-devel] [PATCH 15/20] hbitmap: add hbitmap_copy Paolo Bonzini
2012-12-17 18:25 ` Eric Blake
2012-12-12 13:46 ` [Qemu-devel] [PATCH 16/20] block: split bdrv_enable_dirty_tracking and bdrv_disable_dirty_tracking Paolo Bonzini
2012-12-20 18:26 ` Eric Blake
2012-12-12 13:46 ` [Qemu-devel] [PATCH 17/20] block: support a persistent dirty bitmap Paolo Bonzini
2012-12-20 23:03 ` Eric Blake
2012-12-12 13:46 ` [Qemu-devel] [PATCH 18/20] mirror: add support for " Paolo Bonzini
2012-12-20 23:49 ` Eric Blake
2012-12-12 13:46 ` [Qemu-devel] [PATCH 19/20] block: choose the default dirty bitmap granularity in bdrv_enable_dirty_tracking Paolo Bonzini
2012-12-20 23:53 ` Eric Blake
2012-12-12 13:46 ` [Qemu-devel] [PATCH 20/20] monitor: add commands to start/stop dirty bitmap Paolo Bonzini
2012-12-21 18:30 ` Eric Blake
2013-01-14 13:02 ` [Qemu-devel] [PATCH 00/20] Block device mirroring enhancements, 12-12-12 edition Stefan Hajnoczi
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).