* [Qemu-devel] [PATCH v2 0/3] blockjob: correct backup cluster size for backups
@ 2016-02-22 22:07 John Snow
2016-02-22 22:07 ` [Qemu-devel] [PATCH v2 1/3] block/backup: make backup cluster size configurable John Snow
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: John Snow @ 2016-02-22 22:07 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, famz, jcody, qemu-devel, stefanha, John Snow
Backups sometimes need a non-64KiB transfer cluster size.
See patch #2 for the detailed justification.
===
v2
===
01: Removed "sectors_per_cluster" as a cached property of the Backup Block Job,
In favor of recomputing it with a small function where needed.
(I like v1 more. Thoughts?)
02: Expand correction to all backup modes instead of just incremental.
Added credit: Thanks to Fam Z for noticing this problem!
03: Minor phrasing change in a comment.
Added r-b.
________________________________________________________________________________
For convenience, this branch is available at:
https://github.com/jnsnow/qemu.git branch incremental-granularity-fix
https://github.com/jnsnow/qemu/tree/incremental-granularity-fix
This version is tagged incremental-granularity-fix-v2:
https://github.com/jnsnow/qemu/releases/tag/incremental-granularity-fix-v2
John Snow (3):
block/backup: make backup cluster size configurable
block/backup: avoid copying less than full target clusters
iotests/124: Add cluster_size mismatch test
block/backup.c | 72 ++++++++++++++++++++++++++++------------------
tests/qemu-iotests/124 | 58 +++++++++++++++++++++++++++++++++----
tests/qemu-iotests/124.out | 4 +--
3 files changed, 99 insertions(+), 35 deletions(-)
--
2.4.3
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Qemu-devel] [PATCH v2 1/3] block/backup: make backup cluster size configurable
2016-02-22 22:07 [Qemu-devel] [PATCH v2 0/3] blockjob: correct backup cluster size for backups John Snow
@ 2016-02-22 22:07 ` John Snow
2016-02-23 5:06 ` Fam Zheng
2016-02-22 22:07 ` [Qemu-devel] [PATCH v2 2/3] block/backup: avoid copying less than full target clusters John Snow
` (2 subsequent siblings)
3 siblings, 1 reply; 9+ messages in thread
From: John Snow @ 2016-02-22 22:07 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, famz, jcody, qemu-devel, stefanha, John Snow
64K might not always be appropriate, make this a runtime value.
Signed-off-by: John Snow <jsnow@redhat.com>
---
block/backup.c | 64 +++++++++++++++++++++++++++++++++-------------------------
1 file changed, 36 insertions(+), 28 deletions(-)
diff --git a/block/backup.c b/block/backup.c
index 00cafdb..76addef 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -21,10 +21,7 @@
#include "qemu/ratelimit.h"
#include "sysemu/block-backend.h"
-#define BACKUP_CLUSTER_BITS 16
-#define BACKUP_CLUSTER_SIZE (1 << BACKUP_CLUSTER_BITS)
-#define BACKUP_SECTORS_PER_CLUSTER (BACKUP_CLUSTER_SIZE / BDRV_SECTOR_SIZE)
-
+#define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
#define SLICE_TIME 100000000ULL /* ns */
typedef struct CowRequest {
@@ -46,9 +43,16 @@ typedef struct BackupBlockJob {
CoRwlock flush_rwlock;
uint64_t sectors_read;
HBitmap *bitmap;
+ int64_t cluster_size;
QLIST_HEAD(, CowRequest) inflight_reqs;
} BackupBlockJob;
+/* Size of a cluster in sectors, instead of bytes. */
+static inline int64_t cluster_size_sectors(BackupBlockJob *job)
+{
+ return job->cluster_size / BDRV_SECTOR_SIZE;
+}
+
/* See if in-flight requests overlap and wait for them to complete */
static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
int64_t start,
@@ -97,13 +101,14 @@ static int coroutine_fn backup_do_cow(BlockDriverState *bs,
QEMUIOVector bounce_qiov;
void *bounce_buffer = NULL;
int ret = 0;
+ int64_t sectors_per_cluster = cluster_size_sectors(job);
int64_t start, end;
int n;
qemu_co_rwlock_rdlock(&job->flush_rwlock);
- start = sector_num / BACKUP_SECTORS_PER_CLUSTER;
- end = DIV_ROUND_UP(sector_num + nb_sectors, BACKUP_SECTORS_PER_CLUSTER);
+ start = sector_num / sectors_per_cluster;
+ end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
trace_backup_do_cow_enter(job, start, sector_num, nb_sectors);
@@ -118,12 +123,12 @@ static int coroutine_fn backup_do_cow(BlockDriverState *bs,
trace_backup_do_cow_process(job, start);
- n = MIN(BACKUP_SECTORS_PER_CLUSTER,
+ n = MIN(sectors_per_cluster,
job->common.len / BDRV_SECTOR_SIZE -
- start * BACKUP_SECTORS_PER_CLUSTER);
+ start * sectors_per_cluster);
if (!bounce_buffer) {
- bounce_buffer = qemu_blockalign(bs, BACKUP_CLUSTER_SIZE);
+ bounce_buffer = qemu_blockalign(bs, job->cluster_size);
}
iov.iov_base = bounce_buffer;
iov.iov_len = n * BDRV_SECTOR_SIZE;
@@ -131,10 +136,10 @@ static int coroutine_fn backup_do_cow(BlockDriverState *bs,
if (is_write_notifier) {
ret = bdrv_co_readv_no_serialising(bs,
- start * BACKUP_SECTORS_PER_CLUSTER,
+ start * sectors_per_cluster,
n, &bounce_qiov);
} else {
- ret = bdrv_co_readv(bs, start * BACKUP_SECTORS_PER_CLUSTER, n,
+ ret = bdrv_co_readv(bs, start * sectors_per_cluster, n,
&bounce_qiov);
}
if (ret < 0) {
@@ -147,11 +152,11 @@ static int coroutine_fn backup_do_cow(BlockDriverState *bs,
if (buffer_is_zero(iov.iov_base, iov.iov_len)) {
ret = bdrv_co_write_zeroes(job->target,
- start * BACKUP_SECTORS_PER_CLUSTER,
+ start * sectors_per_cluster,
n, BDRV_REQ_MAY_UNMAP);
} else {
ret = bdrv_co_writev(job->target,
- start * BACKUP_SECTORS_PER_CLUSTER, n,
+ start * sectors_per_cluster, n,
&bounce_qiov);
}
if (ret < 0) {
@@ -322,21 +327,22 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
int64_t cluster;
int64_t end;
int64_t last_cluster = -1;
+ int64_t sectors_per_cluster = cluster_size_sectors(job);
BlockDriverState *bs = job->common.bs;
HBitmapIter hbi;
granularity = bdrv_dirty_bitmap_granularity(job->sync_bitmap);
- clusters_per_iter = MAX((granularity / BACKUP_CLUSTER_SIZE), 1);
+ clusters_per_iter = MAX((granularity / job->cluster_size), 1);
bdrv_dirty_iter_init(job->sync_bitmap, &hbi);
/* Find the next dirty sector(s) */
while ((sector = hbitmap_iter_next(&hbi)) != -1) {
- cluster = sector / BACKUP_SECTORS_PER_CLUSTER;
+ cluster = sector / sectors_per_cluster;
/* Fake progress updates for any clusters we skipped */
if (cluster != last_cluster + 1) {
job->common.offset += ((cluster - last_cluster - 1) *
- BACKUP_CLUSTER_SIZE);
+ job->cluster_size);
}
for (end = cluster + clusters_per_iter; cluster < end; cluster++) {
@@ -344,8 +350,8 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
if (yield_and_check(job)) {
return ret;
}
- ret = backup_do_cow(bs, cluster * BACKUP_SECTORS_PER_CLUSTER,
- BACKUP_SECTORS_PER_CLUSTER, &error_is_read,
+ ret = backup_do_cow(bs, cluster * sectors_per_cluster,
+ sectors_per_cluster, &error_is_read,
false);
if ((ret < 0) &&
backup_error_action(job, error_is_read, -ret) ==
@@ -357,17 +363,17 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
/* If the bitmap granularity is smaller than the backup granularity,
* we need to advance the iterator pointer to the next cluster. */
- if (granularity < BACKUP_CLUSTER_SIZE) {
- bdrv_set_dirty_iter(&hbi, cluster * BACKUP_SECTORS_PER_CLUSTER);
+ if (granularity < job->cluster_size) {
+ bdrv_set_dirty_iter(&hbi, cluster * sectors_per_cluster);
}
last_cluster = cluster - 1;
}
/* Play some final catchup with the progress meter */
- end = DIV_ROUND_UP(job->common.len, BACKUP_CLUSTER_SIZE);
+ end = DIV_ROUND_UP(job->common.len, job->cluster_size);
if (last_cluster + 1 < end) {
- job->common.offset += ((end - last_cluster - 1) * BACKUP_CLUSTER_SIZE);
+ job->common.offset += ((end - last_cluster - 1) * job->cluster_size);
}
return ret;
@@ -384,13 +390,14 @@ static void coroutine_fn backup_run(void *opaque)
.notify = backup_before_write_notify,
};
int64_t start, end;
+ int64_t sectors_per_cluster = cluster_size_sectors(job);
int ret = 0;
QLIST_INIT(&job->inflight_reqs);
qemu_co_rwlock_init(&job->flush_rwlock);
start = 0;
- end = DIV_ROUND_UP(job->common.len, BACKUP_CLUSTER_SIZE);
+ end = DIV_ROUND_UP(job->common.len, job->cluster_size);
job->bitmap = hbitmap_alloc(end, 0);
@@ -427,7 +434,7 @@ static void coroutine_fn backup_run(void *opaque)
/* Check to see if these blocks are already in the
* backing file. */
- for (i = 0; i < BACKUP_SECTORS_PER_CLUSTER;) {
+ for (i = 0; i < sectors_per_cluster;) {
/* bdrv_is_allocated() only returns true/false based
* on the first set of sectors it comes across that
* are are all in the same state.
@@ -436,8 +443,8 @@ static void coroutine_fn backup_run(void *opaque)
* needed but at some point that is always the case. */
alloced =
bdrv_is_allocated(bs,
- start * BACKUP_SECTORS_PER_CLUSTER + i,
- BACKUP_SECTORS_PER_CLUSTER - i, &n);
+ start * sectors_per_cluster + i,
+ sectors_per_cluster - i, &n);
i += n;
if (alloced == 1 || n == 0) {
@@ -452,8 +459,8 @@ static void coroutine_fn backup_run(void *opaque)
}
}
/* FULL sync mode we copy the whole drive. */
- ret = backup_do_cow(bs, start * BACKUP_SECTORS_PER_CLUSTER,
- BACKUP_SECTORS_PER_CLUSTER, &error_is_read, false);
+ ret = backup_do_cow(bs, start * sectors_per_cluster,
+ sectors_per_cluster, &error_is_read, false);
if (ret < 0) {
/* Depending on error action, fail now or retry cluster */
BlockErrorAction action =
@@ -571,6 +578,7 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
job->sync_mode = sync_mode;
job->sync_bitmap = sync_mode == MIRROR_SYNC_MODE_INCREMENTAL ?
sync_bitmap : NULL;
+ job->cluster_size = BACKUP_CLUSTER_SIZE_DEFAULT;
job->common.len = len;
job->common.co = qemu_coroutine_create(backup_run);
block_job_txn_add_job(txn, &job->common);
--
2.4.3
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [Qemu-devel] [PATCH v2 2/3] block/backup: avoid copying less than full target clusters
2016-02-22 22:07 [Qemu-devel] [PATCH v2 0/3] blockjob: correct backup cluster size for backups John Snow
2016-02-22 22:07 ` [Qemu-devel] [PATCH v2 1/3] block/backup: make backup cluster size configurable John Snow
@ 2016-02-22 22:07 ` John Snow
2016-02-23 5:08 ` Fam Zheng
2016-02-22 22:07 ` [Qemu-devel] [PATCH v2 3/3] iotests/124: Add cluster_size mismatch test John Snow
2016-03-02 10:48 ` [Qemu-devel] [Qemu-block] [PATCH v2 0/3] blockjob: correct backup cluster size for backups Stefan Hajnoczi
3 siblings, 1 reply; 9+ messages in thread
From: John Snow @ 2016-02-22 22:07 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, famz, jcody, qemu-devel, stefanha, John Snow
During incremental backups, if the target has a cluster size that is
larger than the backup cluster size and we are backing up to a target
that cannot (for whichever reason) pull clusters up from a backing image,
we may inadvertently create unusable incremental backup images.
For example:
If the bitmap tracks changes at a 64KB granularity and we transmit 64KB
of data at a time but the target uses a 128KB cluster size, it is
possible that only half of a target cluster will be recognized as dirty
by the backup block job. When the cluster is allocated on the target
image but only half populated with data, we lose the ability to
distinguish between zero padding and uninitialized data.
This does not happen if the target image has a backing file that points
to the last known good backup.
Even if we have a backing file, though, it's likely going to be faster
to just buffer the redundant data ourselves from the live image than
fetching it from the backing file, so let's just always round up to the
target granularity.
The same logic applies to backup modes top, none, and full. Copying
fractional clusters without the guarantee of COW is dangerous, but even
if we can rely on COW, it's likely better to just re-copy the data.
Reported-by: Fam Zheng <famz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
---
block/backup.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/block/backup.c b/block/backup.c
index 76addef..a9a4d5c 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -501,6 +501,7 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
BlockJobTxn *txn, Error **errp)
{
int64_t len;
+ BlockDriverInfo bdi;
assert(bs);
assert(target);
@@ -578,7 +579,14 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
job->sync_mode = sync_mode;
job->sync_bitmap = sync_mode == MIRROR_SYNC_MODE_INCREMENTAL ?
sync_bitmap : NULL;
- job->cluster_size = BACKUP_CLUSTER_SIZE_DEFAULT;
+
+ /* If there is no backing file on the target, we cannot rely on COW if our
+ * backup cluster size is smaller than the target cluster size. Instead of
+ * checking for a backing file, we assume that just copying the data in the
+ * backup loop is comparable to the unreliable COW. */
+ bdrv_get_info(job->target, &bdi);
+ job->cluster_size = MAX(BACKUP_CLUSTER_SIZE_DEFAULT, bdi.cluster_size);
+
job->common.len = len;
job->common.co = qemu_coroutine_create(backup_run);
block_job_txn_add_job(txn, &job->common);
--
2.4.3
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [Qemu-devel] [PATCH v2 3/3] iotests/124: Add cluster_size mismatch test
2016-02-22 22:07 [Qemu-devel] [PATCH v2 0/3] blockjob: correct backup cluster size for backups John Snow
2016-02-22 22:07 ` [Qemu-devel] [PATCH v2 1/3] block/backup: make backup cluster size configurable John Snow
2016-02-22 22:07 ` [Qemu-devel] [PATCH v2 2/3] block/backup: avoid copying less than full target clusters John Snow
@ 2016-02-22 22:07 ` John Snow
2016-03-02 10:48 ` [Qemu-devel] [Qemu-block] [PATCH v2 0/3] blockjob: correct backup cluster size for backups Stefan Hajnoczi
3 siblings, 0 replies; 9+ messages in thread
From: John Snow @ 2016-02-22 22:07 UTC (permalink / raw)
To: qemu-block; +Cc: kwolf, famz, jcody, qemu-devel, stefanha, John Snow
If a backing file isn't specified in the target image and the
cluster_size is larger than the bitmap granularity, we run the risk of
creating bitmaps with allocated clusters but empty/no data which will
prevent the proper reading of the backup in the future.
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
---
tests/qemu-iotests/124 | 58 ++++++++++++++++++++++++++++++++++++++++++----
tests/qemu-iotests/124.out | 4 ++--
2 files changed, 55 insertions(+), 7 deletions(-)
diff --git a/tests/qemu-iotests/124 b/tests/qemu-iotests/124
index 7d33422..de7cdbe 100644
--- a/tests/qemu-iotests/124
+++ b/tests/qemu-iotests/124
@@ -132,14 +132,16 @@ class TestIncrementalBackupBase(iotests.QMPTestCase):
def img_create(self, img, fmt=iotests.imgfmt, size='64M',
- parent=None, parentFormat=None):
+ parent=None, parentFormat=None, **kwargs):
+ optargs = []
+ for k,v in kwargs.iteritems():
+ optargs = optargs + ['-o', '%s=%s' % (k,v)]
+ args = ['create', '-f', fmt] + optargs + [img, size]
if parent:
if parentFormat is None:
parentFormat = fmt
- iotests.qemu_img('create', '-f', fmt, img, size,
- '-b', parent, '-F', parentFormat)
- else:
- iotests.qemu_img('create', '-f', fmt, img, size)
+ args = args + ['-b', parent, '-F', parentFormat]
+ iotests.qemu_img(*args)
self.files.append(img)
@@ -307,6 +309,52 @@ class TestIncrementalBackup(TestIncrementalBackupBase):
return self.do_incremental_simple(granularity=131072)
+ def test_larger_cluster_target(self):
+ '''
+ Test: Create and verify backups made to a larger cluster size target.
+
+ With a default granularity of 64KiB, verify that backups made to a
+ larger cluster size target of 128KiB without a backing file works.
+ '''
+ drive0 = self.drives[0]
+
+ # Create a cluster_size=128k full backup / "anchor" backup
+ self.img_create(drive0['backup'], cluster_size='128k')
+ self.assertTrue(self.do_qmp_backup(device=drive0['id'], sync='full',
+ format=drive0['fmt'],
+ target=drive0['backup'],
+ mode='existing'))
+
+ # Create bitmap and dirty it with some new writes.
+ # overwrite [32736, 32799] which will dirty bitmap clusters at
+ # 32M-64K and 32M. 32M+64K will be left undirtied.
+ bitmap0 = self.add_bitmap('bitmap0', drive0)
+ self.hmp_io_writes(drive0['id'],
+ (('0xab', 0, 512),
+ ('0xfe', '16M', '256k'),
+ ('0x64', '32736k', '64k')))
+
+
+ # Prepare a cluster_size=128k backup target without a backing file.
+ (target, _) = bitmap0.new_target()
+ self.img_create(target, bitmap0.drive['fmt'], cluster_size='128k')
+
+ # Perform Incremental Backup
+ self.assertTrue(self.do_qmp_backup(device=bitmap0.drive['id'],
+ sync='incremental',
+ bitmap=bitmap0.name,
+ format=bitmap0.drive['fmt'],
+ target=target,
+ mode='existing'))
+ self.make_reference_backup(bitmap0)
+
+ # Add the backing file, then compare and exit.
+ iotests.qemu_img('rebase', '-f', drive0['fmt'], '-u', '-b',
+ drive0['backup'], '-F', drive0['fmt'], target)
+ self.vm.shutdown()
+ self.check_backups()
+
+
def test_incremental_transaction(self):
'''Test: Verify backups made from transactionally created bitmaps.
diff --git a/tests/qemu-iotests/124.out b/tests/qemu-iotests/124.out
index dae404e..36376be 100644
--- a/tests/qemu-iotests/124.out
+++ b/tests/qemu-iotests/124.out
@@ -1,5 +1,5 @@
-.........
+..........
----------------------------------------------------------------------
-Ran 9 tests
+Ran 10 tests
OK
--
2.4.3
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH v2 1/3] block/backup: make backup cluster size configurable
2016-02-22 22:07 ` [Qemu-devel] [PATCH v2 1/3] block/backup: make backup cluster size configurable John Snow
@ 2016-02-23 5:06 ` Fam Zheng
0 siblings, 0 replies; 9+ messages in thread
From: Fam Zheng @ 2016-02-23 5:06 UTC (permalink / raw)
To: John Snow; +Cc: kwolf, jcody, stefanha, qemu-devel, qemu-block
On Mon, 02/22 17:07, John Snow wrote:
> 64K might not always be appropriate, make this a runtime value.
>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
> block/backup.c | 64 +++++++++++++++++++++++++++++++++-------------------------
> 1 file changed, 36 insertions(+), 28 deletions(-)
>
> diff --git a/block/backup.c b/block/backup.c
> index 00cafdb..76addef 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -21,10 +21,7 @@
> #include "qemu/ratelimit.h"
> #include "sysemu/block-backend.h"
>
> -#define BACKUP_CLUSTER_BITS 16
> -#define BACKUP_CLUSTER_SIZE (1 << BACKUP_CLUSTER_BITS)
> -#define BACKUP_SECTORS_PER_CLUSTER (BACKUP_CLUSTER_SIZE / BDRV_SECTOR_SIZE)
> -
> +#define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
> #define SLICE_TIME 100000000ULL /* ns */
>
> typedef struct CowRequest {
> @@ -46,9 +43,16 @@ typedef struct BackupBlockJob {
> CoRwlock flush_rwlock;
> uint64_t sectors_read;
> HBitmap *bitmap;
> + int64_t cluster_size;
> QLIST_HEAD(, CowRequest) inflight_reqs;
> } BackupBlockJob;
>
> +/* Size of a cluster in sectors, instead of bytes. */
> +static inline int64_t cluster_size_sectors(BackupBlockJob *job)
> +{
> + return job->cluster_size / BDRV_SECTOR_SIZE;
> +}
> +
> /* See if in-flight requests overlap and wait for them to complete */
> static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
> int64_t start,
> @@ -97,13 +101,14 @@ static int coroutine_fn backup_do_cow(BlockDriverState *bs,
> QEMUIOVector bounce_qiov;
> void *bounce_buffer = NULL;
> int ret = 0;
> + int64_t sectors_per_cluster = cluster_size_sectors(job);
> int64_t start, end;
> int n;
>
> qemu_co_rwlock_rdlock(&job->flush_rwlock);
>
> - start = sector_num / BACKUP_SECTORS_PER_CLUSTER;
> - end = DIV_ROUND_UP(sector_num + nb_sectors, BACKUP_SECTORS_PER_CLUSTER);
> + start = sector_num / sectors_per_cluster;
> + end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
>
> trace_backup_do_cow_enter(job, start, sector_num, nb_sectors);
>
> @@ -118,12 +123,12 @@ static int coroutine_fn backup_do_cow(BlockDriverState *bs,
>
> trace_backup_do_cow_process(job, start);
>
> - n = MIN(BACKUP_SECTORS_PER_CLUSTER,
> + n = MIN(sectors_per_cluster,
> job->common.len / BDRV_SECTOR_SIZE -
> - start * BACKUP_SECTORS_PER_CLUSTER);
> + start * sectors_per_cluster);
>
> if (!bounce_buffer) {
> - bounce_buffer = qemu_blockalign(bs, BACKUP_CLUSTER_SIZE);
> + bounce_buffer = qemu_blockalign(bs, job->cluster_size);
> }
> iov.iov_base = bounce_buffer;
> iov.iov_len = n * BDRV_SECTOR_SIZE;
> @@ -131,10 +136,10 @@ static int coroutine_fn backup_do_cow(BlockDriverState *bs,
>
> if (is_write_notifier) {
> ret = bdrv_co_readv_no_serialising(bs,
> - start * BACKUP_SECTORS_PER_CLUSTER,
> + start * sectors_per_cluster,
> n, &bounce_qiov);
> } else {
> - ret = bdrv_co_readv(bs, start * BACKUP_SECTORS_PER_CLUSTER, n,
> + ret = bdrv_co_readv(bs, start * sectors_per_cluster, n,
> &bounce_qiov);
> }
> if (ret < 0) {
> @@ -147,11 +152,11 @@ static int coroutine_fn backup_do_cow(BlockDriverState *bs,
>
> if (buffer_is_zero(iov.iov_base, iov.iov_len)) {
> ret = bdrv_co_write_zeroes(job->target,
> - start * BACKUP_SECTORS_PER_CLUSTER,
> + start * sectors_per_cluster,
> n, BDRV_REQ_MAY_UNMAP);
> } else {
> ret = bdrv_co_writev(job->target,
> - start * BACKUP_SECTORS_PER_CLUSTER, n,
> + start * sectors_per_cluster, n,
> &bounce_qiov);
> }
> if (ret < 0) {
> @@ -322,21 +327,22 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
> int64_t cluster;
> int64_t end;
> int64_t last_cluster = -1;
> + int64_t sectors_per_cluster = cluster_size_sectors(job);
> BlockDriverState *bs = job->common.bs;
> HBitmapIter hbi;
>
> granularity = bdrv_dirty_bitmap_granularity(job->sync_bitmap);
> - clusters_per_iter = MAX((granularity / BACKUP_CLUSTER_SIZE), 1);
> + clusters_per_iter = MAX((granularity / job->cluster_size), 1);
> bdrv_dirty_iter_init(job->sync_bitmap, &hbi);
>
> /* Find the next dirty sector(s) */
> while ((sector = hbitmap_iter_next(&hbi)) != -1) {
> - cluster = sector / BACKUP_SECTORS_PER_CLUSTER;
> + cluster = sector / sectors_per_cluster;
>
> /* Fake progress updates for any clusters we skipped */
> if (cluster != last_cluster + 1) {
> job->common.offset += ((cluster - last_cluster - 1) *
> - BACKUP_CLUSTER_SIZE);
> + job->cluster_size);
> }
>
> for (end = cluster + clusters_per_iter; cluster < end; cluster++) {
> @@ -344,8 +350,8 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
> if (yield_and_check(job)) {
> return ret;
> }
> - ret = backup_do_cow(bs, cluster * BACKUP_SECTORS_PER_CLUSTER,
> - BACKUP_SECTORS_PER_CLUSTER, &error_is_read,
> + ret = backup_do_cow(bs, cluster * sectors_per_cluster,
> + sectors_per_cluster, &error_is_read,
> false);
> if ((ret < 0) &&
> backup_error_action(job, error_is_read, -ret) ==
> @@ -357,17 +363,17 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
>
> /* If the bitmap granularity is smaller than the backup granularity,
> * we need to advance the iterator pointer to the next cluster. */
> - if (granularity < BACKUP_CLUSTER_SIZE) {
> - bdrv_set_dirty_iter(&hbi, cluster * BACKUP_SECTORS_PER_CLUSTER);
> + if (granularity < job->cluster_size) {
> + bdrv_set_dirty_iter(&hbi, cluster * sectors_per_cluster);
> }
>
> last_cluster = cluster - 1;
> }
>
> /* Play some final catchup with the progress meter */
> - end = DIV_ROUND_UP(job->common.len, BACKUP_CLUSTER_SIZE);
> + end = DIV_ROUND_UP(job->common.len, job->cluster_size);
> if (last_cluster + 1 < end) {
> - job->common.offset += ((end - last_cluster - 1) * BACKUP_CLUSTER_SIZE);
> + job->common.offset += ((end - last_cluster - 1) * job->cluster_size);
> }
>
> return ret;
> @@ -384,13 +390,14 @@ static void coroutine_fn backup_run(void *opaque)
> .notify = backup_before_write_notify,
> };
> int64_t start, end;
> + int64_t sectors_per_cluster = cluster_size_sectors(job);
> int ret = 0;
>
> QLIST_INIT(&job->inflight_reqs);
> qemu_co_rwlock_init(&job->flush_rwlock);
>
> start = 0;
> - end = DIV_ROUND_UP(job->common.len, BACKUP_CLUSTER_SIZE);
> + end = DIV_ROUND_UP(job->common.len, job->cluster_size);
>
> job->bitmap = hbitmap_alloc(end, 0);
>
> @@ -427,7 +434,7 @@ static void coroutine_fn backup_run(void *opaque)
> /* Check to see if these blocks are already in the
> * backing file. */
>
> - for (i = 0; i < BACKUP_SECTORS_PER_CLUSTER;) {
> + for (i = 0; i < sectors_per_cluster;) {
> /* bdrv_is_allocated() only returns true/false based
> * on the first set of sectors it comes across that
> * are are all in the same state.
> @@ -436,8 +443,8 @@ static void coroutine_fn backup_run(void *opaque)
> * needed but at some point that is always the case. */
> alloced =
> bdrv_is_allocated(bs,
> - start * BACKUP_SECTORS_PER_CLUSTER + i,
> - BACKUP_SECTORS_PER_CLUSTER - i, &n);
> + start * sectors_per_cluster + i,
> + sectors_per_cluster - i, &n);
> i += n;
>
> if (alloced == 1 || n == 0) {
> @@ -452,8 +459,8 @@ static void coroutine_fn backup_run(void *opaque)
> }
> }
> /* FULL sync mode we copy the whole drive. */
> - ret = backup_do_cow(bs, start * BACKUP_SECTORS_PER_CLUSTER,
> - BACKUP_SECTORS_PER_CLUSTER, &error_is_read, false);
> + ret = backup_do_cow(bs, start * sectors_per_cluster,
> + sectors_per_cluster, &error_is_read, false);
> if (ret < 0) {
> /* Depending on error action, fail now or retry cluster */
> BlockErrorAction action =
> @@ -571,6 +578,7 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
> job->sync_mode = sync_mode;
> job->sync_bitmap = sync_mode == MIRROR_SYNC_MODE_INCREMENTAL ?
> sync_bitmap : NULL;
> + job->cluster_size = BACKUP_CLUSTER_SIZE_DEFAULT;
> job->common.len = len;
> job->common.co = qemu_coroutine_create(backup_run);
> block_job_txn_add_job(txn, &job->common);
> --
> 2.4.3
>
Reviewed-by: Fam Zheng <famz@redhat.com>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH v2 2/3] block/backup: avoid copying less than full target clusters
2016-02-22 22:07 ` [Qemu-devel] [PATCH v2 2/3] block/backup: avoid copying less than full target clusters John Snow
@ 2016-02-23 5:08 ` Fam Zheng
2016-02-23 17:16 ` John Snow
0 siblings, 1 reply; 9+ messages in thread
From: Fam Zheng @ 2016-02-23 5:08 UTC (permalink / raw)
To: John Snow; +Cc: kwolf, jcody, stefanha, qemu-devel, qemu-block
On Mon, 02/22 17:07, John Snow wrote:
> During incremental backups, if the target has a cluster size that is
> larger than the backup cluster size and we are backing up to a target
> that cannot (for whichever reason) pull clusters up from a backing image,
> we may inadvertently create unusable incremental backup images.
>
> For example:
>
> If the bitmap tracks changes at a 64KB granularity and we transmit 64KB
> of data at a time but the target uses a 128KB cluster size, it is
> possible that only half of a target cluster will be recognized as dirty
> by the backup block job. When the cluster is allocated on the target
> image but only half populated with data, we lose the ability to
> distinguish between zero padding and uninitialized data.
>
> This does not happen if the target image has a backing file that points
> to the last known good backup.
>
> Even if we have a backing file, though, it's likely going to be faster
> to just buffer the redundant data ourselves from the live image than
> fetching it from the backing file, so let's just always round up to the
> target granularity.
>
> The same logic applies to backup modes top, none, and full. Copying
> fractional clusters without the guarantee of COW is dangerous, but even
> if we can rely on COW, it's likely better to just re-copy the data.
>
> Reported-by: Fam Zheng <famz@redhat.com>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
> block/backup.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/block/backup.c b/block/backup.c
> index 76addef..a9a4d5c 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -501,6 +501,7 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
> BlockJobTxn *txn, Error **errp)
> {
> int64_t len;
> + BlockDriverInfo bdi;
>
> assert(bs);
> assert(target);
> @@ -578,7 +579,14 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
> job->sync_mode = sync_mode;
> job->sync_bitmap = sync_mode == MIRROR_SYNC_MODE_INCREMENTAL ?
> sync_bitmap : NULL;
> - job->cluster_size = BACKUP_CLUSTER_SIZE_DEFAULT;
> +
> + /* If there is no backing file on the target, we cannot rely on COW if our
> + * backup cluster size is smaller than the target cluster size. Instead of
> + * checking for a backing file, we assume that just copying the data in the
> + * backup loop is comparable to the unreliable COW. */
> + bdrv_get_info(job->target, &bdi);
bdrv_get_info can fail and bdi fields are uninitialized. Please test the return
value and handle the error.
Fam
> + job->cluster_size = MAX(BACKUP_CLUSTER_SIZE_DEFAULT, bdi.cluster_size);
> +
> job->common.len = len;
> job->common.co = qemu_coroutine_create(backup_run);
> block_job_txn_add_job(txn, &job->common);
> --
> 2.4.3
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH v2 2/3] block/backup: avoid copying less than full target clusters
2016-02-23 5:08 ` Fam Zheng
@ 2016-02-23 17:16 ` John Snow
0 siblings, 0 replies; 9+ messages in thread
From: John Snow @ 2016-02-23 17:16 UTC (permalink / raw)
To: Fam Zheng; +Cc: kwolf, jcody, stefanha, qemu-devel, qemu-block
On 02/23/2016 12:08 AM, Fam Zheng wrote:
> On Mon, 02/22 17:07, John Snow wrote:
>> During incremental backups, if the target has a cluster size that is
>> larger than the backup cluster size and we are backing up to a target
>> that cannot (for whichever reason) pull clusters up from a backing image,
>> we may inadvertently create unusable incremental backup images.
>>
>> For example:
>>
>> If the bitmap tracks changes at a 64KB granularity and we transmit 64KB
>> of data at a time but the target uses a 128KB cluster size, it is
>> possible that only half of a target cluster will be recognized as dirty
>> by the backup block job. When the cluster is allocated on the target
>> image but only half populated with data, we lose the ability to
>> distinguish between zero padding and uninitialized data.
>>
>> This does not happen if the target image has a backing file that points
>> to the last known good backup.
>>
>> Even if we have a backing file, though, it's likely going to be faster
>> to just buffer the redundant data ourselves from the live image than
>> fetching it from the backing file, so let's just always round up to the
>> target granularity.
>>
>> The same logic applies to backup modes top, none, and full. Copying
>> fractional clusters without the guarantee of COW is dangerous, but even
>> if we can rely on COW, it's likely better to just re-copy the data.
>>
>> Reported-by: Fam Zheng <famz@redhat.com>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>> block/backup.c | 10 +++++++++-
>> 1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/backup.c b/block/backup.c
>> index 76addef..a9a4d5c 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
>> @@ -501,6 +501,7 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
>> BlockJobTxn *txn, Error **errp)
>> {
>> int64_t len;
>> + BlockDriverInfo bdi;
>>
>> assert(bs);
>> assert(target);
>> @@ -578,7 +579,14 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
>> job->sync_mode = sync_mode;
>> job->sync_bitmap = sync_mode == MIRROR_SYNC_MODE_INCREMENTAL ?
>> sync_bitmap : NULL;
>> - job->cluster_size = BACKUP_CLUSTER_SIZE_DEFAULT;
>> +
>> + /* If there is no backing file on the target, we cannot rely on COW if our
>> + * backup cluster size is smaller than the target cluster size. Instead of
>> + * checking for a backing file, we assume that just copying the data in the
>> + * backup loop is comparable to the unreliable COW. */
>> + bdrv_get_info(job->target, &bdi);
>
> bdrv_get_info can fail and bdi fields are uninitialized. Pleae test the return
> value and handle the error.
>
> Fam
>
You're right. I thought it always did the memset, but it does have a
failure route that is open prior to the memset.
>> + job->cluster_size = MAX(BACKUP_CLUSTER_SIZE_DEFAULT, bdi.cluster_size);
>> +
>> job->common.len = len;
>> job->common.co = qemu_coroutine_create(backup_run);
>> block_job_txn_add_job(txn, &job->common);
>> --
>> 2.4.3
>>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [Qemu-block] [PATCH v2 0/3] blockjob: correct backup cluster size for backups
2016-02-22 22:07 [Qemu-devel] [PATCH v2 0/3] blockjob: correct backup cluster size for backups John Snow
` (2 preceding siblings ...)
2016-02-22 22:07 ` [Qemu-devel] [PATCH v2 3/3] iotests/124: Add cluster_size mismatch test John Snow
@ 2016-03-02 10:48 ` Stefan Hajnoczi
2016-03-02 11:03 ` Fam Zheng
3 siblings, 1 reply; 9+ messages in thread
From: Stefan Hajnoczi @ 2016-03-02 10:48 UTC (permalink / raw)
To: John Snow; +Cc: kwolf, famz, qemu-block, Jeff Cody, qemu-devel, stefanha
[-- Attachment #1: Type: text/plain, Size: 1521 bytes --]
On Mon, Feb 22, 2016 at 05:07:04PM -0500, John Snow wrote:
> Backups sometimes need a non-64KiB transfer cluster size.
> See patch #2 for the detailed justification.
>
> ===
> v2
> ===
>
> 01: Removed "sectors_per_cluster" as a cached property of the Backup Block Job,
> In favor of recomputing it with a small function where needed.
> (I like v1 more. Thoughts?)
> 02: Expand correction to all backup modes instead of just incremental.
> Added credit: Thanks to Fam Z for noticing this problem!
> 03: Minor phrasing change in a comment.
> Added r-b.
>
> ________________________________________________________________________________
>
> For convenience, this branch is available at:
> https://github.com/jnsnow/qemu.git branch incremental-granularity-fix
> https://github.com/jnsnow/qemu/tree/incremental-granularity-fix
>
> This version is tagged incremental-granularity-fix-v2:
> https://github.com/jnsnow/qemu/releases/tag/incremental-granularity-fix-v2
>
> John Snow (3):
> block/backup: make backup cluster size configurable
> block/backup: avoid copying less than full target clusters
> iotests/124: Add cluster_size mismatch test
>
> block/backup.c | 72 ++++++++++++++++++++++++++++------------------
> tests/qemu-iotests/124 | 58 +++++++++++++++++++++++++++++++++----
> tests/qemu-iotests/124.out | 4 +--
> 3 files changed, 99 insertions(+), 35 deletions(-)
Please CC Jeff Cody. He maintains block jobs.
Thanks,
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [Qemu-block] [PATCH v2 0/3] blockjob: correct backup cluster size for backups
2016-03-02 10:48 ` [Qemu-devel] [Qemu-block] [PATCH v2 0/3] blockjob: correct backup cluster size for backups Stefan Hajnoczi
@ 2016-03-02 11:03 ` Fam Zheng
0 siblings, 0 replies; 9+ messages in thread
From: Fam Zheng @ 2016-03-02 11:03 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: kwolf, qemu-block, Jeff Cody, qemu-devel, stefanha, John Snow
On Wed, 03/02 10:48, Stefan Hajnoczi wrote:
> On Mon, Feb 22, 2016 at 05:07:04PM -0500, John Snow wrote:
> > Backups sometimes need a non-64KiB transfer cluster size.
> > See patch #2 for the detailed justification.
> >
> > ===
> > v2
> > ===
> >
> > 01: Removed "sectors_per_cluster" as a cached property of the Backup Block Job,
> > In favor of recomputing it with a small function where needed.
> > (I like v1 more. Thoughts?)
> > 02: Expand correction to all backup modes instead of just incremental.
> > Added credit: Thanks to Fam Z for noticing this problem!
> > 03: Minor phrasing change in a comment.
> > Added r-b.
> >
> > ________________________________________________________________________________
> >
> > For convenience, this branch is available at:
> > https://github.com/jnsnow/qemu.git branch incremental-granularity-fix
> > https://github.com/jnsnow/qemu/tree/incremental-granularity-fix
> >
> > This version is tagged incremental-granularity-fix-v2:
> > https://github.com/jnsnow/qemu/releases/tag/incremental-granularity-fix-v2
> >
> > John Snow (3):
> > block/backup: make backup cluster size configurable
> > block/backup: avoid copying less than full target clusters
> > iotests/124: Add cluster_size mismatch test
> >
> > block/backup.c | 72 ++++++++++++++++++++++++++++------------------
> > tests/qemu-iotests/124 | 58 +++++++++++++++++++++++++++++++++----
> > tests/qemu-iotests/124.out | 4 +--
> > 3 files changed, 99 insertions(+), 35 deletions(-)
>
> Please CC Jeff Cody. He maintains block jobs.
Jeff was Cc'ed. This series has already been applied and merged. :)
Fam
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2016-03-02 11:04 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-22 22:07 [Qemu-devel] [PATCH v2 0/3] blockjob: correct backup cluster size for backups John Snow
2016-02-22 22:07 ` [Qemu-devel] [PATCH v2 1/3] block/backup: make backup cluster size configurable John Snow
2016-02-23 5:06 ` Fam Zheng
2016-02-22 22:07 ` [Qemu-devel] [PATCH v2 2/3] block/backup: avoid copying less than full target clusters John Snow
2016-02-23 5:08 ` Fam Zheng
2016-02-23 17:16 ` John Snow
2016-02-22 22:07 ` [Qemu-devel] [PATCH v2 3/3] iotests/124: Add cluster_size mismatch test John Snow
2016-03-02 10:48 ` [Qemu-devel] [Qemu-block] [PATCH v2 0/3] blockjob: correct backup cluster size for backups Stefan Hajnoczi
2016-03-02 11:03 ` Fam Zheng
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).