qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 0/3] block: block-backup live backup command
@ 2013-04-29  7:42 Stefan Hajnoczi
  2013-04-29  7:42 ` [Qemu-devel] [PATCH v2 1/3] block: add basic backup support to block driver Stefan Hajnoczi
                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2013-04-29  7:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Wenchao Xia, imain, Stefan Hajnoczi,
	pbonzini, dietmar

This series adds a new QMP command, block-backup, which takes a point-in-time
snapshot of a block device.  The snapshot is copied out to a target block
device.  A simple example is:

  block-backup device=virtio0 format=qcow2 target=backup-20130401.qcow2

The original block-backup blockjob was written by Dietmar Maurer
<dietmar@proxmox.com>.  He is currently busy but I feel the feature is worth
pushing into QEMU since there has been interest.  This is my version of his
patch, plus the QMP command and qemu-iotests test case.

How is this different from block-stream and drive-mirror?
---------------------------------------------------------
Both block-stream and drive-mirror do not provide immediate point-in-time
snapshots.  Instead they copy data into a new file and then switch to it.  In
other words, the point at which the "snapshot" is taken cannot be controlled
directly.

block-backup intercepts guest writes and saves data into the target block
device before it is overwritten.  The target block device can be a raw image
file, backing files are not used to implement this feature.

How can block-backup be used?
-----------------------------
The simplest use-case is to copy a point-in-time snapshot to a local file.

More advanced users may wish to make the target an NBD URL.  The NBD server
listening on the other side can process the backup writes any way it wishes.  I
previously posted an RFC series with a backup server that streamed Dietmar's
VMA backup archive format.

What's next for block-backup?
-----------------------------
The following enhancements are left for future patches:

1. QMP 'transaction' support.  It is handy to atomically snapshot multiple
   block devices.  We need qmp_transaction() support for this.  Wenchao Xia is
   currently making qmp_transaction() extensible so new action types, like
   block-backup, can be added.

2. Sync modes like drive-mirror (top, full, none).  This makes it possible to
   preserve the backing file chain.

v2:
 * s/block_backup/block-backup/ in commit message [eblake]
 * Avoid funny spacing in QMP docs [eblake]
 * Document query-block-jobs and block-job-cancel usage [eblake]

Dietmar Maurer (1):
  block: add basic backup support to block driver

Stefan Hajnoczi (2):
  block: add block-backup QMP command
  qemu-iotests: add 054 block-backup test case

 block.c                    |  69 ++++++++++++-
 block/Makefile.objs        |   1 +
 block/backup.c             | 252 +++++++++++++++++++++++++++++++++++++++++++++
 blockdev.c                 |  92 +++++++++++++++++
 include/block/block.h      |   2 +
 include/block/block_int.h  |  16 +++
 include/block/blockjob.h   |  10 ++
 qapi-schema.json           |  31 ++++++
 qmp-commands.hx            |   6 ++
 tests/qemu-iotests/054     | 230 +++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/054.out |   5 +
 tests/qemu-iotests/group   |   1 +
 12 files changed, 710 insertions(+), 5 deletions(-)
 create mode 100644 block/backup.c
 create mode 100755 tests/qemu-iotests/054
 create mode 100644 tests/qemu-iotests/054.out

-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v2 1/3] block: add basic backup support to block driver
  2013-04-29  7:42 [Qemu-devel] [PATCH v2 0/3] block: block-backup live backup command Stefan Hajnoczi
@ 2013-04-29  7:42 ` Stefan Hajnoczi
  2013-05-08 12:39   ` Kevin Wolf
  2013-05-30  3:37   ` Fam Zheng
  2013-04-29  7:42 ` [Qemu-devel] [PATCH v2 2/3] block: add block-backup QMP command Stefan Hajnoczi
  2013-04-29  7:42 ` [Qemu-devel] [PATCH v2 3/3] qemu-iotests: add 054 block-backup test case Stefan Hajnoczi
  2 siblings, 2 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2013-04-29  7:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Wenchao Xia, imain, Stefan Hajnoczi,
	pbonzini, dietmar

From: Dietmar Maurer <dietmar@proxmox.com>

backup_start() creates a block job that copies a point-in-time snapshot
of a block device to a target block device.

We call backup_do_cow() for each write during backup. That function
reads the original data from the block device before it gets
overwritten.  The data is then written to the target device.

The tracked_request infrastructure is used to serialize access.  Both
reads and writes are serialized if they overlap.

Currently backup cluster size is hardcoded to 65536 bytes.

[I made a number of changes to Dietmar's original patch and folded them
in to make code review easy.  Here is the full list:

 * Drop BackupDumpFunc interface in favor of a target block device
 * Detect zero clusters with buffer_is_zero()
 * Don't write zero clusters to the target
 * Use 0 delay instead of 1us, like other block jobs
 * Unify creation/start functions into backup_start()
 * Simplify cleanup, free bitmap in backup_run() instead of cb function
 * Use HBitmap to avoid duplicating bitmap code
 * Use bdrv_getlength() instead of accessing ->total_sectors directly
 * Delete the backup.h header file, it is no longer necessary
 * Move ./backup.c to block/backup.c
 * Remove #ifdefed out code
 * Coding style and whitespace cleanups

-- stefanha]

Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block.c                   |  69 ++++++++++++-
 block/Makefile.objs       |   1 +
 block/backup.c            | 252 ++++++++++++++++++++++++++++++++++++++++++++++
 include/block/block.h     |   2 +
 include/block/block_int.h |  16 +++
 include/block/blockjob.h  |  10 ++
 6 files changed, 345 insertions(+), 5 deletions(-)
 create mode 100644 block/backup.c

diff --git a/block.c b/block.c
index aa9a533..c5c09b7 100644
--- a/block.c
+++ b/block.c
@@ -54,6 +54,7 @@
 typedef enum {
     BDRV_REQ_COPY_ON_READ = 0x1,
     BDRV_REQ_ZERO_WRITE   = 0x2,
+    BDRV_REQ_BACKUP_ONLY  = 0x4,
 } BdrvRequestFlags;
 
 static void bdrv_dev_change_media_cb(BlockDriverState *bs, bool load);
@@ -1902,6 +1903,22 @@ void bdrv_round_to_clusters(BlockDriverState *bs,
     }
 }
 
+/**
+ * Round a region to job cluster boundaries
+ */
+static void round_to_job_clusters(BlockDriverState *bs,
+                                  int64_t sector_num, int nb_sectors,
+                                  int job_cluster_size,
+                                  int64_t *cluster_sector_num,
+                                  int *cluster_nb_sectors)
+{
+    int64_t c = job_cluster_size / BDRV_SECTOR_SIZE;
+
+    *cluster_sector_num = QEMU_ALIGN_DOWN(sector_num, c);
+    *cluster_nb_sectors = QEMU_ALIGN_UP(sector_num - *cluster_sector_num +
+                                        nb_sectors, c);
+}
+
 static bool tracked_request_overlaps(BdrvTrackedRequest *req,
                                      int64_t sector_num, int nb_sectors) {
     /*        aaaa   bbbb */
@@ -1916,7 +1933,9 @@ static bool tracked_request_overlaps(BdrvTrackedRequest *req,
 }
 
 static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
-        int64_t sector_num, int nb_sectors)
+                                                       int64_t sector_num,
+                                                       int nb_sectors,
+                                                       int job_cluster_size)
 {
     BdrvTrackedRequest *req;
     int64_t cluster_sector_num;
@@ -1932,6 +1951,11 @@ static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
     bdrv_round_to_clusters(bs, sector_num, nb_sectors,
                            &cluster_sector_num, &cluster_nb_sectors);
 
+    if (job_cluster_size) {
+        round_to_job_clusters(bs, sector_num, nb_sectors, job_cluster_size,
+                              &cluster_sector_num, &cluster_nb_sectors);
+    }
+
     do {
         retry = false;
         QLIST_FOREACH(req, &bs->tracked_requests, list) {
@@ -2507,12 +2531,24 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
         bs->copy_on_read_in_flight++;
     }
 
-    if (bs->copy_on_read_in_flight) {
-        wait_for_overlapping_requests(bs, sector_num, nb_sectors);
+    int job_cluster_size = bs->job && bs->job->cluster_size ?
+        bs->job->cluster_size : 0;
+
+    if (bs->copy_on_read_in_flight || job_cluster_size) {
+        wait_for_overlapping_requests(bs, sector_num, nb_sectors,
+                                      job_cluster_size);
     }
 
     tracked_request_begin(&req, bs, sector_num, nb_sectors, false);
 
+    if (bs->job && bs->job->job_type->before_read) {
+        ret = bs->job->job_type->before_read(bs, sector_num, nb_sectors, qiov);
+        if ((ret < 0) || (flags & BDRV_REQ_BACKUP_ONLY)) {
+            /* Note: We do not return any data to the caller */
+            goto out;
+        }
+    }
+
     if (flags & BDRV_REQ_COPY_ON_READ) {
         int pnum;
 
@@ -2556,6 +2592,17 @@ int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
                             BDRV_REQ_COPY_ON_READ);
 }
 
+int coroutine_fn bdrv_co_backup(BlockDriverState *bs,
+    int64_t sector_num, int nb_sectors)
+{
+    if (!bs->job) {
+        return -ENOTSUP;
+    }
+
+    return bdrv_co_do_readv(bs, sector_num, nb_sectors, NULL,
+                            BDRV_REQ_BACKUP_ONLY);
+}
+
 static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
     int64_t sector_num, int nb_sectors)
 {
@@ -2613,12 +2660,23 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
         bdrv_io_limits_intercept(bs, true, nb_sectors);
     }
 
-    if (bs->copy_on_read_in_flight) {
-        wait_for_overlapping_requests(bs, sector_num, nb_sectors);
+    int job_cluster_size = bs->job && bs->job->cluster_size ?
+        bs->job->cluster_size : 0;
+
+    if (bs->copy_on_read_in_flight || job_cluster_size) {
+        wait_for_overlapping_requests(bs, sector_num, nb_sectors,
+                                      job_cluster_size);
     }
 
     tracked_request_begin(&req, bs, sector_num, nb_sectors, true);
 
+    if (bs->job && bs->job->job_type->before_write) {
+        ret = bs->job->job_type->before_write(bs, sector_num, nb_sectors, qiov);
+        if (ret < 0) {
+            goto out;
+        }
+    }
+
     if (flags & BDRV_REQ_ZERO_WRITE) {
         ret = bdrv_co_do_write_zeroes(bs, sector_num, nb_sectors);
     } else {
@@ -2637,6 +2695,7 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
         bs->wr_highest_sector = sector_num + nb_sectors - 1;
     }
 
+out:
     tracked_request_end(&req);
 
     return ret;
diff --git a/block/Makefile.objs b/block/Makefile.objs
index 6c4b5bc..dcab6d3 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -19,5 +19,6 @@ endif
 common-obj-y += stream.o
 common-obj-y += commit.o
 common-obj-y += mirror.o
+common-obj-y += backup.o
 
 $(obj)/curl.o: QEMU_CFLAGS+=$(CURL_CFLAGS)
diff --git a/block/backup.c b/block/backup.c
new file mode 100644
index 0000000..45d8c21
--- /dev/null
+++ b/block/backup.c
@@ -0,0 +1,252 @@
+/*
+ * QEMU backup
+ *
+ * Copyright (C) 2013 Proxmox Server Solutions
+ *
+ * Authors:
+ *  Dietmar Maurer (dietmar@proxmox.com)
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include <stdio.h>
+#include <errno.h>
+#include <unistd.h>
+
+#include "block/block.h"
+#include "block/block_int.h"
+#include "block/blockjob.h"
+#include "qemu/ratelimit.h"
+
+#define DEBUG_BACKUP 0
+
+#define DPRINTF(fmt, ...) \
+    do { \
+        if (DEBUG_BACKUP) { \
+            fprintf(stderr, "backup: " fmt, ## __VA_ARGS__); \
+        } \
+    } while (0)
+
+#define BACKUP_CLUSTER_BITS 16
+#define BACKUP_CLUSTER_SIZE (1 << BACKUP_CLUSTER_BITS)
+#define BACKUP_BLOCKS_PER_CLUSTER (BACKUP_CLUSTER_SIZE / BDRV_SECTOR_SIZE)
+
+#define SLICE_TIME 100000000ULL /* ns */
+
+typedef struct BackupBlockJob {
+    BlockJob common;
+    BlockDriverState *target;
+    RateLimit limit;
+    CoRwlock rwlock;
+    uint64_t sectors_read;
+    HBitmap *bitmap;
+} BackupBlockJob;
+
+static int coroutine_fn backup_do_cow(BlockDriverState *bs,
+                                      int64_t sector_num, int nb_sectors)
+{
+    assert(bs);
+    BackupBlockJob *job = (BackupBlockJob *)bs->job;
+    assert(job);
+
+    BlockDriver *drv = bs->drv;
+    struct iovec iov;
+    QEMUIOVector bounce_qiov;
+    void *bounce_buffer = NULL;
+    int ret = 0;
+
+    qemu_co_rwlock_rdlock(&job->rwlock);
+
+    int64_t start, end;
+
+    start = sector_num / BACKUP_BLOCKS_PER_CLUSTER;
+    end = DIV_ROUND_UP(sector_num + nb_sectors, BACKUP_BLOCKS_PER_CLUSTER);
+
+    DPRINTF("brdv_co_backup_cow enter %s C%" PRId64 " %" PRId64 " %d\n",
+            bdrv_get_device_name(bs), start, sector_num, nb_sectors);
+
+    for (; start < end; start++) {
+        if (hbitmap_get(job->bitmap, start)) {
+            DPRINTF("brdv_co_backup_cow skip C%" PRId64 "\n", start);
+            continue; /* already copied */
+        }
+
+        /* immediately set bitmap (avoid coroutine race) */
+        hbitmap_set(job->bitmap, start, 1);
+
+        DPRINTF("brdv_co_backup_cow C%" PRId64 "\n", start);
+
+        if (!bounce_buffer) {
+            iov.iov_len = BACKUP_CLUSTER_SIZE;
+            iov.iov_base = bounce_buffer = qemu_blockalign(bs, iov.iov_len);
+            qemu_iovec_init_external(&bounce_qiov, &iov, 1);
+        }
+
+        ret = drv->bdrv_co_readv(bs, start * BACKUP_BLOCKS_PER_CLUSTER,
+                                 BACKUP_BLOCKS_PER_CLUSTER,
+                                 &bounce_qiov);
+        if (ret < 0) {
+            DPRINTF("brdv_co_backup_cow bdrv_read C%" PRId64 " failed\n",
+                    start);
+            goto out;
+        }
+
+        job->sectors_read += BACKUP_BLOCKS_PER_CLUSTER;
+
+        if (!buffer_is_zero(bounce_buffer, BACKUP_CLUSTER_SIZE)) {
+            ret = bdrv_co_writev(job->target, start * BACKUP_BLOCKS_PER_CLUSTER,
+                                 BACKUP_BLOCKS_PER_CLUSTER,
+                                 &bounce_qiov);
+            if (ret < 0) {
+                DPRINTF("brdv_co_backup_cow dump_cluster_cb C%" PRId64
+                        " failed\n", start);
+                goto out;
+            }
+        }
+
+        DPRINTF("brdv_co_backup_cow done C%" PRId64 "\n", start);
+    }
+
+out:
+    if (bounce_buffer) {
+        qemu_vfree(bounce_buffer);
+    }
+
+    qemu_co_rwlock_unlock(&job->rwlock);
+
+    return ret;
+}
+
+static int coroutine_fn backup_before_read(BlockDriverState *bs,
+                                           int64_t sector_num,
+                                           int nb_sectors, QEMUIOVector *qiov)
+{
+    return backup_do_cow(bs, sector_num, nb_sectors);
+}
+
+static int coroutine_fn backup_before_write(BlockDriverState *bs,
+                                            int64_t sector_num,
+                                            int nb_sectors, QEMUIOVector *qiov)
+{
+    return backup_do_cow(bs, sector_num, nb_sectors);
+}
+
+static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp)
+{
+    BackupBlockJob *s = container_of(job, BackupBlockJob, common);
+
+    if (speed < 0) {
+        error_set(errp, QERR_INVALID_PARAMETER, "speed");
+        return;
+    }
+    ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
+}
+
+static BlockJobType backup_job_type = {
+    .instance_size = sizeof(BackupBlockJob),
+    .before_read = backup_before_read,
+    .before_write = backup_before_write,
+    .job_type = "backup",
+    .set_speed = backup_set_speed,
+};
+
+static void coroutine_fn backup_run(void *opaque)
+{
+    BackupBlockJob *job = opaque;
+    BlockDriverState *bs = job->common.bs;
+    assert(bs);
+
+    int64_t start, end;
+
+    start = 0;
+    end = DIV_ROUND_UP(bdrv_getlength(bs) / BDRV_SECTOR_SIZE,
+                       BACKUP_BLOCKS_PER_CLUSTER);
+
+    job->bitmap = hbitmap_alloc(end, 0);
+
+    DPRINTF("backup_run start %s %" PRId64 " %" PRId64 "\n",
+            bdrv_get_device_name(bs), start, end);
+
+    int ret = 0;
+
+    for (; start < end; start++) {
+        if (block_job_is_cancelled(&job->common)) {
+            ret = -1;
+            break;
+        }
+
+        /* we need to yield so that qemu_aio_flush() returns.
+         * (without, VM does not reboot)
+         */
+        if (job->common.speed) {
+            uint64_t delay_ns = ratelimit_calculate_delay(
+                &job->limit, job->sectors_read);
+            job->sectors_read = 0;
+            block_job_sleep_ns(&job->common, rt_clock, delay_ns);
+        } else {
+            block_job_sleep_ns(&job->common, rt_clock, 0);
+        }
+
+        if (block_job_is_cancelled(&job->common)) {
+            ret = -1;
+            break;
+        }
+
+        if (hbitmap_get(job->bitmap, start)) {
+            continue; /* already copied */
+        }
+
+        DPRINTF("backup_run loop C%" PRId64 "\n", start);
+
+        /**
+         * This triggers a cluster copy
+         * Note: avoid direct call to brdv_co_backup_cow, because
+         * this does not call tracked_request_begin()
+         */
+        ret = bdrv_co_backup(bs, start*BACKUP_BLOCKS_PER_CLUSTER, 1);
+        if (ret < 0) {
+            break;
+        }
+        /* Publish progress */
+        job->common.offset += BACKUP_CLUSTER_SIZE;
+    }
+
+    /* wait until pending backup_do_cow()calls have completed */
+    qemu_co_rwlock_wrlock(&job->rwlock);
+    qemu_co_rwlock_unlock(&job->rwlock);
+
+    hbitmap_free(job->bitmap);
+
+    bdrv_delete(job->target);
+
+    DPRINTF("backup_run complete %d\n", ret);
+    block_job_completed(&job->common, ret);
+}
+
+void backup_start(BlockDriverState *bs, BlockDriverState *target,
+                  int64_t speed,
+                  BlockDriverCompletionFunc *cb, void *opaque,
+                  Error **errp)
+{
+    assert(bs);
+    assert(target);
+    assert(cb);
+
+    DPRINTF("backup_start %s\n", bdrv_get_device_name(bs));
+
+    BackupBlockJob *job = block_job_create(&backup_job_type, bs, speed,
+                                           cb, opaque, errp);
+    if (!job) {
+        return;
+    }
+
+    qemu_co_rwlock_init(&job->rwlock);
+
+    job->target = target;
+    job->common.cluster_size = BACKUP_CLUSTER_SIZE;
+    job->common.len = bdrv_getlength(bs);
+    job->common.co = qemu_coroutine_create(backup_run);
+    qemu_coroutine_enter(job->common.co, job);
+}
diff --git a/include/block/block.h b/include/block/block.h
index 1251c5c..19a41f4 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -178,6 +178,8 @@ int coroutine_fn bdrv_co_readv(BlockDriverState *bs, int64_t sector_num,
     int nb_sectors, QEMUIOVector *qiov);
 int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
     int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
+int coroutine_fn bdrv_co_backup(BlockDriverState *bs,
+    int64_t sector_num, int nb_sectors);
 int coroutine_fn bdrv_co_writev(BlockDriverState *bs, int64_t sector_num,
     int nb_sectors, QEMUIOVector *qiov);
 /*
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 6078dd3..bd51abf 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -377,4 +377,20 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
                   BlockDriverCompletionFunc *cb,
                   void *opaque, Error **errp);
 
+/*
+ * backup_start:
+ * @bs: Block device to operate on.
+ * @target: Block device to write to.
+ * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
+ * @cb: Completion function for the job.
+ * @opaque: Opaque pointer value passed to @cb.
+ *
+ * Start a backup operation on @bs.  Clusters in @bs are written to @target
+ * until the job is cancelled or manually completed.
+ */
+void backup_start(BlockDriverState *bs, BlockDriverState *target,
+                  int64_t speed,
+                  BlockDriverCompletionFunc *cb, void *opaque,
+                  Error **errp);
+
 #endif /* BLOCK_INT_H */
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index c290d07..6f42495 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -50,6 +50,13 @@ typedef struct BlockJobType {
      * manually.
      */
     void (*complete)(BlockJob *job, Error **errp);
+
+    /** tracked requests */
+    int coroutine_fn (*before_read)(BlockDriverState *bs, int64_t sector_num,
+                                    int nb_sectors, QEMUIOVector *qiov);
+    int coroutine_fn (*before_write)(BlockDriverState *bs, int64_t sector_num,
+                                     int nb_sectors, QEMUIOVector *qiov);
+
 } BlockJobType;
 
 /**
@@ -103,6 +110,9 @@ struct BlockJob {
     /** Speed that was set with @block_job_set_speed.  */
     int64_t speed;
 
+    /** tracked requests */
+    int cluster_size;
+
     /** The completion function that will be called when the job completes.  */
     BlockDriverCompletionFunc *cb;
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v2 2/3] block: add block-backup QMP command
  2013-04-29  7:42 [Qemu-devel] [PATCH v2 0/3] block: block-backup live backup command Stefan Hajnoczi
  2013-04-29  7:42 ` [Qemu-devel] [PATCH v2 1/3] block: add basic backup support to block driver Stefan Hajnoczi
@ 2013-04-29  7:42 ` Stefan Hajnoczi
  2013-05-08 12:49   ` Kevin Wolf
  2013-05-11  4:02   ` Eric Blake
  2013-04-29  7:42 ` [Qemu-devel] [PATCH v2 3/3] qemu-iotests: add 054 block-backup test case Stefan Hajnoczi
  2 siblings, 2 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2013-04-29  7:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Wenchao Xia, imain, Stefan Hajnoczi,
	pbonzini, dietmar

@block-backup

Start a point-in-time copy of a block device to a new destination.  The
status of ongoing block backup operations can be checked with
query-block-jobs.  The operation can be stopped before it has completed using
the block-job-cancel command.

@device: the name of the device whose writes should be mirrored.

@target: the target of the new image. If the file exists, or if it
         is a device, the existing file/device will be used as the new
         destination.  If it does not exist, a new file will be created.

@format: #optional the format of the new destination, default is to
         probe if @mode is 'existing', else the format of the source

@mode: #optional whether and how QEMU should create a new image, default is
       'absolute-paths'.

@speed: #optional the maximum speed, in bytes per second

Returns: nothing on success
         If @device is not a valid block device, DeviceNotFound

Since 1.6

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 blockdev.c       | 92 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qapi-schema.json | 31 +++++++++++++++++++
 qmp-commands.hx  |  6 ++++
 3 files changed, 129 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 6e293e9..afb9b4a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1279,6 +1279,98 @@ void qmp_block_commit(const char *device,
     drive_get_ref(drive_get_by_blockdev(bs));
 }
 
+void qmp_block_backup(const char *device, const char *target,
+                      bool has_format, const char *format,
+                      bool has_mode, enum NewImageMode mode,
+                      bool has_speed, int64_t speed,
+                      Error **errp)
+{
+    BlockDriverState *bs;
+    BlockDriverState *target_bs;
+    BlockDriver *proto_drv;
+    BlockDriver *drv = NULL;
+    Error *local_err = NULL;
+    int flags;
+    uint64_t size;
+    int ret;
+
+    if (!has_speed) {
+        speed = 0;
+    }
+    if (!has_mode) {
+        mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
+    }
+
+    bs = bdrv_find(device);
+    if (!bs) {
+        error_set(errp, QERR_DEVICE_NOT_FOUND, device);
+        return;
+    }
+
+    if (!bdrv_is_inserted(bs)) {
+        error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
+        return;
+    }
+
+    if (!has_format) {
+        format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : bs->drv->format_name;
+    }
+    if (format) {
+        drv = bdrv_find_format(format);
+        if (!drv) {
+            error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
+            return;
+        }
+    }
+
+    if (bdrv_in_use(bs)) {
+        error_set(errp, QERR_DEVICE_IN_USE, device);
+        return;
+    }
+
+    flags = bs->open_flags | BDRV_O_RDWR;
+
+    proto_drv = bdrv_find_protocol(target);
+    if (!proto_drv) {
+        error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
+        return;
+    }
+
+    bdrv_get_geometry(bs, &size);
+    size *= 512;
+    if (mode != NEW_IMAGE_MODE_EXISTING) {
+        assert(format && drv);
+        bdrv_img_create(target, format,
+                        NULL, NULL, NULL, size, flags, &local_err, false);
+    }
+
+    if (error_is_set(&local_err)) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    target_bs = bdrv_new("");
+    ret = bdrv_open(target_bs, target, NULL, flags, drv);
+
+    if (ret < 0) {
+        bdrv_delete(target_bs);
+        error_set(errp, QERR_OPEN_FILE_FAILED, target);
+        return;
+    }
+
+    backup_start(bs, target_bs, speed, block_job_cb, bs, &local_err);
+    if (local_err != NULL) {
+        bdrv_delete(target_bs);
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    /* Grab a reference so hotplug does not delete the BlockDriverState from
+     * underneath us.
+     */
+    drive_get_ref(drive_get_by_blockdev(bs));
+}
+
 #define DEFAULT_MIRROR_BUF_SIZE   (10 << 20)
 
 void qmp_drive_mirror(const char *device, const char *target,
diff --git a/qapi-schema.json b/qapi-schema.json
index 5b0fb3b..550ccac 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1715,6 +1715,37 @@
             '*speed': 'int' } }
 
 ##
+# @block-backup
+#
+# Start a point-in-time copy of a block device to a new destination.  The
+# status of ongoing block backup operations can be checked with
+# query-block-jobs.  The operation can be stopped before it has completed using
+# the block-job-cancel command.
+#
+# @device: the name of the device whose writes should be mirrored.
+#
+# @target: the target of the new image. If the file exists, or if it
+#          is a device, the existing file/device will be used as the new
+#          destination.  If it does not exist, a new file will be created.
+#
+# @format: #optional the format of the new destination, default is to
+#          probe if @mode is 'existing', else the format of the source
+#
+# @mode: #optional whether and how QEMU should create a new image, default is
+#        'absolute-paths'.
+#
+# @speed: #optional the maximum speed, in bytes per second
+#
+# Returns: nothing on success
+#          If @device is not a valid block device, DeviceNotFound
+#
+# Since 1.6
+##
+{ 'command': 'block-backup',
+  'data': { 'device': 'str', 'target': 'str', '*format': 'str',
+            '*mode': 'NewImageMode', '*speed': 'int' } }
+
+##
 # @drive-mirror
 #
 # Start mirroring a block device's writes to a new destination.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 0e89132..f9fb926 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -889,6 +889,12 @@ EQMP
     },
 
     {
+        .name       = "block-backup",
+        .args_type  = "device:B,target:s,speed:i?,mode:s?,format:s?",
+        .mhandler.cmd_new = qmp_marshal_input_block_backup,
+    },
+
+    {
         .name       = "block-job-set-speed",
         .args_type  = "device:B,speed:o",
         .mhandler.cmd_new = qmp_marshal_input_block_job_set_speed,
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v2 3/3] qemu-iotests: add 054 block-backup test case
  2013-04-29  7:42 [Qemu-devel] [PATCH v2 0/3] block: block-backup live backup command Stefan Hajnoczi
  2013-04-29  7:42 ` [Qemu-devel] [PATCH v2 1/3] block: add basic backup support to block driver Stefan Hajnoczi
  2013-04-29  7:42 ` [Qemu-devel] [PATCH v2 2/3] block: add block-backup QMP command Stefan Hajnoczi
@ 2013-04-29  7:42 ` Stefan Hajnoczi
  2 siblings, 0 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2013-04-29  7:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Wenchao Xia, imain, Stefan Hajnoczi,
	pbonzini, dietmar

Testing block-backup is similar to image streaming and drive mirroring.
This test case is based on 041.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/qemu-iotests/054     | 230 +++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/054.out |   5 +
 tests/qemu-iotests/group   |   1 +
 3 files changed, 236 insertions(+)
 create mode 100755 tests/qemu-iotests/054
 create mode 100644 tests/qemu-iotests/054.out

diff --git a/tests/qemu-iotests/054 b/tests/qemu-iotests/054
new file mode 100755
index 0000000..506bf05
--- /dev/null
+++ b/tests/qemu-iotests/054
@@ -0,0 +1,230 @@
+#!/usr/bin/env python
+#
+# Tests for block-backup
+#
+# Copyright (C) 2013 Red Hat, Inc.
+#
+# Based on 041.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import time
+import os
+import iotests
+from iotests import qemu_img, qemu_io
+
+test_img = os.path.join(iotests.test_dir, 'test.img')
+target_img = os.path.join(iotests.test_dir, 'target.img')
+
+class BlockBackupTestCase(iotests.QMPTestCase):
+    '''Abstract base class for block-backup test cases'''
+
+    def assert_no_active_backups(self):
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return', [])
+
+    def cancel_and_wait(self, drive='drive0'):
+        '''Cancel a block job and wait for it to finish'''
+        result = self.vm.qmp('block-job-cancel', device=drive)
+        self.assert_qmp(result, 'return', {})
+
+        cancelled = False
+        while not cancelled:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_COMPLETED' or \
+                   event['event'] == 'BLOCK_JOB_CANCELLED':
+                    self.assert_qmp(event, 'data/type', 'backup')
+                    self.assert_qmp(event, 'data/device', drive)
+                    cancelled = True
+
+        self.assert_no_active_backups()
+
+    def complete_and_wait(self):
+        completed = False
+        while not completed:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_COMPLETED':
+                    self.assert_qmp(event, 'data/type', 'backup')
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/offset', self.image_len)
+                    self.assert_qmp(event, 'data/len', self.image_len)
+                    completed = True
+        self.assert_no_active_backups()
+
+    def compare_images(self, img1, img2):
+        try:
+            qemu_img('convert', '-f', iotests.imgfmt, '-O', 'raw', img1, img1 + '.raw')
+            qemu_img('convert', '-f', iotests.imgfmt, '-O', 'raw', img2, img2 + '.raw')
+            file1 = open(img1 + '.raw', 'r')
+            file2 = open(img2 + '.raw', 'r')
+            return file1.read() == file2.read()
+        finally:
+            if file1 is not None:
+                file1.close()
+            if file2 is not None:
+                file2.close()
+            try:
+                os.remove(img1 + '.raw')
+            except OSError:
+                pass
+            try:
+                os.remove(img2 + '.raw')
+            except OSError:
+                pass
+
+class TestSingleDrive(BlockBackupTestCase):
+    image_len = 120 * 1024 * 1024 # MB
+
+    def setUp(self):
+        qemu_img('create', '-f', iotests.imgfmt, test_img, str(TestSingleDrive.image_len))
+        self.vm = iotests.VM().add_drive(test_img)
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(test_img)
+        try:
+            os.remove(target_img)
+        except OSError:
+            pass
+
+    def test_complete(self):
+        self.assert_no_active_backups()
+
+        result = self.vm.qmp('block-backup', device='drive0',
+                             target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        self.complete_and_wait()
+        self.vm.shutdown()
+        self.assertTrue(self.compare_images(test_img, target_img),
+                        'target image does not match source after backup')
+
+    def test_cancel(self):
+        self.assert_no_active_backups()
+
+        result = self.vm.qmp('block-backup', device='drive0',
+                             target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        self.cancel_and_wait()
+
+    def test_pause(self):
+        self.assert_no_active_backups()
+
+        result = self.vm.qmp('block-backup', device='drive0',
+                             target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('block-job-pause', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        time.sleep(1)
+        result = self.vm.qmp('query-block-jobs')
+        offset = self.dictpath(result, 'return[0]/offset')
+
+        time.sleep(1)
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return[0]/offset', offset)
+
+        result = self.vm.qmp('block-job-resume', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        self.complete_and_wait()
+        self.vm.shutdown()
+        self.assertTrue(self.compare_images(test_img, target_img),
+                        'target image does not match source after backup')
+
+    def test_medium_not_found(self):
+        result = self.vm.qmp('block-backup', device='ide1-cd0',
+                             target=target_img)
+        self.assert_qmp(result, 'error/class', 'GenericError')
+
+    def test_image_not_found(self):
+        result = self.vm.qmp('block-backup', device='drive0',
+                             mode='existing', target=target_img)
+        self.assert_qmp(result, 'error/class', 'GenericError')
+
+    def test_device_not_found(self):
+        result = self.vm.qmp('block-backup', device='nonexistent',
+                             target=target_img)
+        self.assert_qmp(result, 'error/class', 'DeviceNotFound')
+
+class TestSetSpeed(BlockBackupTestCase):
+    image_len = 80 * 1024 * 1024 # MB
+
+    def setUp(self):
+        qemu_img('create', '-f', iotests.imgfmt, test_img, str(TestSetSpeed.image_len))
+        self.vm = iotests.VM().add_drive(test_img)
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(test_img)
+        os.remove(target_img)
+
+    def test_set_speed(self):
+        self.assert_no_active_backups()
+
+        result = self.vm.qmp('block-backup', device='drive0',
+                             target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        # Default speed is 0
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return[0]/device', 'drive0')
+        self.assert_qmp(result, 'return[0]/speed', 0)
+
+        result = self.vm.qmp('block-job-set-speed', device='drive0', speed=8 * 1024 * 1024)
+        self.assert_qmp(result, 'return', {})
+
+        # Ensure the speed we set was accepted
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return[0]/device', 'drive0')
+        self.assert_qmp(result, 'return[0]/speed', 8 * 1024 * 1024)
+
+        self.cancel_and_wait()
+
+        # Check setting speed in block-backup works
+        result = self.vm.qmp('block-backup', device='drive0',
+                             target=target_img, speed=4*1024*1024)
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return[0]/device', 'drive0')
+        self.assert_qmp(result, 'return[0]/speed', 4 * 1024 * 1024)
+
+        self.cancel_and_wait()
+
+    def test_set_speed_invalid(self):
+        self.assert_no_active_backups()
+
+        result = self.vm.qmp('block-backup', device='drive0',
+                             target=target_img, speed=-1)
+        self.assert_qmp(result, 'error/class', 'GenericError')
+
+        self.assert_no_active_backups()
+
+        result = self.vm.qmp('block-backup', device='drive0',
+                             target=target_img)
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('block-job-set-speed', device='drive0', speed=-1)
+        self.assert_qmp(result, 'error/class', 'GenericError')
+
+        self.cancel_and_wait()
+
+if __name__ == '__main__':
+    iotests.main(supported_fmts=['raw', 'qcow2'])
diff --git a/tests/qemu-iotests/054.out b/tests/qemu-iotests/054.out
new file mode 100644
index 0000000..594c16f
--- /dev/null
+++ b/tests/qemu-iotests/054.out
@@ -0,0 +1,5 @@
+........
+----------------------------------------------------------------------
+Ran 8 tests
+
+OK
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index bf944d9..387b050 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -60,3 +60,4 @@
 #051 rw auto
 052 rw auto backing
 053 rw auto
+054 rw auto
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] block: add basic backup support to block driver
  2013-04-29  7:42 ` [Qemu-devel] [PATCH v2 1/3] block: add basic backup support to block driver Stefan Hajnoczi
@ 2013-05-08 12:39   ` Kevin Wolf
  2013-05-08 15:43     ` Paolo Bonzini
                       ` (2 more replies)
  2013-05-30  3:37   ` Fam Zheng
  1 sibling, 3 replies; 27+ messages in thread
From: Kevin Wolf @ 2013-05-08 12:39 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Fam Zheng, qemu-devel, Wenchao Xia, imain, pbonzini, dietmar

Am 29.04.2013 um 09:42 hat Stefan Hajnoczi geschrieben:
> From: Dietmar Maurer <dietmar@proxmox.com>
> 
> backup_start() creates a block job that copies a point-in-time snapshot
> of a block device to a target block device.
> 
> We call backup_do_cow() for each write during backup. That function
> reads the original data from the block device before it gets
> overwritten.  The data is then written to the target device.
> 
> The tracked_request infrastructure is used to serialize access.  Both
> reads and writes are serialized if they overlap.
> 
> Currently backup cluster size is hardcoded to 65536 bytes.
> 
> [I made a number of changes to Dietmar's original patch and folded them
> in to make code review easy.  Here is the full list:
> 
>  * Drop BackupDumpFunc interface in favor of a target block device
>  * Detect zero clusters with buffer_is_zero()
>  * Don't write zero clusters to the target
>  * Use 0 delay instead of 1us, like other block jobs
>  * Unify creation/start functions into backup_start()
>  * Simplify cleanup, free bitmap in backup_run() instead of cb function
>  * Use HBitmap to avoid duplicating bitmap code
>  * Use bdrv_getlength() instead of accessing ->total_sectors directly
>  * Delete the backup.h header file, it is no longer necessary
>  * Move ./backup.c to block/backup.c
>  * Remove #ifdefed out code
>  * Coding style and whitespace cleanups
> 
> -- stefanha]
> 
> Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block.c                   |  69 ++++++++++++-
>  block/Makefile.objs       |   1 +
>  block/backup.c            | 252 ++++++++++++++++++++++++++++++++++++++++++++++
>  include/block/block.h     |   2 +
>  include/block/block_int.h |  16 +++
>  include/block/blockjob.h  |  10 ++
>  6 files changed, 345 insertions(+), 5 deletions(-)
>  create mode 100644 block/backup.c

(Moving some hunks around so I can comment on the headers first.)

> diff --git a/include/block/blockjob.h b/include/block/blockjob.h
> index c290d07..6f42495 100644
> --- a/include/block/blockjob.h
> +++ b/include/block/blockjob.h
> @@ -50,6 +50,13 @@ typedef struct BlockJobType {
>       * manually.
>       */
>      void (*complete)(BlockJob *job, Error **errp);
> +
> +    /** tracked requests */
> +    int coroutine_fn (*before_read)(BlockDriverState *bs, int64_t sector_num,
> +                                    int nb_sectors, QEMUIOVector *qiov);
> +    int coroutine_fn (*before_write)(BlockDriverState *bs, int64_t sector_num,
> +                                     int nb_sectors, QEMUIOVector *qiov);
> +
>  } BlockJobType;

This is actually a sign that a block job isn't the right tool. Jobs are
something that runs in the background and doesn't have callbacks. You
really want to have a filter here (that happens to be coupled to a job).
Need the BlockBackend split before we can do this right.

The second thing that this conflicts with is generalising block jobs to
generic background jobs.

Each hack like this that we accumulate makes it harder to get the real
thing eventually.

>  
>  /**
> @@ -103,6 +110,9 @@ struct BlockJob {
>      /** Speed that was set with @block_job_set_speed.  */
>      int64_t speed;
>  
> +    /** tracked requests */
> +    int cluster_size;

Sure that this is the right comment here?

Does really every job need a cluster size?

> diff --git a/block.c b/block.c
> index aa9a533..c5c09b7 100644
> --- a/block.c
> +++ b/block.c
> @@ -54,6 +54,7 @@
>  typedef enum {
>      BDRV_REQ_COPY_ON_READ = 0x1,
>      BDRV_REQ_ZERO_WRITE   = 0x2,
> +    BDRV_REQ_BACKUP_ONLY  = 0x4,
>  } BdrvRequestFlags;

Without having read the rest of the code, it's unclear to me what this
new flag means. Could use a comment. (Though it has "backup" in its name,
so I suspect having this in generic block layer is wrong.)

>  static void bdrv_dev_change_media_cb(BlockDriverState *bs, bool load);
> @@ -1902,6 +1903,22 @@ void bdrv_round_to_clusters(BlockDriverState *bs,
>      }
>  }
>  
> +/**
> + * Round a region to job cluster boundaries
> + */
> +static void round_to_job_clusters(BlockDriverState *bs,
> +                                  int64_t sector_num, int nb_sectors,
> +                                  int job_cluster_size,
> +                                  int64_t *cluster_sector_num,
> +                                  int *cluster_nb_sectors)
> +{
> +    int64_t c = job_cluster_size / BDRV_SECTOR_SIZE;
> +
> +    *cluster_sector_num = QEMU_ALIGN_DOWN(sector_num, c);
> +    *cluster_nb_sectors = QEMU_ALIGN_UP(sector_num - *cluster_sector_num +
> +                                        nb_sectors, c);
> +}

This function has really nothing to do with jobs except that it happens
to be useful in some function actually related to jobs.

It should probably be renamed and bdrv_round_to_clusters() should call
into it.

> +
>  static bool tracked_request_overlaps(BdrvTrackedRequest *req,
>                                       int64_t sector_num, int nb_sectors) {
>      /*        aaaa   bbbb */
> @@ -1916,7 +1933,9 @@ static bool tracked_request_overlaps(BdrvTrackedRequest *req,
>  }
>  
>  static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
> -        int64_t sector_num, int nb_sectors)
> +                                                       int64_t sector_num,
> +                                                       int nb_sectors,
> +                                                       int job_cluster_size)
>  {
>      BdrvTrackedRequest *req;
>      int64_t cluster_sector_num;
> @@ -1932,6 +1951,11 @@ static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
>      bdrv_round_to_clusters(bs, sector_num, nb_sectors,
>                             &cluster_sector_num, &cluster_nb_sectors);
>  
> +    if (job_cluster_size) {
> +        round_to_job_clusters(bs, sector_num, nb_sectors, job_cluster_size,
> +                              &cluster_sector_num, &cluster_nb_sectors);
> +    }
> +
>      do {
>          retry = false;
>          QLIST_FOREACH(req, &bs->tracked_requests, list) {
> @@ -2507,12 +2531,24 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
>          bs->copy_on_read_in_flight++;
>      }
>  
> -    if (bs->copy_on_read_in_flight) {
> -        wait_for_overlapping_requests(bs, sector_num, nb_sectors);
> +    int job_cluster_size = bs->job && bs->job->cluster_size ?
> +        bs->job->cluster_size : 0;
> +
> +    if (bs->copy_on_read_in_flight || job_cluster_size) {
> +        wait_for_overlapping_requests(bs, sector_num, nb_sectors,
> +                                      job_cluster_size);
>      }
>  
>      tracked_request_begin(&req, bs, sector_num, nb_sectors, false);
>  
> +    if (bs->job && bs->job->job_type->before_read) {
> +        ret = bs->job->job_type->before_read(bs, sector_num, nb_sectors, qiov);
> +        if ((ret < 0) || (flags & BDRV_REQ_BACKUP_ONLY)) {
> +            /* Note: We do not return any data to the caller */
> +            goto out;
> +        }
> +    }

Ugh, so we're introducing calls of bdrv_co_do_readv() that don't
actually read, but only cause some side-effects of reads, except if
there is no block job running?

Is it intentional that these fake requests are considered for I/O
throttling?

> +
>      if (flags & BDRV_REQ_COPY_ON_READ) {
>          int pnum;
>  
> @@ -2556,6 +2592,17 @@ int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
>                              BDRV_REQ_COPY_ON_READ);
>  }
>  
> +int coroutine_fn bdrv_co_backup(BlockDriverState *bs,
> +    int64_t sector_num, int nb_sectors)
> +{
> +    if (!bs->job) {
> +        return -ENOTSUP;
> +    }

Don't you expect a very specific job to be running, or is it acceptable
to have streaming running? It looks a bit arbitrary.

And you make the assumption that the functionality (running only
side-effects of read) can only work correctly in the context of jobs (or
actually not just in the context of jobs, but only when a job is running
at the same time). Not sure why.

I suspect you're not really checking what you wanted to check here.

> +
> +    return bdrv_co_do_readv(bs, sector_num, nb_sectors, NULL,
> +                            BDRV_REQ_BACKUP_ONLY);
> +}
> +
>  static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
>      int64_t sector_num, int nb_sectors)
>  {
> @@ -2613,12 +2660,23 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
>          bdrv_io_limits_intercept(bs, true, nb_sectors);
>      }
>  
> -    if (bs->copy_on_read_in_flight) {
> -        wait_for_overlapping_requests(bs, sector_num, nb_sectors);
> +    int job_cluster_size = bs->job && bs->job->cluster_size ?
> +        bs->job->cluster_size : 0;
> +
> +    if (bs->copy_on_read_in_flight || job_cluster_size) {
> +        wait_for_overlapping_requests(bs, sector_num, nb_sectors,
> +                                      job_cluster_size);
>      }
>  
>      tracked_request_begin(&req, bs, sector_num, nb_sectors, true);
>  
> +    if (bs->job && bs->job->job_type->before_write) {
> +        ret = bs->job->job_type->before_write(bs, sector_num, nb_sectors, qiov);
> +        if (ret < 0) {
> +            goto out;
> +        }
> +    }
> +
>      if (flags & BDRV_REQ_ZERO_WRITE) {
>          ret = bdrv_co_do_write_zeroes(bs, sector_num, nb_sectors);
>      } else {
> @@ -2637,6 +2695,7 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
>          bs->wr_highest_sector = sector_num + nb_sectors - 1;
>      }
>  
> +out:
>      tracked_request_end(&req);
>  
>      return ret;

The changes to block.c and friends should be a separate patch, so let me
sum up my expectations here: This patch should be as small as possible,
it should mention the word "backup" as little as possible and it should
build successfully without backup.o.

This is the very minimum at which we can start talking about the patch
and whether we do proper block filters or what the best way is to work
around the lack of them today.

And then we get to things like fake requests which aren't my favourite
design either...

> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index 6c4b5bc..dcab6d3 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -19,5 +19,6 @@ endif
>  common-obj-y += stream.o
>  common-obj-y += commit.o
>  common-obj-y += mirror.o
> +common-obj-y += backup.o
>  
>  $(obj)/curl.o: QEMU_CFLAGS+=$(CURL_CFLAGS)
> diff --git a/block/backup.c b/block/backup.c
> new file mode 100644
> index 0000000..45d8c21
> --- /dev/null
> +++ b/block/backup.c
> @@ -0,0 +1,252 @@
> +/*
> + * QEMU backup
> + *
> + * Copyright (C) 2013 Proxmox Server Solutions
> + *
> + * Authors:
> + *  Dietmar Maurer (dietmar@proxmox.com)
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include <stdio.h>
> +#include <errno.h>
> +#include <unistd.h>
> +
> +#include "block/block.h"
> +#include "block/block_int.h"
> +#include "block/blockjob.h"
> +#include "qemu/ratelimit.h"
> +
> +#define DEBUG_BACKUP 0
> +
> +#define DPRINTF(fmt, ...) \
> +    do { \
> +        if (DEBUG_BACKUP) { \
> +            fprintf(stderr, "backup: " fmt, ## __VA_ARGS__); \
> +        } \
> +    } while (0)
> +
> +#define BACKUP_CLUSTER_BITS 16
> +#define BACKUP_CLUSTER_SIZE (1 << BACKUP_CLUSTER_BITS)
> +#define BACKUP_BLOCKS_PER_CLUSTER (BACKUP_CLUSTER_SIZE / BDRV_SECTOR_SIZE)
> +
> +#define SLICE_TIME 100000000ULL /* ns */
> +
> +typedef struct BackupBlockJob {
> +    BlockJob common;
> +    BlockDriverState *target;
> +    RateLimit limit;
> +    CoRwlock rwlock;

flush_rwlock?

> +    uint64_t sectors_read;
> +    HBitmap *bitmap;
> +} BackupBlockJob;
> +
> +static int coroutine_fn backup_do_cow(BlockDriverState *bs,
> +                                      int64_t sector_num, int nb_sectors)
> +{
> +    assert(bs);
> +    BackupBlockJob *job = (BackupBlockJob *)bs->job;
> +    assert(job);
> +
> +    BlockDriver *drv = bs->drv;
> +    struct iovec iov;
> +    QEMUIOVector bounce_qiov;
> +    void *bounce_buffer = NULL;
> +    int ret = 0;
> +
> +    qemu_co_rwlock_rdlock(&job->rwlock);
> +
> +    int64_t start, end;
> +
> +    start = sector_num / BACKUP_BLOCKS_PER_CLUSTER;
> +    end = DIV_ROUND_UP(sector_num + nb_sectors, BACKUP_BLOCKS_PER_CLUSTER);
> +
> +    DPRINTF("brdv_co_backup_cow enter %s C%" PRId64 " %" PRId64 " %d\n",
> +            bdrv_get_device_name(bs), start, sector_num, nb_sectors);
> +
> +    for (; start < end; start++) {
> +        if (hbitmap_get(job->bitmap, start)) {
> +            DPRINTF("brdv_co_backup_cow skip C%" PRId64 "\n", start);
> +            continue; /* already copied */
> +        }
> +
> +        /* immediately set bitmap (avoid coroutine race) */
> +        hbitmap_set(job->bitmap, start, 1);
> +
> +        DPRINTF("brdv_co_backup_cow C%" PRId64 "\n", start);
> +
> +        if (!bounce_buffer) {
> +            iov.iov_len = BACKUP_CLUSTER_SIZE;
> +            iov.iov_base = bounce_buffer = qemu_blockalign(bs, iov.iov_len);
> +            qemu_iovec_init_external(&bounce_qiov, &iov, 1);
> +        }
> +
> +        ret = drv->bdrv_co_readv(bs, start * BACKUP_BLOCKS_PER_CLUSTER,
> +                                 BACKUP_BLOCKS_PER_CLUSTER,
> +                                 &bounce_qiov);

Why do you bypass the block layer here?

> +        if (ret < 0) {
> +            DPRINTF("brdv_co_backup_cow bdrv_read C%" PRId64 " failed\n",
> +                    start);
> +            goto out;
> +        }
> +
> +        job->sectors_read += BACKUP_BLOCKS_PER_CLUSTER;
> +
> +        if (!buffer_is_zero(bounce_buffer, BACKUP_CLUSTER_SIZE)) {
> +            ret = bdrv_co_writev(job->target, start * BACKUP_BLOCKS_PER_CLUSTER,
> +                                 BACKUP_BLOCKS_PER_CLUSTER,
> +                                 &bounce_qiov);
> +            if (ret < 0) {
> +                DPRINTF("brdv_co_backup_cow dump_cluster_cb C%" PRId64
> +                        " failed\n", start);
> +                goto out;
> +            }
> +        }
> +
> +        DPRINTF("brdv_co_backup_cow done C%" PRId64 "\n", start);
> +    }
> +
> +out:
> +    if (bounce_buffer) {
> +        qemu_vfree(bounce_buffer);
> +    }
> +
> +    qemu_co_rwlock_unlock(&job->rwlock);
> +
> +    return ret;
> +}
> +
> +static int coroutine_fn backup_before_read(BlockDriverState *bs,
> +                                           int64_t sector_num,
> +                                           int nb_sectors, QEMUIOVector *qiov)
> +{
> +    return backup_do_cow(bs, sector_num, nb_sectors);
> +}

Why do you do copy on _read_? Without actually making use of the read
data like COR in block.c, but by reading the data a second time.

I mean, obviously this is the implementation of bdrv_co_backup(), but it
doesn't make a lot of sense for guest requests. You could directly call
into the function from below instead of going through block.c for
bdrv_co_backup() and get rid of the fake requests.

> +static int coroutine_fn backup_before_write(BlockDriverState *bs,
> +                                            int64_t sector_num,
> +                                            int nb_sectors, QEMUIOVector *qiov)
> +{
> +    return backup_do_cow(bs, sector_num, nb_sectors);
> +}
> +
> +static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp)
> +{
> +    BackupBlockJob *s = container_of(job, BackupBlockJob, common);
> +
> +    if (speed < 0) {
> +        error_set(errp, QERR_INVALID_PARAMETER, "speed");
> +        return;
> +    }
> +    ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
> +}
> +
> +static BlockJobType backup_job_type = {
> +    .instance_size = sizeof(BackupBlockJob),
> +    .before_read = backup_before_read,
> +    .before_write = backup_before_write,
> +    .job_type = "backup",
> +    .set_speed = backup_set_speed,
> +};
> +
> +static void coroutine_fn backup_run(void *opaque)
> +{
> +    BackupBlockJob *job = opaque;
> +    BlockDriverState *bs = job->common.bs;
> +    assert(bs);
> +
> +    int64_t start, end;
> +
> +    start = 0;
> +    end = DIV_ROUND_UP(bdrv_getlength(bs) / BDRV_SECTOR_SIZE,
> +                       BACKUP_BLOCKS_PER_CLUSTER);
> +
> +    job->bitmap = hbitmap_alloc(end, 0);
> +
> +    DPRINTF("backup_run start %s %" PRId64 " %" PRId64 "\n",
> +            bdrv_get_device_name(bs), start, end);
> +
> +    int ret = 0;
> +
> +    for (; start < end; start++) {
> +        if (block_job_is_cancelled(&job->common)) {
> +            ret = -1;
> +            break;
> +        }
> +
> +        /* we need to yield so that qemu_aio_flush() returns.
> +         * (without, VM does not reboot)
> +         */
> +        if (job->common.speed) {
> +            uint64_t delay_ns = ratelimit_calculate_delay(
> +                &job->limit, job->sectors_read);
> +            job->sectors_read = 0;
> +            block_job_sleep_ns(&job->common, rt_clock, delay_ns);
> +        } else {
> +            block_job_sleep_ns(&job->common, rt_clock, 0);
> +        }
> +
> +        if (block_job_is_cancelled(&job->common)) {
> +            ret = -1;
> +            break;
> +        }
> +
> +        if (hbitmap_get(job->bitmap, start)) {
> +            continue; /* already copied */
> +        }

Isn't this duplicated and would be checked in the bdrv_co_backup() call
below anyway?

The semantic difference is that job->common.offset wouldn't be increased
with this additional check. Looks like a bug.

> +        DPRINTF("backup_run loop C%" PRId64 "\n", start);
> +
> +        /**
> +         * This triggers a cluster copy
> +         * Note: avoid direct call to brdv_co_backup_cow, because
> +         * this does not call tracked_request_begin()
> +         */
> +        ret = bdrv_co_backup(bs, start*BACKUP_BLOCKS_PER_CLUSTER, 1);
> +        if (ret < 0) {
> +            break;
> +        }
> +        /* Publish progress */
> +        job->common.offset += BACKUP_CLUSTER_SIZE;
> +    }
> +
> +    /* wait until pending backup_do_cow()calls have completed */

Missing space.

> +    qemu_co_rwlock_wrlock(&job->rwlock);
> +    qemu_co_rwlock_unlock(&job->rwlock);
> +
> +    hbitmap_free(job->bitmap);
> +
> +    bdrv_delete(job->target);
> +
> +    DPRINTF("backup_run complete %d\n", ret);
> +    block_job_completed(&job->common, ret);
> +}
> +
> +void backup_start(BlockDriverState *bs, BlockDriverState *target,
> +                  int64_t speed,
> +                  BlockDriverCompletionFunc *cb, void *opaque,
> +                  Error **errp)
> +{
> +    assert(bs);
> +    assert(target);
> +    assert(cb);
> +
> +    DPRINTF("backup_start %s\n", bdrv_get_device_name(bs));
> +
> +    BackupBlockJob *job = block_job_create(&backup_job_type, bs, speed,
> +                                           cb, opaque, errp);
> +    if (!job) {
> +        return;
> +    }
> +
> +    qemu_co_rwlock_init(&job->rwlock);
> +
> +    job->target = target;
> +    job->common.cluster_size = BACKUP_CLUSTER_SIZE;
> +    job->common.len = bdrv_getlength(bs);
> +    job->common.co = qemu_coroutine_create(backup_run);
> +    qemu_coroutine_enter(job->common.co, job);
> +}

Sorry, but this patch doesn't look very close to mergable yet.

Kevin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] block: add block-backup QMP command
  2013-04-29  7:42 ` [Qemu-devel] [PATCH v2 2/3] block: add block-backup QMP command Stefan Hajnoczi
@ 2013-05-08 12:49   ` Kevin Wolf
  2013-05-11  3:34     ` Eric Blake
  2013-05-14  8:44     ` Stefan Hajnoczi
  2013-05-11  4:02   ` Eric Blake
  1 sibling, 2 replies; 27+ messages in thread
From: Kevin Wolf @ 2013-05-08 12:49 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Fam Zheng, qemu-devel, Wenchao Xia, imain, pbonzini, dietmar

Am 29.04.2013 um 09:42 hat Stefan Hajnoczi geschrieben:
> @block-backup
> 
> Start a point-in-time copy of a block device to a new destination.  The
> status of ongoing block backup operations can be checked with
> query-block-jobs.  The operation can be stopped before it has completed using
> the block-job-cancel command.
> 
> @device: the name of the device whose writes should be mirrored.
> 
> @target: the target of the new image. If the file exists, or if it
>          is a device, the existing file/device will be used as the new
>          destination.  If it does not exist, a new file will be created.
> 
> @format: #optional the format of the new destination, default is to
>          probe if @mode is 'existing', else the format of the source
> 
> @mode: #optional whether and how QEMU should create a new image, default is
>        'absolute-paths'.
> 
> @speed: #optional the maximum speed, in bytes per second
> 
> Returns: nothing on success
>          If @device is not a valid block device, DeviceNotFound
> 
> Since 1.6
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

drive-backup would probably be a more consistent naming. We would then
still have block-backup for a future low-level command that doesn't
create everything by itself but takes an existing BlockDriverState (e.g.
created by blockdev-add).

We should also make it transactionable from the beginning, as we don't
have schema introspection yet. This way we allow to assume that if the
standalone command exists, the transaction subcommand exists as well.

Kevin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] block: add basic backup support to block driver
  2013-05-08 12:39   ` Kevin Wolf
@ 2013-05-08 15:43     ` Paolo Bonzini
  2013-05-14  8:51     ` Stefan Hajnoczi
  2013-05-14 13:24     ` Stefan Hajnoczi
  2 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2013-05-08 15:43 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Fam Zheng, qemu-devel, Wenchao Xia, imain, Stefan Hajnoczi,
	dietmar

Il 08/05/2013 14:39, Kevin Wolf ha scritto:
> Am 29.04.2013 um 09:42 hat Stefan Hajnoczi geschrieben:
>> From: Dietmar Maurer <dietmar@proxmox.com>
>>
>> backup_start() creates a block job that copies a point-in-time snapshot
>> of a block device to a target block device.
>>
>> We call backup_do_cow() for each write during backup. That function
>> reads the original data from the block device before it gets
>> overwritten.  The data is then written to the target device.
>>
>> The tracked_request infrastructure is used to serialize access.  Both
>> reads and writes are serialized if they overlap.
>>
>> Currently backup cluster size is hardcoded to 65536 bytes.
>>
>> [I made a number of changes to Dietmar's original patch and folded them
>> in to make code review easy.  Here is the full list:
>>
>>  * Drop BackupDumpFunc interface in favor of a target block device
>>  * Detect zero clusters with buffer_is_zero()
>>  * Don't write zero clusters to the target
>>  * Use 0 delay instead of 1us, like other block jobs
>>  * Unify creation/start functions into backup_start()
>>  * Simplify cleanup, free bitmap in backup_run() instead of cb function
>>  * Use HBitmap to avoid duplicating bitmap code
>>  * Use bdrv_getlength() instead of accessing ->total_sectors directly
>>  * Delete the backup.h header file, it is no longer necessary
>>  * Move ./backup.c to block/backup.c
>>  * Remove #ifdefed out code
>>  * Coding style and whitespace cleanups
>>
>> -- stefanha]
>>
>> Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>> ---
>>  block.c                   |  69 ++++++++++++-
>>  block/Makefile.objs       |   1 +
>>  block/backup.c            | 252 ++++++++++++++++++++++++++++++++++++++++++++++
>>  include/block/block.h     |   2 +
>>  include/block/block_int.h |  16 +++
>>  include/block/blockjob.h  |  10 ++
>>  6 files changed, 345 insertions(+), 5 deletions(-)
>>  create mode 100644 block/backup.c
> 
> (Moving some hunks around so I can comment on the headers first.)
> 
>> diff --git a/include/block/blockjob.h b/include/block/blockjob.h
>> index c290d07..6f42495 100644
>> --- a/include/block/blockjob.h
>> +++ b/include/block/blockjob.h
>> @@ -50,6 +50,13 @@ typedef struct BlockJobType {
>>       * manually.
>>       */
>>      void (*complete)(BlockJob *job, Error **errp);
>> +
>> +    /** tracked requests */
>> +    int coroutine_fn (*before_read)(BlockDriverState *bs, int64_t sector_num,
>> +                                    int nb_sectors, QEMUIOVector *qiov);
>> +    int coroutine_fn (*before_write)(BlockDriverState *bs, int64_t sector_num,
>> +                                     int nb_sectors, QEMUIOVector *qiov);

The mechanism to handle different cluster sizes is complex.

I think you could instead use something that resembles the copy-on-read
code, but with the copy-on-read logic cut-and-pasted to block/backup.c.
 This way the state is held in the BlockBackupJob instead of the
BlockDriverState (e.g. using job->bs as the source instead of
bs->backing_file).

The before_write operations can just allocate a dummy buffer, do a
copy-on-read to the buffer (similar to streaming), discard the result
and proceed to do the write on the source BDS.  The serialization would
then happen on the destination BDS's cluster size (which is the one that
matters).  The destination BDS does not get writes from outside the job,
so it is fine to do all the serialization within the job.

I would leave out before_read from the initial patch.  An optimized
version that doesn't do the read twice has complex serialization issues,
as you found out.  And I'm not sure that a simple version that reads
twice (once as above with COR to a dummy buffer, the other on the source
BDS to serve the guest's request) is really faster than just letting the
job do COR in a streaming fashion.

Paolo

>> +
>>  } BlockJobType;
> 
> This is actually a sign that a block job isn't the right tool. Jobs are
> something that runs in the background and doesn't have callbacks. You
> really want to have a filter here (that happens to be coupled to a job).
> Need the BlockBackend split before we can do this right.
> 
> The second thing that this conflicts with is generalising block jobs to
> generic background jobs.
> 
> Each hack like this that we accumulate makes it harder to get the real
> thing eventually.
> 
>>  
>>  /**
>> @@ -103,6 +110,9 @@ struct BlockJob {
>>      /** Speed that was set with @block_job_set_speed.  */
>>      int64_t speed;
>>  
>> +    /** tracked requests */
>> +    int cluster_size;
> 
> Sure that this is the right comment here?
> 
> Does really every job need a cluster size?
> 
>> diff --git a/block.c b/block.c
>> index aa9a533..c5c09b7 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -54,6 +54,7 @@
>>  typedef enum {
>>      BDRV_REQ_COPY_ON_READ = 0x1,
>>      BDRV_REQ_ZERO_WRITE   = 0x2,
>> +    BDRV_REQ_BACKUP_ONLY  = 0x4,
>>  } BdrvRequestFlags;
> 
> Without having read the rest of the code, it's unclear to me what this
> new flag means. Could use a comment. (Though it has "backup" in its name,
> so I suspect having this in generic block layer is wrong.)
> 
>>  static void bdrv_dev_change_media_cb(BlockDriverState *bs, bool load);
>> @@ -1902,6 +1903,22 @@ void bdrv_round_to_clusters(BlockDriverState *bs,
>>      }
>>  }
>>  
>> +/**
>> + * Round a region to job cluster boundaries
>> + */
>> +static void round_to_job_clusters(BlockDriverState *bs,
>> +                                  int64_t sector_num, int nb_sectors,
>> +                                  int job_cluster_size,
>> +                                  int64_t *cluster_sector_num,
>> +                                  int *cluster_nb_sectors)
>> +{
>> +    int64_t c = job_cluster_size / BDRV_SECTOR_SIZE;
>> +
>> +    *cluster_sector_num = QEMU_ALIGN_DOWN(sector_num, c);
>> +    *cluster_nb_sectors = QEMU_ALIGN_UP(sector_num - *cluster_sector_num +
>> +                                        nb_sectors, c);
>> +}
> 
> This function has really nothing to do with jobs except that it happens
> to be useful in some function actually related to jobs.
> 
> It should probably be renamed and bdrv_round_to_clusters() should call
> into it.
> 
>> +
>>  static bool tracked_request_overlaps(BdrvTrackedRequest *req,
>>                                       int64_t sector_num, int nb_sectors) {
>>      /*        aaaa   bbbb */
>> @@ -1916,7 +1933,9 @@ static bool tracked_request_overlaps(BdrvTrackedRequest *req,
>>  }
>>  
>>  static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
>> -        int64_t sector_num, int nb_sectors)
>> +                                                       int64_t sector_num,
>> +                                                       int nb_sectors,
>> +                                                       int job_cluster_size)
>>  {
>>      BdrvTrackedRequest *req;
>>      int64_t cluster_sector_num;
>> @@ -1932,6 +1951,11 @@ static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
>>      bdrv_round_to_clusters(bs, sector_num, nb_sectors,
>>                             &cluster_sector_num, &cluster_nb_sectors);
>>  
>> +    if (job_cluster_size) {
>> +        round_to_job_clusters(bs, sector_num, nb_sectors, job_cluster_size,
>> +                              &cluster_sector_num, &cluster_nb_sectors);
>> +    }
>> +
>>      do {
>>          retry = false;
>>          QLIST_FOREACH(req, &bs->tracked_requests, list) {
>> @@ -2507,12 +2531,24 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
>>          bs->copy_on_read_in_flight++;
>>      }
>>  
>> -    if (bs->copy_on_read_in_flight) {
>> -        wait_for_overlapping_requests(bs, sector_num, nb_sectors);
>> +    int job_cluster_size = bs->job && bs->job->cluster_size ?
>> +        bs->job->cluster_size : 0;
>> +
>> +    if (bs->copy_on_read_in_flight || job_cluster_size) {
>> +        wait_for_overlapping_requests(bs, sector_num, nb_sectors,
>> +                                      job_cluster_size);
>>      }
>>  
>>      tracked_request_begin(&req, bs, sector_num, nb_sectors, false);
>>  
>> +    if (bs->job && bs->job->job_type->before_read) {
>> +        ret = bs->job->job_type->before_read(bs, sector_num, nb_sectors, qiov);
>> +        if ((ret < 0) || (flags & BDRV_REQ_BACKUP_ONLY)) {
>> +            /* Note: We do not return any data to the caller */
>> +            goto out;
>> +        }
>> +    }
> 
> Ugh, so we're introducing calls of bdrv_co_do_readv() that don't
> actually read, but only cause some side-effects of reads, except if
> there is no block job running?
> 
> Is it intentional that these fake requests are considered for I/O
> throttling?
> 
>> +
>>      if (flags & BDRV_REQ_COPY_ON_READ) {
>>          int pnum;
>>  
>> @@ -2556,6 +2592,17 @@ int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
>>                              BDRV_REQ_COPY_ON_READ);
>>  }
>>  
>> +int coroutine_fn bdrv_co_backup(BlockDriverState *bs,
>> +    int64_t sector_num, int nb_sectors)
>> +{
>> +    if (!bs->job) {
>> +        return -ENOTSUP;
>> +    }
> 
> Don't you expect a very specific job to be running, or is it acceptable
> to have streaming running? It looks a bit arbitrary.
> 
> And you make the assumption that the functionality (running only
> side-effects of read) can only work correctly in the context of jobs (or
> actually not just in the context of jobs, but only when a job is running
> at the same time). Not sure why.
> 
> I suspect you're not really checking what you wanted to check here.
> 
>> +
>> +    return bdrv_co_do_readv(bs, sector_num, nb_sectors, NULL,
>> +                            BDRV_REQ_BACKUP_ONLY);
>> +}
>> +
>>  static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
>>      int64_t sector_num, int nb_sectors)
>>  {
>> @@ -2613,12 +2660,23 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
>>          bdrv_io_limits_intercept(bs, true, nb_sectors);
>>      }
>>  
>> -    if (bs->copy_on_read_in_flight) {
>> -        wait_for_overlapping_requests(bs, sector_num, nb_sectors);
>> +    int job_cluster_size = bs->job && bs->job->cluster_size ?
>> +        bs->job->cluster_size : 0;
>> +
>> +    if (bs->copy_on_read_in_flight || job_cluster_size) {
>> +        wait_for_overlapping_requests(bs, sector_num, nb_sectors,
>> +                                      job_cluster_size);
>>      }
>>  
>>      tracked_request_begin(&req, bs, sector_num, nb_sectors, true);
>>  
>> +    if (bs->job && bs->job->job_type->before_write) {
>> +        ret = bs->job->job_type->before_write(bs, sector_num, nb_sectors, qiov);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +    }
>> +
>>      if (flags & BDRV_REQ_ZERO_WRITE) {
>>          ret = bdrv_co_do_write_zeroes(bs, sector_num, nb_sectors);
>>      } else {
>> @@ -2637,6 +2695,7 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
>>          bs->wr_highest_sector = sector_num + nb_sectors - 1;
>>      }
>>  
>> +out:
>>      tracked_request_end(&req);
>>  
>>      return ret;
> 
> The changes to block.c and friends should be a separate patch, so let me
> sum up my expectations here: This patch should be as small as possible,
> it should mention the word "backup" as little as possible and it should
> build successfully without backup.o.
> 
> This is the very minimum at which we can start talking about the patch
> and whether we do proper block filters or what the best way is to work
> around the lack of them today.
> 
> And then we get to things like fake requests which aren't my favourite
> design either...
> 
>> diff --git a/block/Makefile.objs b/block/Makefile.objs
>> index 6c4b5bc..dcab6d3 100644
>> --- a/block/Makefile.objs
>> +++ b/block/Makefile.objs
>> @@ -19,5 +19,6 @@ endif
>>  common-obj-y += stream.o
>>  common-obj-y += commit.o
>>  common-obj-y += mirror.o
>> +common-obj-y += backup.o
>>  
>>  $(obj)/curl.o: QEMU_CFLAGS+=$(CURL_CFLAGS)
>> diff --git a/block/backup.c b/block/backup.c
>> new file mode 100644
>> index 0000000..45d8c21
>> --- /dev/null
>> +++ b/block/backup.c
>> @@ -0,0 +1,252 @@
>> +/*
>> + * QEMU backup
>> + *
>> + * Copyright (C) 2013 Proxmox Server Solutions
>> + *
>> + * Authors:
>> + *  Dietmar Maurer (dietmar@proxmox.com)
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + *
>> + */
>> +
>> +#include <stdio.h>
>> +#include <errno.h>
>> +#include <unistd.h>
>> +
>> +#include "block/block.h"
>> +#include "block/block_int.h"
>> +#include "block/blockjob.h"
>> +#include "qemu/ratelimit.h"
>> +
>> +#define DEBUG_BACKUP 0
>> +
>> +#define DPRINTF(fmt, ...) \
>> +    do { \
>> +        if (DEBUG_BACKUP) { \
>> +            fprintf(stderr, "backup: " fmt, ## __VA_ARGS__); \
>> +        } \
>> +    } while (0)
>> +
>> +#define BACKUP_CLUSTER_BITS 16
>> +#define BACKUP_CLUSTER_SIZE (1 << BACKUP_CLUSTER_BITS)
>> +#define BACKUP_BLOCKS_PER_CLUSTER (BACKUP_CLUSTER_SIZE / BDRV_SECTOR_SIZE)
>> +
>> +#define SLICE_TIME 100000000ULL /* ns */
>> +
>> +typedef struct BackupBlockJob {
>> +    BlockJob common;
>> +    BlockDriverState *target;
>> +    RateLimit limit;
>> +    CoRwlock rwlock;
> 
> flush_rwlock?
> 
>> +    uint64_t sectors_read;
>> +    HBitmap *bitmap;
>> +} BackupBlockJob;
>> +
>> +static int coroutine_fn backup_do_cow(BlockDriverState *bs,
>> +                                      int64_t sector_num, int nb_sectors)
>> +{
>> +    assert(bs);
>> +    BackupBlockJob *job = (BackupBlockJob *)bs->job;
>> +    assert(job);
>> +
>> +    BlockDriver *drv = bs->drv;
>> +    struct iovec iov;
>> +    QEMUIOVector bounce_qiov;
>> +    void *bounce_buffer = NULL;
>> +    int ret = 0;
>> +
>> +    qemu_co_rwlock_rdlock(&job->rwlock);
>> +
>> +    int64_t start, end;
>> +
>> +    start = sector_num / BACKUP_BLOCKS_PER_CLUSTER;
>> +    end = DIV_ROUND_UP(sector_num + nb_sectors, BACKUP_BLOCKS_PER_CLUSTER);
>> +
>> +    DPRINTF("brdv_co_backup_cow enter %s C%" PRId64 " %" PRId64 " %d\n",
>> +            bdrv_get_device_name(bs), start, sector_num, nb_sectors);
>> +
>> +    for (; start < end; start++) {
>> +        if (hbitmap_get(job->bitmap, start)) {
>> +            DPRINTF("brdv_co_backup_cow skip C%" PRId64 "\n", start);
>> +            continue; /* already copied */
>> +        }
>> +
>> +        /* immediately set bitmap (avoid coroutine race) */
>> +        hbitmap_set(job->bitmap, start, 1);
>> +
>> +        DPRINTF("brdv_co_backup_cow C%" PRId64 "\n", start);
>> +
>> +        if (!bounce_buffer) {
>> +            iov.iov_len = BACKUP_CLUSTER_SIZE;
>> +            iov.iov_base = bounce_buffer = qemu_blockalign(bs, iov.iov_len);
>> +            qemu_iovec_init_external(&bounce_qiov, &iov, 1);
>> +        }
>> +
>> +        ret = drv->bdrv_co_readv(bs, start * BACKUP_BLOCKS_PER_CLUSTER,
>> +                                 BACKUP_BLOCKS_PER_CLUSTER,
>> +                                 &bounce_qiov);
> 
> Why do you bypass the block layer here?
> 
>> +        if (ret < 0) {
>> +            DPRINTF("brdv_co_backup_cow bdrv_read C%" PRId64 " failed\n",
>> +                    start);
>> +            goto out;
>> +        }
>> +
>> +        job->sectors_read += BACKUP_BLOCKS_PER_CLUSTER;
>> +
>> +        if (!buffer_is_zero(bounce_buffer, BACKUP_CLUSTER_SIZE)) {
>> +            ret = bdrv_co_writev(job->target, start * BACKUP_BLOCKS_PER_CLUSTER,
>> +                                 BACKUP_BLOCKS_PER_CLUSTER,
>> +                                 &bounce_qiov);
>> +            if (ret < 0) {
>> +                DPRINTF("brdv_co_backup_cow dump_cluster_cb C%" PRId64
>> +                        " failed\n", start);
>> +                goto out;
>> +            }
>> +        }
>> +
>> +        DPRINTF("brdv_co_backup_cow done C%" PRId64 "\n", start);
>> +    }
>> +
>> +out:
>> +    if (bounce_buffer) {
>> +        qemu_vfree(bounce_buffer);
>> +    }
>> +
>> +    qemu_co_rwlock_unlock(&job->rwlock);
>> +
>> +    return ret;
>> +}
>> +
>> +static int coroutine_fn backup_before_read(BlockDriverState *bs,
>> +                                           int64_t sector_num,
>> +                                           int nb_sectors, QEMUIOVector *qiov)
>> +{
>> +    return backup_do_cow(bs, sector_num, nb_sectors);
>> +}
> 
> Why do you do copy on _read_? Without actually making use of the read
> data like COR in block.c, but by reading the data a second time.
> 
> I mean, obviously this is the implementation of bdrv_co_backup(), but it
> doesn't make a lot of sense for guest requests. You could directly call
> into the function from below instead of going through block.c for
> bdrv_co_backup() and get rid of the fake requests.
> 
>> +static int coroutine_fn backup_before_write(BlockDriverState *bs,
>> +                                            int64_t sector_num,
>> +                                            int nb_sectors, QEMUIOVector *qiov)
>> +{
>> +    return backup_do_cow(bs, sector_num, nb_sectors);
>> +}
>> +
>> +static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp)
>> +{
>> +    BackupBlockJob *s = container_of(job, BackupBlockJob, common);
>> +
>> +    if (speed < 0) {
>> +        error_set(errp, QERR_INVALID_PARAMETER, "speed");
>> +        return;
>> +    }
>> +    ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
>> +}
>> +
>> +static BlockJobType backup_job_type = {
>> +    .instance_size = sizeof(BackupBlockJob),
>> +    .before_read = backup_before_read,
>> +    .before_write = backup_before_write,
>> +    .job_type = "backup",
>> +    .set_speed = backup_set_speed,
>> +};
>> +
>> +static void coroutine_fn backup_run(void *opaque)
>> +{
>> +    BackupBlockJob *job = opaque;
>> +    BlockDriverState *bs = job->common.bs;
>> +    assert(bs);
>> +
>> +    int64_t start, end;
>> +
>> +    start = 0;
>> +    end = DIV_ROUND_UP(bdrv_getlength(bs) / BDRV_SECTOR_SIZE,
>> +                       BACKUP_BLOCKS_PER_CLUSTER);
>> +
>> +    job->bitmap = hbitmap_alloc(end, 0);
>> +
>> +    DPRINTF("backup_run start %s %" PRId64 " %" PRId64 "\n",
>> +            bdrv_get_device_name(bs), start, end);
>> +
>> +    int ret = 0;
>> +
>> +    for (; start < end; start++) {
>> +        if (block_job_is_cancelled(&job->common)) {
>> +            ret = -1;
>> +            break;
>> +        }
>> +
>> +        /* we need to yield so that qemu_aio_flush() returns.
>> +         * (without, VM does not reboot)
>> +         */
>> +        if (job->common.speed) {
>> +            uint64_t delay_ns = ratelimit_calculate_delay(
>> +                &job->limit, job->sectors_read);
>> +            job->sectors_read = 0;
>> +            block_job_sleep_ns(&job->common, rt_clock, delay_ns);
>> +        } else {
>> +            block_job_sleep_ns(&job->common, rt_clock, 0);
>> +        }
>> +
>> +        if (block_job_is_cancelled(&job->common)) {
>> +            ret = -1;
>> +            break;
>> +        }
>> +
>> +        if (hbitmap_get(job->bitmap, start)) {
>> +            continue; /* already copied */
>> +        }
> 
> Isn't this duplicated and would be checked in the bdrv_co_backup() call
> below anyway?
> 
> The semantic difference is that job->common.offset wouldn't be increased
> with this additional check. Looks like a bug.
> 
>> +        DPRINTF("backup_run loop C%" PRId64 "\n", start);
>> +
>> +        /**
>> +         * This triggers a cluster copy
>> +         * Note: avoid direct call to brdv_co_backup_cow, because
>> +         * this does not call tracked_request_begin()
>> +         */
>> +        ret = bdrv_co_backup(bs, start*BACKUP_BLOCKS_PER_CLUSTER, 1);
>> +        if (ret < 0) {
>> +            break;
>> +        }
>> +        /* Publish progress */
>> +        job->common.offset += BACKUP_CLUSTER_SIZE;
>> +    }
>> +
>> +    /* wait until pending backup_do_cow()calls have completed */
> 
> Missing space.
> 
>> +    qemu_co_rwlock_wrlock(&job->rwlock);
>> +    qemu_co_rwlock_unlock(&job->rwlock);
>> +
>> +    hbitmap_free(job->bitmap);
>> +
>> +    bdrv_delete(job->target);
>> +
>> +    DPRINTF("backup_run complete %d\n", ret);
>> +    block_job_completed(&job->common, ret);
>> +}
>> +
>> +void backup_start(BlockDriverState *bs, BlockDriverState *target,
>> +                  int64_t speed,
>> +                  BlockDriverCompletionFunc *cb, void *opaque,
>> +                  Error **errp)
>> +{
>> +    assert(bs);
>> +    assert(target);
>> +    assert(cb);
>> +
>> +    DPRINTF("backup_start %s\n", bdrv_get_device_name(bs));
>> +
>> +    BackupBlockJob *job = block_job_create(&backup_job_type, bs, speed,
>> +                                           cb, opaque, errp);
>> +    if (!job) {
>> +        return;
>> +    }
>> +
>> +    qemu_co_rwlock_init(&job->rwlock);
>> +
>> +    job->target = target;
>> +    job->common.cluster_size = BACKUP_CLUSTER_SIZE;
>> +    job->common.len = bdrv_getlength(bs);
>> +    job->common.co = qemu_coroutine_create(backup_run);
>> +    qemu_coroutine_enter(job->common.co, job);
>> +}
> 
> Sorry, but this patch doesn't look very close to mergable yet.
> 
> Kevin
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] block: add block-backup QMP command
  2013-05-08 12:49   ` Kevin Wolf
@ 2013-05-11  3:34     ` Eric Blake
  2013-05-11  8:02       ` Paolo Bonzini
  2013-05-13  8:23       ` Kevin Wolf
  2013-05-14  8:44     ` Stefan Hajnoczi
  1 sibling, 2 replies; 27+ messages in thread
From: Eric Blake @ 2013-05-11  3:34 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Fam Zheng, qemu-devel, Wenchao Xia, imain, Stefan Hajnoczi,
	pbonzini, dietmar

[-- Attachment #1: Type: text/plain, Size: 1587 bytes --]

On 05/08/2013 06:49 AM, Kevin Wolf wrote:
> Am 29.04.2013 um 09:42 hat Stefan Hajnoczi geschrieben:
>> @block-backup
>>

> drive-backup would probably be a more consistent naming. We would then
> still have block-backup for a future low-level command that doesn't
> create everything by itself but takes an existing BlockDriverState (e.g.
> created by blockdev-add).

At least it would match why we named a command 'drive-mirror' instead of
'block-mirror'.

Hmm, looking at qapi-schema.json, I wonder if we can rename
'BlockdevAction' to 'TransactionAction' as used in the @transaction
command.  It wouldn't change what is sent over the wire in JSON, and
until we have full introspection, there is no visibility into the type
name used.  Changing the name now would let it be more generic to adding
future transaction items that are not blockdev related.

> 
> We should also make it transactionable from the beginning, as we don't
> have schema introspection yet. This way we allow to assume that if the
> standalone command exists, the transaction subcommand exists as well.

Agreed - existence of a command at the same time the command is made
transactionable serves as a nice substitute for not having full
introspection into the 'BlockdevAction' union type, whereas if we
introduce the command now but not transaction support until 1.7, life
becomes tougher to know when it can be used where (although I HOPE we
have introspection in 1.6).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] block: add block-backup QMP command
  2013-04-29  7:42 ` [Qemu-devel] [PATCH v2 2/3] block: add block-backup QMP command Stefan Hajnoczi
  2013-05-08 12:49   ` Kevin Wolf
@ 2013-05-11  4:02   ` Eric Blake
  2013-05-13  8:28     ` Kevin Wolf
  2013-05-14  8:48     ` Stefan Hajnoczi
  1 sibling, 2 replies; 27+ messages in thread
From: Eric Blake @ 2013-05-11  4:02 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, Wenchao Xia, imain, pbonzini,
	dietmar

[-- Attachment #1: Type: text/plain, Size: 1829 bytes --]

On 04/29/2013 01:42 AM, Stefan Hajnoczi wrote:
> @block-backup
> 
> +++ b/qapi-schema.json
> @@ -1715,6 +1715,37 @@
>              '*speed': 'int' } }
>  
>  ##
> +# @block-backup
> +#
> +# Start a point-in-time copy of a block device to a new destination.  The
> +# status of ongoing block backup operations can be checked with
> +# query-block-jobs.  The operation can be stopped before it has completed using
> +# the block-job-cancel command.

Still might be worth mentioning that 'query-block-jobs' will list it as
a job of type 'backup'.

> +#
> +# @device: the name of the device whose writes should be mirrored.
> +#
> +# @target: the target of the new image. If the file exists, or if it
> +#          is a device, the existing file/device will be used as the new
> +#          destination.  If it does not exist, a new file will be created.
> +#
> +# @format: #optional the format of the new destination, default is to
> +#          probe if @mode is 'existing', else the format of the source
> +#
> +# @mode: #optional whether and how QEMU should create a new image, default is
> +#        'absolute-paths'.
> +#
> +# @speed: #optional the maximum speed, in bytes per second
> +#
> +# Returns: nothing on success
> +#          If @device is not a valid block device, DeviceNotFound
> +#
> +# Since 1.6
> +##
> +{ 'command': 'block-backup',
> +  'data': { 'device': 'str', 'target': 'str', '*format': 'str',

Hmm - wondering if we should add an enum type for supported disk formats
instead of using free-form strings.  The wire representation would be
the same, and now's the time to do it before we add introspection (it's
more than just this command impacted).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] block: add block-backup QMP command
  2013-05-11  3:34     ` Eric Blake
@ 2013-05-11  8:02       ` Paolo Bonzini
  2013-05-13  8:23       ` Kevin Wolf
  1 sibling, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2013-05-11  8:02 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, dietmar, imain,
	Stefan Hajnoczi, Wenchao Xia

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Il 11/05/2013 05:34, Eric Blake ha scritto:
> On 05/08/2013 06:49 AM, Kevin Wolf wrote:
>> Am 29.04.2013 um 09:42 hat Stefan Hajnoczi geschrieben:
>>> @block-backup
>>> 
> 
>> drive-backup would probably be a more consistent naming. We would
>> then still have block-backup for a future low-level command that
>> doesn't create everything by itself but takes an existing
>> BlockDriverState (e.g. created by blockdev-add).
> 
> At least it would match why we named a command 'drive-mirror'
> instead of 'block-mirror'.
> 
> Hmm, looking at qapi-schema.json, I wonder if we can rename 
> 'BlockdevAction' to 'TransactionAction' as used in the
> @transaction command.  It wouldn't change what is sent over the
> wire in JSON, and until we have full introspection, there is no
> visibility into the type name used.  Changing the name now would
> let it be more generic to adding future transaction items that are
> not blockdev related.

Right.  For example, "cont" could be made transactionable too (and
executed only if the transaction succeeds).

Paolo

>> 
>> We should also make it transactionable from the beginning, as we
>> don't have schema introspection yet. This way we allow to assume
>> that if the standalone command exists, the transaction subcommand
>> exists as well.
> 
> Agreed - existence of a command at the same time the command is
> made transactionable serves as a nice substitute for not having
> full introspection into the 'BlockdevAction' union type, whereas if
> we introduce the command now but not transaction support until 1.7,
> life becomes tougher to know when it can be used where (although I
> HOPE we have introspection in 1.6).
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJRjfslAAoJEBvWZb6bTYbyiVkQAJpqoUcmTmzY9P8Me7pSlR5p
MVogKDdvtFpr+GVWWaiykB4rN79gGjduOGHtMpScuE3Grr42nFSeGnJoKqKP788T
1ZCaEaOt/Il3PeGWJOM7y8RkxnieTOqehPIUODq/qHVKE+mN0+sFlvGI67lhQveI
NWJCV2gzN6z7aCJE291BZxU4dtZzJv1SnkGqPZ2z/sEfQZulpVhmleE44SQRgFE1
oaSdfbniz/TRqmB5x8E3444X7YIZ9I+NTZuGlWr6V8tT9C5tnrB3jhMId9TVbQQg
2tDuYdm735kEC7K7byOtYWxJEKsEca/6dV5LbLh1gyoJWWb6/DM/bZ0je4XbtBFY
sxbKAV2llDbBRa8yWkp7p+N9THARj00skb3u9rE38+UP9p7aQUW6ZXwsn0cawBec
njDjdmnq4JiZQ+ez+wAFfxZmC1kx2Zfxsg/2aws67cf3ySzr14PSe0czgQaW0rgM
BP+7W4pd6pGmrnK+ASEK2r2gWniiF1OyngV4Q3v6d7SIAkKU6fPcu5iY/9INveNv
JRlMw/GE5/POENCuFhA6CEv/Dg48H6j9u9N44fC5i3KpS8mJaTn83av3MoBvAfqv
JAH2ZNHxaK+HoV4oxSm4gbIehlI1gh19Mb08u226EdGYcupTkPvDnOU+GhJFEqoN
0nQLUIysLgHjGJzQtX1/
=ULCr
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] block: add block-backup QMP command
  2013-05-11  3:34     ` Eric Blake
  2013-05-11  8:02       ` Paolo Bonzini
@ 2013-05-13  8:23       ` Kevin Wolf
  1 sibling, 0 replies; 27+ messages in thread
From: Kevin Wolf @ 2013-05-13  8:23 UTC (permalink / raw)
  To: Eric Blake
  Cc: Fam Zheng, qemu-devel, Wenchao Xia, imain, Stefan Hajnoczi,
	pbonzini, dietmar

Am 11.05.2013 um 05:34 hat Eric Blake geschrieben:
> On 05/08/2013 06:49 AM, Kevin Wolf wrote:
> > Am 29.04.2013 um 09:42 hat Stefan Hajnoczi geschrieben:
> >> @block-backup
> >>
> 
> > drive-backup would probably be a more consistent naming. We would then
> > still have block-backup for a future low-level command that doesn't
> > create everything by itself but takes an existing BlockDriverState (e.g.
> > created by blockdev-add).
> 
> At least it would match why we named a command 'drive-mirror' instead of
> 'block-mirror'.
> 
> Hmm, looking at qapi-schema.json, I wonder if we can rename
> 'BlockdevAction' to 'TransactionAction' as used in the @transaction
> command.  It wouldn't change what is sent over the wire in JSON, and
> until we have full introspection, there is no visibility into the type
> name used.  Changing the name now would let it be more generic to adding
> future transaction items that are not blockdev related.

Good point, I never realised that once we have schema introspection,
doing such changes is harder. I'm all for it - it's bad enough that we
have specifically block jobs instead of just background jobs, we
shouldn't repeat the same mistake with transactions.

Kevin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] block: add block-backup QMP command
  2013-05-11  4:02   ` Eric Blake
@ 2013-05-13  8:28     ` Kevin Wolf
  2013-05-13 12:56       ` Eric Blake
  2013-05-14  8:48     ` Stefan Hajnoczi
  1 sibling, 1 reply; 27+ messages in thread
From: Kevin Wolf @ 2013-05-13  8:28 UTC (permalink / raw)
  To: Eric Blake
  Cc: Fam Zheng, qemu-devel, Wenchao Xia, imain, Stefan Hajnoczi,
	pbonzini, lcapitulino, dietmar

Am 11.05.2013 um 06:02 hat Eric Blake geschrieben:
> On 04/29/2013 01:42 AM, Stefan Hajnoczi wrote:
> > @block-backup
> > 
> > +++ b/qapi-schema.json
> > @@ -1715,6 +1715,37 @@
> >              '*speed': 'int' } }
> >  
> >  ##
> > +# @block-backup
> > +#
> > +# Start a point-in-time copy of a block device to a new destination.  The
> > +# status of ongoing block backup operations can be checked with
> > +# query-block-jobs.  The operation can be stopped before it has completed using
> > +# the block-job-cancel command.
> 
> Still might be worth mentioning that 'query-block-jobs' will list it as
> a job of type 'backup'.
> 
> > +#
> > +# @device: the name of the device whose writes should be mirrored.
> > +#
> > +# @target: the target of the new image. If the file exists, or if it
> > +#          is a device, the existing file/device will be used as the new
> > +#          destination.  If it does not exist, a new file will be created.
> > +#
> > +# @format: #optional the format of the new destination, default is to
> > +#          probe if @mode is 'existing', else the format of the source
> > +#
> > +# @mode: #optional whether and how QEMU should create a new image, default is
> > +#        'absolute-paths'.
> > +#
> > +# @speed: #optional the maximum speed, in bytes per second
> > +#
> > +# Returns: nothing on success
> > +#          If @device is not a valid block device, DeviceNotFound
> > +#
> > +# Since 1.6
> > +##
> > +{ 'command': 'block-backup',
> > +  'data': { 'device': 'str', 'target': 'str', '*format': 'str',
> 
> Hmm - wondering if we should add an enum type for supported disk formats
> instead of using free-form strings.  The wire representation would be
> the same, and now's the time to do it before we add introspection (it's
> more than just this command impacted).

And ideally we shouldn't make it a static list that contains every
format for which qemu has some code, but only those that are actually
compiled in. (Hm, and probably not protocols?)

Luiz, any idea how to do something like this, a QAPI enum with values
that are determined at runtime? Especially with respect to the coming
schema introspection?

Kevin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] block: add block-backup QMP command
  2013-05-13  8:28     ` Kevin Wolf
@ 2013-05-13 12:56       ` Eric Blake
  2013-05-13 13:09         ` Kevin Wolf
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Blake @ 2013-05-13 12:56 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Fam Zheng, qemu-devel, Wenchao Xia, imain, Stefan Hajnoczi,
	pbonzini, lcapitulino, dietmar

[-- Attachment #1: Type: text/plain, Size: 1178 bytes --]

On 05/13/2013 02:28 AM, Kevin Wolf wrote:

>>> +{ 'command': 'block-backup',
>>> +  'data': { 'device': 'str', 'target': 'str', '*format': 'str',
>>
>> Hmm - wondering if we should add an enum type for supported disk formats
>> instead of using free-form strings.  The wire representation would be
>> the same, and now's the time to do it before we add introspection (it's
>> more than just this command impacted).
> 
> And ideally we shouldn't make it a static list that contains every
> format for which qemu has some code, but only those that are actually
> compiled in. (Hm, and probably not protocols?)
> 
> Luiz, any idea how to do something like this, a QAPI enum with values
> that are determined at runtime? Especially with respect to the coming
> schema introspection?

Or maybe we make the 'enum' list ALL possible types, but then add a
query-* command that returns an array of only those enum values that are
supported.  Introspection would see all types, but the query command
would be the useful variant that is runtime-dependent.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] block: add block-backup QMP command
  2013-05-13 12:56       ` Eric Blake
@ 2013-05-13 13:09         ` Kevin Wolf
  2013-05-13 13:18           ` Luiz Capitulino
  0 siblings, 1 reply; 27+ messages in thread
From: Kevin Wolf @ 2013-05-13 13:09 UTC (permalink / raw)
  To: Eric Blake
  Cc: Fam Zheng, qemu-devel, Wenchao Xia, imain, Stefan Hajnoczi,
	pbonzini, lcapitulino, dietmar

Am 13.05.2013 um 14:56 hat Eric Blake geschrieben:
> On 05/13/2013 02:28 AM, Kevin Wolf wrote:
> 
> >>> +{ 'command': 'block-backup',
> >>> +  'data': { 'device': 'str', 'target': 'str', '*format': 'str',
> >>
> >> Hmm - wondering if we should add an enum type for supported disk formats
> >> instead of using free-form strings.  The wire representation would be
> >> the same, and now's the time to do it before we add introspection (it's
> >> more than just this command impacted).
> > 
> > And ideally we shouldn't make it a static list that contains every
> > format for which qemu has some code, but only those that are actually
> > compiled in. (Hm, and probably not protocols?)
> > 
> > Luiz, any idea how to do something like this, a QAPI enum with values
> > that are determined at runtime? Especially with respect to the coming
> > schema introspection?
> 
> Or maybe we make the 'enum' list ALL possible types, but then add a
> query-* command that returns an array of only those enum values that are
> supported.  Introspection would see all types, but the query command
> would be the useful variant that is runtime-dependent.

Then is there any advantage in making it an enum in the first place? Can
libvirt really make use of the information "this qemu version could have
been compiled with VHDX support, but it hasn't"?

Kevin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] block: add block-backup QMP command
  2013-05-13 13:09         ` Kevin Wolf
@ 2013-05-13 13:18           ` Luiz Capitulino
  2013-05-13 14:14             ` Eric Blake
  0 siblings, 1 reply; 27+ messages in thread
From: Luiz Capitulino @ 2013-05-13 13:18 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Fam Zheng, qemu-devel, Wenchao Xia, imain, Stefan Hajnoczi,
	pbonzini, dietmar

On Mon, 13 May 2013 15:09:31 +0200
Kevin Wolf <kwolf@redhat.com> wrote:

> Am 13.05.2013 um 14:56 hat Eric Blake geschrieben:
> > On 05/13/2013 02:28 AM, Kevin Wolf wrote:
> > 
> > >>> +{ 'command': 'block-backup',
> > >>> +  'data': { 'device': 'str', 'target': 'str', '*format': 'str',
> > >>
> > >> Hmm - wondering if we should add an enum type for supported disk formats
> > >> instead of using free-form strings.  The wire representation would be
> > >> the same, and now's the time to do it before we add introspection (it's
> > >> more than just this command impacted).
> > > 
> > > And ideally we shouldn't make it a static list that contains every
> > > format for which qemu has some code, but only those that are actually
> > > compiled in. (Hm, and probably not protocols?)
> > > 
> > > Luiz, any idea how to do something like this, a QAPI enum with values
> > > that are determined at runtime? Especially with respect to the coming
> > > schema introspection?
> > 
> > Or maybe we make the 'enum' list ALL possible types, but then add a
> > query-* command that returns an array of only those enum values that are
> > supported.  Introspection would see all types, but the query command
> > would be the useful variant that is runtime-dependent.

Agreed. This is a capability, and we're adding query- commands to query
capabilities.

> Then is there any advantage in making it an enum in the first place?

Eric is in a better position to answer this, but the fact that this can
be queried isn't a strong pro for having it?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] block: add block-backup QMP command
  2013-05-13 13:18           ` Luiz Capitulino
@ 2013-05-13 14:14             ` Eric Blake
  2013-05-13 14:27               ` Kevin Wolf
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Blake @ 2013-05-13 14:14 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, Wenchao Xia, imain,
	Stefan Hajnoczi, pbonzini, dietmar

[-- Attachment #1: Type: text/plain, Size: 1919 bytes --]

On 05/13/2013 07:18 AM, Luiz Capitulino wrote:
>>>> Luiz, any idea how to do something like this, a QAPI enum with values
>>>> that are determined at runtime? Especially with respect to the coming
>>>> schema introspection?
>>>
>>> Or maybe we make the 'enum' list ALL possible types, but then add a
>>> query-* command that returns an array of only those enum values that are
>>> supported.  Introspection would see all types, but the query command
>>> would be the useful variant that is runtime-dependent.
> 
> Agreed. This is a capability, and we're adding query- commands to query
> capabilities.
> 
>> Then is there any advantage in making it an enum in the first place?
> 
> Eric is in a better position to answer this, but the fact that this can
> be queried isn't a strong pro for having it?

Hmm, you raise an interesting point - if we have a query-block-formats
command that returns an array of strings, then keep 'str' everywhere
else a format is required, that is no different for what is sent over
the wire compared to a query-block-formats that returns an array of
'BlockFormat' enum values, with the enum showing all possible formats
(even if support wasn't compiled in), and with 'BlockFormat' everywhere
else.  Introspection-wise, you'd have to know that you call
query-block-formats instead of introspecting on the type of the format
argument, but information-wise, there's no loss of details if the query-
command provides the runtime list, and the remaining commands stick with
'str'.  I still think an enum is a bit nicer from the type-safety
aspect, but I'm finding it hard to envision any scenario where libvirt
would have to resort to introspection if we have a query-block-formats
in place, and thus am not opposed to your idea of avoiding the enum.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] block: add block-backup QMP command
  2013-05-13 14:14             ` Eric Blake
@ 2013-05-13 14:27               ` Kevin Wolf
  2013-05-13 14:50                 ` Eric Blake
  0 siblings, 1 reply; 27+ messages in thread
From: Kevin Wolf @ 2013-05-13 14:27 UTC (permalink / raw)
  To: Eric Blake
  Cc: Fam Zheng, Wenchao Xia, qemu-devel, Luiz Capitulino, imain,
	Stefan Hajnoczi, pbonzini, dietmar

Am 13.05.2013 um 16:14 hat Eric Blake geschrieben:
> On 05/13/2013 07:18 AM, Luiz Capitulino wrote:
> >>>> Luiz, any idea how to do something like this, a QAPI enum with values
> >>>> that are determined at runtime? Especially with respect to the coming
> >>>> schema introspection?
> >>>
> >>> Or maybe we make the 'enum' list ALL possible types, but then add a
> >>> query-* command that returns an array of only those enum values that are
> >>> supported.  Introspection would see all types, but the query command
> >>> would be the useful variant that is runtime-dependent.
> > 
> > Agreed. This is a capability, and we're adding query- commands to query
> > capabilities.
> > 
> >> Then is there any advantage in making it an enum in the first place?
> > 
> > Eric is in a better position to answer this, but the fact that this can
> > be queried isn't a strong pro for having it?
> 
> Hmm, you raise an interesting point - if we have a query-block-formats
> command that returns an array of strings, then keep 'str' everywhere
> else a format is required, that is no different for what is sent over
> the wire compared to a query-block-formats that returns an array of
> 'BlockFormat' enum values, with the enum showing all possible formats
> (even if support wasn't compiled in), and with 'BlockFormat' everywhere
> else.  Introspection-wise, you'd have to know that you call
> query-block-formats instead of introspecting on the type of the format
> argument, but information-wise, there's no loss of details if the query-
> command provides the runtime list, and the remaining commands stick with
> 'str'.  I still think an enum is a bit nicer from the type-safety
> aspect, but I'm finding it hard to envision any scenario where libvirt
> would have to resort to introspection if we have a query-block-formats
> in place, and thus am not opposed to your idea of avoiding the enum.

I think long term we'll need a dynamic schema anyway. As we go forward
with modularisation and putting things into shared libraries, we'll have
modules that add support for commands, enum values, etc.

Providing the full list of theoretically available elements (i.e. what
would be there, if everything was compiled and all modules were loaded)
would probably implement the spec for the introspection interfaces by the
letter, but just give useless information. Callers want to know what's
really there.

If we're going to have a query-* command for everything, then we don't
need introspection at all. I would however prefer having the uniform
schema introspection and building the schema at runtime instead of many
separate query-* commands.

Kevin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] block: add block-backup QMP command
  2013-05-13 14:27               ` Kevin Wolf
@ 2013-05-13 14:50                 ` Eric Blake
  2013-05-14  2:18                   ` Wenchao Xia
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Blake @ 2013-05-13 14:50 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Fam Zheng, Wenchao Xia, qemu-devel, Luiz Capitulino, imain,
	Stefan Hajnoczi, pbonzini, dietmar

[-- Attachment #1: Type: text/plain, Size: 1476 bytes --]

On 05/13/2013 08:27 AM, Kevin Wolf wrote:
> I think long term we'll need a dynamic schema anyway. As we go forward
> with modularisation and putting things into shared libraries, we'll have
> modules that add support for commands, enum values, etc.

In other words, qapi-schema.json should have a way to declare a
dynamic-enum, where introspection on that enum will see what is made
available at runtime, rather than manually listing the enum contents
directly in the .json file.

> 
> Providing the full list of theoretically available elements (i.e. what
> would be there, if everything was compiled and all modules were loaded)
> would probably implement the spec for the introspection interfaces by the
> letter, but just give useless information. Callers want to know what's
> really there.
> 
> If we're going to have a query-* command for everything, then we don't
> need introspection at all. I would however prefer having the uniform
> schema introspection and building the schema at runtime instead of many
> separate query-* commands.

Indeed, having introspection of a dynamic enum results in fewer commands
overall, by making a reusable command have more power.  And maybe it's
possible to have introspection do both - with an optional boolean
parameter that distinguishes between full vs. runtime querying of any
enum type.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] block: add block-backup QMP command
  2013-05-13 14:50                 ` Eric Blake
@ 2013-05-14  2:18                   ` Wenchao Xia
  0 siblings, 0 replies; 27+ messages in thread
From: Wenchao Xia @ 2013-05-14  2:18 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, Luiz Capitulino, imain,
	Stefan Hajnoczi, pbonzini, dietmar

于 2013-5-13 22:50, Eric Blake 写道:
> On 05/13/2013 08:27 AM, Kevin Wolf wrote:
>> I think long term we'll need a dynamic schema anyway. As we go forward
>> with modularisation and putting things into shared libraries, we'll have
>> modules that add support for commands, enum values, etc.
>
> In other words, qapi-schema.json should have a way to declare a
> dynamic-enum, where introspection on that enum will see what is made
> available at runtime, rather than manually listing the enum contents
> directly in the .json file.
>
>>
>> Providing the full list of theoretically available elements (i.e. what
>> would be there, if everything was compiled and all modules were loaded)
>> would probably implement the spec for the introspection interfaces by the
>> letter, but just give useless information. Callers want to know what's
>> really there.
>>
>> If we're going to have a query-* command for everything, then we don't
>> need introspection at all. I would however prefer having the uniform
>> schema introspection and building the schema at runtime instead of many
>> separate query-* commands.
>
> Indeed, having introspection of a dynamic enum results in fewer commands
> overall, by making a reusable command have more power.  And maybe it's
> possible to have introspection do both - with an optional boolean
> parameter that distinguishes between full vs. runtime querying of any
> enum type.
>
   "string <-> enum mapping" issue happens when I try coding libqblock,
what I found is that, since lower level implement code still depends
string, so there will be another translating back to string inside
later. For format string, it is probably OK since two functions will be
enough, but when more enum is wanted, things become complex.
    So I feel it is right to try use enum in the API now, but maybe a
plan to reorganize block layer will be also needed, which may be
a bit complex needing serous plan.


-- 
Best Regards

Wenchao Xia

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] block: add block-backup QMP command
  2013-05-08 12:49   ` Kevin Wolf
  2013-05-11  3:34     ` Eric Blake
@ 2013-05-14  8:44     ` Stefan Hajnoczi
  1 sibling, 0 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2013-05-14  8:44 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Fam Zheng, qemu-devel, Wenchao Xia, imain, pbonzini, dietmar

On Wed, May 08, 2013 at 02:49:00PM +0200, Kevin Wolf wrote:
> Am 29.04.2013 um 09:42 hat Stefan Hajnoczi geschrieben:
> > @block-backup
> > 
> > Start a point-in-time copy of a block device to a new destination.  The
> > status of ongoing block backup operations can be checked with
> > query-block-jobs.  The operation can be stopped before it has completed using
> > the block-job-cancel command.
> > 
> > @device: the name of the device whose writes should be mirrored.
> > 
> > @target: the target of the new image. If the file exists, or if it
> >          is a device, the existing file/device will be used as the new
> >          destination.  If it does not exist, a new file will be created.
> > 
> > @format: #optional the format of the new destination, default is to
> >          probe if @mode is 'existing', else the format of the source
> > 
> > @mode: #optional whether and how QEMU should create a new image, default is
> >        'absolute-paths'.
> > 
> > @speed: #optional the maximum speed, in bytes per second
> > 
> > Returns: nothing on success
> >          If @device is not a valid block device, DeviceNotFound
> > 
> > Since 1.6
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> 
> drive-backup would probably be a more consistent naming. We would then
> still have block-backup for a future low-level command that doesn't
> create everything by itself but takes an existing BlockDriverState (e.g.
> created by blockdev-add).
> 
> We should also make it transactionable from the beginning, as we don't
> have schema introspection yet. This way we allow to assume that if the
> standalone command exists, the transaction subcommand exists as well.

Sounds good.  I'll make both changes.

Stefan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] block: add block-backup QMP command
  2013-05-11  4:02   ` Eric Blake
  2013-05-13  8:28     ` Kevin Wolf
@ 2013-05-14  8:48     ` Stefan Hajnoczi
  1 sibling, 0 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2013-05-14  8:48 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, Wenchao Xia, imain, pbonzini,
	dietmar

On Fri, May 10, 2013 at 10:02:14PM -0600, Eric Blake wrote:
> On 04/29/2013 01:42 AM, Stefan Hajnoczi wrote:
> > @block-backup
> > 
> > +++ b/qapi-schema.json
> > @@ -1715,6 +1715,37 @@
> >              '*speed': 'int' } }
> >  
> >  ##
> > +# @block-backup
> > +#
> > +# Start a point-in-time copy of a block device to a new destination.  The
> > +# status of ongoing block backup operations can be checked with
> > +# query-block-jobs.  The operation can be stopped before it has completed using
> > +# the block-job-cancel command.
> 
> Still might be worth mentioning that 'query-block-jobs' will list it as
> a job of type 'backup'.

Will fix in v3.

> > +#
> > +# @device: the name of the device whose writes should be mirrored.
> > +#
> > +# @target: the target of the new image. If the file exists, or if it
> > +#          is a device, the existing file/device will be used as the new
> > +#          destination.  If it does not exist, a new file will be created.
> > +#
> > +# @format: #optional the format of the new destination, default is to
> > +#          probe if @mode is 'existing', else the format of the source
> > +#
> > +# @mode: #optional whether and how QEMU should create a new image, default is
> > +#        'absolute-paths'.
> > +#
> > +# @speed: #optional the maximum speed, in bytes per second
> > +#
> > +# Returns: nothing on success
> > +#          If @device is not a valid block device, DeviceNotFound
> > +#
> > +# Since 1.6
> > +##
> > +{ 'command': 'block-backup',
> > +  'data': { 'device': 'str', 'target': 'str', '*format': 'str',
> 
> Hmm - wondering if we should add an enum type for supported disk formats
> instead of using free-form strings.  The wire representation would be
> the same, and now's the time to do it before we add introspection (it's
> more than just this command impacted).

Interesting discussion about dynamic schema.  Since the wire format is
the same, I don't want to attempt solving the problem in the
block-backup series.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] block: add basic backup support to block driver
  2013-05-08 12:39   ` Kevin Wolf
  2013-05-08 15:43     ` Paolo Bonzini
@ 2013-05-14  8:51     ` Stefan Hajnoczi
  2013-05-14 13:24     ` Stefan Hajnoczi
  2 siblings, 0 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2013-05-14  8:51 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Fam Zheng, qemu-devel, Wenchao Xia, imain, pbonzini, dietmar

On Wed, May 08, 2013 at 02:39:25PM +0200, Kevin Wolf wrote:
> Am 29.04.2013 um 09:42 hat Stefan Hajnoczi geschrieben:
>
> Sorry, but this patch doesn't look very close to mergable yet.

Fair enough.  I had some ideas on decoupling the backup block job from
the core block layer, making fewer assumptions on block jobs.  I've
tried to stick close to Dietmar's original patch but will be more
aggressive in v3.

I think together with some of Paolo's idea we can clean this up nicely.

Stefan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] block: add basic backup support to block driver
  2013-05-08 12:39   ` Kevin Wolf
  2013-05-08 15:43     ` Paolo Bonzini
  2013-05-14  8:51     ` Stefan Hajnoczi
@ 2013-05-14 13:24     ` Stefan Hajnoczi
  2013-05-14 13:43       ` Kevin Wolf
  2 siblings, 1 reply; 27+ messages in thread
From: Stefan Hajnoczi @ 2013-05-14 13:24 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Fam Zheng, qemu-devel, dietmar, imain, Stefan Hajnoczi, pbonzini,
	Wenchao Xia

On Wed, May 08, 2013 at 02:39:25PM +0200, Kevin Wolf wrote:
> Am 29.04.2013 um 09:42 hat Stefan Hajnoczi geschrieben:
> > diff --git a/include/block/blockjob.h b/include/block/blockjob.h
> > index c290d07..6f42495 100644
> > --- a/include/block/blockjob.h
> > +++ b/include/block/blockjob.h
> > @@ -50,6 +50,13 @@ typedef struct BlockJobType {
> >       * manually.
> >       */
> >      void (*complete)(BlockJob *job, Error **errp);
> > +
> > +    /** tracked requests */
> > +    int coroutine_fn (*before_read)(BlockDriverState *bs, int64_t sector_num,
> > +                                    int nb_sectors, QEMUIOVector *qiov);
> > +    int coroutine_fn (*before_write)(BlockDriverState *bs, int64_t sector_num,
> > +                                     int nb_sectors, QEMUIOVector *qiov);
> > +
> >  } BlockJobType;
> 
> This is actually a sign that a block job isn't the right tool. Jobs are
> something that runs in the background and doesn't have callbacks. You
> really want to have a filter here (that happens to be coupled to a job).
> Need the BlockBackend split before we can do this right.
> 
> The second thing that this conflicts with is generalising block jobs to
> generic background jobs.
> 
> Each hack like this that we accumulate makes it harder to get the real
> thing eventually.

In v3 I added a slightly cleaner interface:

bdrv_set_before_write_cb(bs, backup_before_write);

This way the "before write" callback is not tied to block jobs and
could be reused in the future.

The alternative is to create a BlockDriver that mostly delegates plus
uses bdrv_swap().  The boilerplate involved in that is ugly though, so
I'm using bdrv_set_before_write_cb() instead.

> >  
> >  /**
> > @@ -103,6 +110,9 @@ struct BlockJob {
> >      /** Speed that was set with @block_job_set_speed.  */
> >      int64_t speed;
> >  
> > +    /** tracked requests */
> > +    int cluster_size;
> 
> Sure that this is the right comment here?
> 
> Does really every job need a cluster size?

Dropped in v3.

> > diff --git a/block.c b/block.c
> > index aa9a533..c5c09b7 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -54,6 +54,7 @@
> >  typedef enum {
> >      BDRV_REQ_COPY_ON_READ = 0x1,
> >      BDRV_REQ_ZERO_WRITE   = 0x2,
> > +    BDRV_REQ_BACKUP_ONLY  = 0x4,
> >  } BdrvRequestFlags;
> 
> Without having read the rest of the code, it's unclear to me what this
> new flag means. Could use a comment. (Though it has "backup" in its name,
> so I suspect having this in generic block layer is wrong.)

Agreed.  Dropped in v3.

> >  static void bdrv_dev_change_media_cb(BlockDriverState *bs, bool load);
> > @@ -1902,6 +1903,22 @@ void bdrv_round_to_clusters(BlockDriverState *bs,
> >      }
> >  }
> >  
> > +/**
> > + * Round a region to job cluster boundaries
> > + */
> > +static void round_to_job_clusters(BlockDriverState *bs,
> > +                                  int64_t sector_num, int nb_sectors,
> > +                                  int job_cluster_size,
> > +                                  int64_t *cluster_sector_num,
> > +                                  int *cluster_nb_sectors)
> > +{
> > +    int64_t c = job_cluster_size / BDRV_SECTOR_SIZE;
> > +
> > +    *cluster_sector_num = QEMU_ALIGN_DOWN(sector_num, c);
> > +    *cluster_nb_sectors = QEMU_ALIGN_UP(sector_num - *cluster_sector_num +
> > +                                        nb_sectors, c);
> > +}
> 
> This function has really nothing to do with jobs except that it happens
> to be useful in some function actually related to jobs.
> 
> It should probably be renamed and bdrv_round_to_clusters() should call
> into it.

Dropped in v3.  block/backup.c works in bitmap granularity units so we
don't need to mess with sectors and cluster sizes.

> > +
> >  static bool tracked_request_overlaps(BdrvTrackedRequest *req,
> >                                       int64_t sector_num, int nb_sectors) {
> >      /*        aaaa   bbbb */
> > @@ -1916,7 +1933,9 @@ static bool tracked_request_overlaps(BdrvTrackedRequest *req,
> >  }
> >  
> >  static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
> > -        int64_t sector_num, int nb_sectors)
> > +                                                       int64_t sector_num,
> > +                                                       int nb_sectors,
> > +                                                       int job_cluster_size)
> >  {
> >      BdrvTrackedRequest *req;
> >      int64_t cluster_sector_num;
> > @@ -1932,6 +1951,11 @@ static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
> >      bdrv_round_to_clusters(bs, sector_num, nb_sectors,
> >                             &cluster_sector_num, &cluster_nb_sectors);
> >  
> > +    if (job_cluster_size) {
> > +        round_to_job_clusters(bs, sector_num, nb_sectors, job_cluster_size,
> > +                              &cluster_sector_num, &cluster_nb_sectors);
> > +    }
> > +
> >      do {
> >          retry = false;
> >          QLIST_FOREACH(req, &bs->tracked_requests, list) {
> > @@ -2507,12 +2531,24 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
> >          bs->copy_on_read_in_flight++;
> >      }
> >  
> > -    if (bs->copy_on_read_in_flight) {
> > -        wait_for_overlapping_requests(bs, sector_num, nb_sectors);
> > +    int job_cluster_size = bs->job && bs->job->cluster_size ?
> > +        bs->job->cluster_size : 0;
> > +
> > +    if (bs->copy_on_read_in_flight || job_cluster_size) {
> > +        wait_for_overlapping_requests(bs, sector_num, nb_sectors,
> > +                                      job_cluster_size);
> >      }
> >  
> >      tracked_request_begin(&req, bs, sector_num, nb_sectors, false);
> >  
> > +    if (bs->job && bs->job->job_type->before_read) {
> > +        ret = bs->job->job_type->before_read(bs, sector_num, nb_sectors, qiov);
> > +        if ((ret < 0) || (flags & BDRV_REQ_BACKUP_ONLY)) {
> > +            /* Note: We do not return any data to the caller */
> > +            goto out;
> > +        }
> > +    }
> 
> Ugh, so we're introducing calls of bdrv_co_do_readv() that don't
> actually read, but only cause some side-effects of reads, except if
> there is no block job running?
> 
> Is it intentional that these fake requests are considered for I/O
> throttling?

Dropped in v3, using a separate queue of in-flight CoW requests in
block/backup.c.

> > +
> >      if (flags & BDRV_REQ_COPY_ON_READ) {
> >          int pnum;
> >  
> > @@ -2556,6 +2592,17 @@ int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
> >                              BDRV_REQ_COPY_ON_READ);
> >  }
> >  
> > +int coroutine_fn bdrv_co_backup(BlockDriverState *bs,
> > +    int64_t sector_num, int nb_sectors)
> > +{
> > +    if (!bs->job) {
> > +        return -ENOTSUP;
> > +    }
> 
> Don't you expect a very specific job to be running, or is it acceptable
> to have streaming running? It looks a bit arbitrary.
> 
> And you make the assumption that the functionality (running only
> side-effects of read) can only work correctly in the context of jobs (or
> actually not just in the context of jobs, but only when a job is running
> at the same time). Not sure why.
> 
> I suspect you're not really checking what you wanted to check here.

Dropped in v3.

> > +
> > +    return bdrv_co_do_readv(bs, sector_num, nb_sectors, NULL,
> > +                            BDRV_REQ_BACKUP_ONLY);
> > +}
> > +
> >  static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
> >      int64_t sector_num, int nb_sectors)
> >  {
> > @@ -2613,12 +2660,23 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
> >          bdrv_io_limits_intercept(bs, true, nb_sectors);
> >      }
> >  
> > -    if (bs->copy_on_read_in_flight) {
> > -        wait_for_overlapping_requests(bs, sector_num, nb_sectors);
> > +    int job_cluster_size = bs->job && bs->job->cluster_size ?
> > +        bs->job->cluster_size : 0;
> > +
> > +    if (bs->copy_on_read_in_flight || job_cluster_size) {
> > +        wait_for_overlapping_requests(bs, sector_num, nb_sectors,
> > +                                      job_cluster_size);
> >      }
> >  
> >      tracked_request_begin(&req, bs, sector_num, nb_sectors, true);
> >  
> > +    if (bs->job && bs->job->job_type->before_write) {
> > +        ret = bs->job->job_type->before_write(bs, sector_num, nb_sectors, qiov);
> > +        if (ret < 0) {
> > +            goto out;
> > +        }
> > +    }
> > +
> >      if (flags & BDRV_REQ_ZERO_WRITE) {
> >          ret = bdrv_co_do_write_zeroes(bs, sector_num, nb_sectors);
> >      } else {
> > @@ -2637,6 +2695,7 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
> >          bs->wr_highest_sector = sector_num + nb_sectors - 1;
> >      }
> >  
> > +out:
> >      tracked_request_end(&req);
> >  
> >      return ret;
> 
> The changes to block.c and friends should be a separate patch, so let me
> sum up my expectations here: This patch should be as small as possible,
> it should mention the word "backup" as little as possible and it should
> build successfully without backup.o.
> 
> This is the very minimum at which we can start talking about the patch
> and whether we do proper block filters or what the best way is to work
> around the lack of them today.
> 
> And then we get to things like fake requests which aren't my favourite
> design either...

Split into two patches for v3:

block: add bdrv_set_before_write_cb()
block: add basic backup support to block driver

This patch no longer touches block.c or headers.

> > diff --git a/block/Makefile.objs b/block/Makefile.objs
> > index 6c4b5bc..dcab6d3 100644
> > --- a/block/Makefile.objs
> > +++ b/block/Makefile.objs
> > @@ -19,5 +19,6 @@ endif
> >  common-obj-y += stream.o
> >  common-obj-y += commit.o
> >  common-obj-y += mirror.o
> > +common-obj-y += backup.o
> >  
> >  $(obj)/curl.o: QEMU_CFLAGS+=$(CURL_CFLAGS)
> > diff --git a/block/backup.c b/block/backup.c
> > new file mode 100644
> > index 0000000..45d8c21
> > --- /dev/null
> > +++ b/block/backup.c
> > @@ -0,0 +1,252 @@
> > +/*
> > + * QEMU backup
> > + *
> > + * Copyright (C) 2013 Proxmox Server Solutions
> > + *
> > + * Authors:
> > + *  Dietmar Maurer (dietmar@proxmox.com)
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#include <stdio.h>
> > +#include <errno.h>
> > +#include <unistd.h>
> > +
> > +#include "block/block.h"
> > +#include "block/block_int.h"
> > +#include "block/blockjob.h"
> > +#include "qemu/ratelimit.h"
> > +
> > +#define DEBUG_BACKUP 0
> > +
> > +#define DPRINTF(fmt, ...) \
> > +    do { \
> > +        if (DEBUG_BACKUP) { \
> > +            fprintf(stderr, "backup: " fmt, ## __VA_ARGS__); \
> > +        } \
> > +    } while (0)
> > +
> > +#define BACKUP_CLUSTER_BITS 16
> > +#define BACKUP_CLUSTER_SIZE (1 << BACKUP_CLUSTER_BITS)
> > +#define BACKUP_BLOCKS_PER_CLUSTER (BACKUP_CLUSTER_SIZE / BDRV_SECTOR_SIZE)
> > +
> > +#define SLICE_TIME 100000000ULL /* ns */
> > +
> > +typedef struct BackupBlockJob {
> > +    BlockJob common;
> > +    BlockDriverState *target;
> > +    RateLimit limit;
> > +    CoRwlock rwlock;
> 
> flush_rwlock?

Thanks, renamed in v3.

> > +    uint64_t sectors_read;
> > +    HBitmap *bitmap;
> > +} BackupBlockJob;
> > +
> > +static int coroutine_fn backup_do_cow(BlockDriverState *bs,
> > +                                      int64_t sector_num, int nb_sectors)
> > +{
> > +    assert(bs);
> > +    BackupBlockJob *job = (BackupBlockJob *)bs->job;
> > +    assert(job);
> > +
> > +    BlockDriver *drv = bs->drv;
> > +    struct iovec iov;
> > +    QEMUIOVector bounce_qiov;
> > +    void *bounce_buffer = NULL;
> > +    int ret = 0;
> > +
> > +    qemu_co_rwlock_rdlock(&job->rwlock);
> > +
> > +    int64_t start, end;
> > +
> > +    start = sector_num / BACKUP_BLOCKS_PER_CLUSTER;
> > +    end = DIV_ROUND_UP(sector_num + nb_sectors, BACKUP_BLOCKS_PER_CLUSTER);
> > +
> > +    DPRINTF("brdv_co_backup_cow enter %s C%" PRId64 " %" PRId64 " %d\n",
> > +            bdrv_get_device_name(bs), start, sector_num, nb_sectors);
> > +
> > +    for (; start < end; start++) {
> > +        if (hbitmap_get(job->bitmap, start)) {
> > +            DPRINTF("brdv_co_backup_cow skip C%" PRId64 "\n", start);
> > +            continue; /* already copied */
> > +        }
> > +
> > +        /* immediately set bitmap (avoid coroutine race) */
> > +        hbitmap_set(job->bitmap, start, 1);
> > +
> > +        DPRINTF("brdv_co_backup_cow C%" PRId64 "\n", start);
> > +
> > +        if (!bounce_buffer) {
> > +            iov.iov_len = BACKUP_CLUSTER_SIZE;
> > +            iov.iov_base = bounce_buffer = qemu_blockalign(bs, iov.iov_len);
> > +            qemu_iovec_init_external(&bounce_qiov, &iov, 1);
> > +        }
> > +
> > +        ret = drv->bdrv_co_readv(bs, start * BACKUP_BLOCKS_PER_CLUSTER,
> > +                                 BACKUP_BLOCKS_PER_CLUSTER,
> > +                                 &bounce_qiov);
> 
> Why do you bypass the block layer here?

Necessary because the request would deadlock due to
wait_for_overlapping_requests() being called - this internal read
overlaps with the guest's write.

Anyway, dropped in v3 since I'm using a different approach.

> > +        if (ret < 0) {
> > +            DPRINTF("brdv_co_backup_cow bdrv_read C%" PRId64 " failed\n",
> > +                    start);
> > +            goto out;
> > +        }
> > +
> > +        job->sectors_read += BACKUP_BLOCKS_PER_CLUSTER;
> > +
> > +        if (!buffer_is_zero(bounce_buffer, BACKUP_CLUSTER_SIZE)) {
> > +            ret = bdrv_co_writev(job->target, start * BACKUP_BLOCKS_PER_CLUSTER,
> > +                                 BACKUP_BLOCKS_PER_CLUSTER,
> > +                                 &bounce_qiov);
> > +            if (ret < 0) {
> > +                DPRINTF("brdv_co_backup_cow dump_cluster_cb C%" PRId64
> > +                        " failed\n", start);
> > +                goto out;
> > +            }
> > +        }
> > +
> > +        DPRINTF("brdv_co_backup_cow done C%" PRId64 "\n", start);
> > +    }
> > +
> > +out:
> > +    if (bounce_buffer) {
> > +        qemu_vfree(bounce_buffer);
> > +    }
> > +
> > +    qemu_co_rwlock_unlock(&job->rwlock);
> > +
> > +    return ret;
> > +}
> > +
> > +static int coroutine_fn backup_before_read(BlockDriverState *bs,
> > +                                           int64_t sector_num,
> > +                                           int nb_sectors, QEMUIOVector *qiov)
> > +{
> > +    return backup_do_cow(bs, sector_num, nb_sectors);
> > +}
> 
> Why do you do copy on _read_? Without actually making use of the read
> data like COR in block.c, but by reading the data a second time.
> 
> I mean, obviously this is the implementation of bdrv_co_backup(), but it
> doesn't make a lot of sense for guest requests. You could directly call
> into the function from below instead of going through block.c for
> bdrv_co_backup() and get rid of the fake requests.

Dropped in v3.

> > +static int coroutine_fn backup_before_write(BlockDriverState *bs,
> > +                                            int64_t sector_num,
> > +                                            int nb_sectors, QEMUIOVector *qiov)
> > +{
> > +    return backup_do_cow(bs, sector_num, nb_sectors);
> > +}
> > +
> > +static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp)
> > +{
> > +    BackupBlockJob *s = container_of(job, BackupBlockJob, common);
> > +
> > +    if (speed < 0) {
> > +        error_set(errp, QERR_INVALID_PARAMETER, "speed");
> > +        return;
> > +    }
> > +    ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
> > +}
> > +
> > +static BlockJobType backup_job_type = {
> > +    .instance_size = sizeof(BackupBlockJob),
> > +    .before_read = backup_before_read,
> > +    .before_write = backup_before_write,
> > +    .job_type = "backup",
> > +    .set_speed = backup_set_speed,
> > +};
> > +
> > +static void coroutine_fn backup_run(void *opaque)
> > +{
> > +    BackupBlockJob *job = opaque;
> > +    BlockDriverState *bs = job->common.bs;
> > +    assert(bs);
> > +
> > +    int64_t start, end;
> > +
> > +    start = 0;
> > +    end = DIV_ROUND_UP(bdrv_getlength(bs) / BDRV_SECTOR_SIZE,
> > +                       BACKUP_BLOCKS_PER_CLUSTER);
> > +
> > +    job->bitmap = hbitmap_alloc(end, 0);
> > +
> > +    DPRINTF("backup_run start %s %" PRId64 " %" PRId64 "\n",
> > +            bdrv_get_device_name(bs), start, end);
> > +
> > +    int ret = 0;
> > +
> > +    for (; start < end; start++) {
> > +        if (block_job_is_cancelled(&job->common)) {
> > +            ret = -1;
> > +            break;
> > +        }
> > +
> > +        /* we need to yield so that qemu_aio_flush() returns.
> > +         * (without, VM does not reboot)
> > +         */
> > +        if (job->common.speed) {
> > +            uint64_t delay_ns = ratelimit_calculate_delay(
> > +                &job->limit, job->sectors_read);
> > +            job->sectors_read = 0;
> > +            block_job_sleep_ns(&job->common, rt_clock, delay_ns);
> > +        } else {
> > +            block_job_sleep_ns(&job->common, rt_clock, 0);
> > +        }
> > +
> > +        if (block_job_is_cancelled(&job->common)) {
> > +            ret = -1;
> > +            break;
> > +        }
> > +
> > +        if (hbitmap_get(job->bitmap, start)) {
> > +            continue; /* already copied */
> > +        }
> 
> Isn't this duplicated and would be checked in the bdrv_co_backup() call
> below anyway?
> 
> The semantic difference is that job->common.offset wouldn't be increased
> with this additional check. Looks like a bug.

Thanks, fixed in v3.

> > +        DPRINTF("backup_run loop C%" PRId64 "\n", start);
> > +
> > +        /**
> > +         * This triggers a cluster copy
> > +         * Note: avoid direct call to brdv_co_backup_cow, because
> > +         * this does not call tracked_request_begin()
> > +         */
> > +        ret = bdrv_co_backup(bs, start*BACKUP_BLOCKS_PER_CLUSTER, 1);
> > +        if (ret < 0) {
> > +            break;
> > +        }
> > +        /* Publish progress */
> > +        job->common.offset += BACKUP_CLUSTER_SIZE;
> > +    }
> > +
> > +    /* wait until pending backup_do_cow()calls have completed */
> 
> Missing space.

Thanks, fixed in v3.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] block: add basic backup support to block driver
  2013-05-14 13:24     ` Stefan Hajnoczi
@ 2013-05-14 13:43       ` Kevin Wolf
  2013-05-14 15:12         ` Stefan Hajnoczi
  0 siblings, 1 reply; 27+ messages in thread
From: Kevin Wolf @ 2013-05-14 13:43 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Fam Zheng, qemu-devel, dietmar, imain, Stefan Hajnoczi, pbonzini,
	Wenchao Xia

Am 14.05.2013 um 15:24 hat Stefan Hajnoczi geschrieben:
> On Wed, May 08, 2013 at 02:39:25PM +0200, Kevin Wolf wrote:
> > Am 29.04.2013 um 09:42 hat Stefan Hajnoczi geschrieben:
> > > diff --git a/include/block/blockjob.h b/include/block/blockjob.h
> > > index c290d07..6f42495 100644
> > > --- a/include/block/blockjob.h
> > > +++ b/include/block/blockjob.h
> > > @@ -50,6 +50,13 @@ typedef struct BlockJobType {
> > >       * manually.
> > >       */
> > >      void (*complete)(BlockJob *job, Error **errp);
> > > +
> > > +    /** tracked requests */
> > > +    int coroutine_fn (*before_read)(BlockDriverState *bs, int64_t sector_num,
> > > +                                    int nb_sectors, QEMUIOVector *qiov);
> > > +    int coroutine_fn (*before_write)(BlockDriverState *bs, int64_t sector_num,
> > > +                                     int nb_sectors, QEMUIOVector *qiov);
> > > +
> > >  } BlockJobType;
> > 
> > This is actually a sign that a block job isn't the right tool. Jobs are
> > something that runs in the background and doesn't have callbacks. You
> > really want to have a filter here (that happens to be coupled to a job).
> > Need the BlockBackend split before we can do this right.
> > 
> > The second thing that this conflicts with is generalising block jobs to
> > generic background jobs.
> > 
> > Each hack like this that we accumulate makes it harder to get the real
> > thing eventually.
> 
> In v3 I added a slightly cleaner interface:
> 
> bdrv_set_before_write_cb(bs, backup_before_write);
> 
> This way the "before write" callback is not tied to block jobs and
> could be reused in the future.
> 
> The alternative is to create a BlockDriver that mostly delegates plus
> uses bdrv_swap().  The boilerplate involved in that is ugly though, so
> I'm using bdrv_set_before_write_cb() instead.

The clean implementation of filters is putting a BlockDriver on top, it
gives you flexibility to intercept anything and you can stack multiple
of them instead of having just one callback per BDS.

But I'm not sure if simply bdrv_swap is good enough. You would definitely
have to disable snapshots while the filter is active and there may be
more cases. This is the part that I think getting right is a bit more
complex and might need the BlockBackend split.

So you would vote to introduce bdrv_set_before_write_cb() now and later
change how everything works?

Kevin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] block: add basic backup support to block driver
  2013-05-14 13:43       ` Kevin Wolf
@ 2013-05-14 15:12         ` Stefan Hajnoczi
  0 siblings, 0 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2013-05-14 15:12 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Fam Zheng, qemu-devel, Dietmar Maurer, imain, Stefan Hajnoczi,
	Paolo Bonzini, Wenchao Xia

On Tue, May 14, 2013 at 3:43 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 14.05.2013 um 15:24 hat Stefan Hajnoczi geschrieben:
>> On Wed, May 08, 2013 at 02:39:25PM +0200, Kevin Wolf wrote:
>> > Am 29.04.2013 um 09:42 hat Stefan Hajnoczi geschrieben:
>> > > diff --git a/include/block/blockjob.h b/include/block/blockjob.h
>> > > index c290d07..6f42495 100644
>> > > --- a/include/block/blockjob.h
>> > > +++ b/include/block/blockjob.h
>> > > @@ -50,6 +50,13 @@ typedef struct BlockJobType {
>> > >       * manually.
>> > >       */
>> > >      void (*complete)(BlockJob *job, Error **errp);
>> > > +
>> > > +    /** tracked requests */
>> > > +    int coroutine_fn (*before_read)(BlockDriverState *bs, int64_t sector_num,
>> > > +                                    int nb_sectors, QEMUIOVector *qiov);
>> > > +    int coroutine_fn (*before_write)(BlockDriverState *bs, int64_t sector_num,
>> > > +                                     int nb_sectors, QEMUIOVector *qiov);
>> > > +
>> > >  } BlockJobType;
>> >
>> > This is actually a sign that a block job isn't the right tool. Jobs are
>> > something that runs in the background and doesn't have callbacks. You
>> > really want to have a filter here (that happens to be coupled to a job).
>> > Need the BlockBackend split before we can do this right.
>> >
>> > The second thing that this conflicts with is generalising block jobs to
>> > generic background jobs.
>> >
>> > Each hack like this that we accumulate makes it harder to get the real
>> > thing eventually.
>>
>> In v3 I added a slightly cleaner interface:
>>
>> bdrv_set_before_write_cb(bs, backup_before_write);
>>
>> This way the "before write" callback is not tied to block jobs and
>> could be reused in the future.
>>
>> The alternative is to create a BlockDriver that mostly delegates plus
>> uses bdrv_swap().  The boilerplate involved in that is ugly though, so
>> I'm using bdrv_set_before_write_cb() instead.
>
> The clean implementation of filters is putting a BlockDriver on top, it
> gives you flexibility to intercept anything and you can stack multiple
> of them instead of having just one callback per BDS.
>
> But I'm not sure if simply bdrv_swap is good enough. You would definitely
> have to disable snapshots while the filter is active and there may be
> more cases. This is the part that I think getting right is a bit more
> complex and might need the BlockBackend split.
>
> So you would vote to introduce bdrv_set_before_write_cb() now and later
> change how everything works?

Yes, I'd like to use bdrv_set_before_write_cb().  That's mainly
because I'm working on dataplane for QEMU 1.6 and I know that Ian and
Fam want to build on drive-backup, so the sooner it gets upstream, the
better for them.

I know these are non-technical reasons.  bdrv_set_before_write_cb() is
simple enough to convert though.

Stefan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] block: add basic backup support to block driver
  2013-04-29  7:42 ` [Qemu-devel] [PATCH v2 1/3] block: add basic backup support to block driver Stefan Hajnoczi
  2013-05-08 12:39   ` Kevin Wolf
@ 2013-05-30  3:37   ` Fam Zheng
  2013-05-30 12:27     ` Stefan Hajnoczi
  1 sibling, 1 reply; 27+ messages in thread
From: Fam Zheng @ 2013-05-30  3:37 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, qemu-devel, Wenchao Xia, imain, pbonzini, dietmar

On Mon, 04/29 09:42, Stefan Hajnoczi wrote:
> +
> +static void coroutine_fn backup_run(void *opaque)
> +{
> +    BackupBlockJob *job = opaque;
> +    BlockDriverState *bs = job->common.bs;
> +    assert(bs);
> +
> +    int64_t start, end;
> +
> +    start = 0;
> +    end = DIV_ROUND_UP(bdrv_getlength(bs) / BDRV_SECTOR_SIZE,
> +                       BACKUP_BLOCKS_PER_CLUSTER);
> +
> +    job->bitmap = hbitmap_alloc(end, 0);
> +
> +    DPRINTF("backup_run start %s %" PRId64 " %" PRId64 "\n",
> +            bdrv_get_device_name(bs), start, end);
> +
> +    int ret = 0;
> +
> +    for (; start < end; start++) {
> +        if (block_job_is_cancelled(&job->common)) {
> +            ret = -1;
> +            break;
> +        }
> +
> +        /* we need to yield so that qemu_aio_flush() returns.
> +         * (without, VM does not reboot)
> +         */
> +        if (job->common.speed) {
> +            uint64_t delay_ns = ratelimit_calculate_delay(
> +                &job->limit, job->sectors_read);
> +            job->sectors_read = 0;
> +            block_job_sleep_ns(&job->common, rt_clock, delay_ns);
> +        } else {
> +            block_job_sleep_ns(&job->common, rt_clock, 0);
> +        }
> +
> +        if (block_job_is_cancelled(&job->common)) {
> +            ret = -1;
> +            break;
> +        }
> +
> +        if (hbitmap_get(job->bitmap, start)) {
> +            continue; /* already copied */
> +        }
> +
> +        DPRINTF("backup_run loop C%" PRId64 "\n", start);
> +
> +        /**
> +         * This triggers a cluster copy
> +         * Note: avoid direct call to brdv_co_backup_cow, because
> +         * this does not call tracked_request_begin()
> +         */
> +        ret = bdrv_co_backup(bs, start*BACKUP_BLOCKS_PER_CLUSTER, 1);
> +        if (ret < 0) {
> +            break;
> +        }
> +        /* Publish progress */
> +        job->common.offset += BACKUP_CLUSTER_SIZE;
> +    }
> +
> +    /* wait until pending backup_do_cow()calls have completed */
> +    qemu_co_rwlock_wrlock(&job->rwlock);
> +    qemu_co_rwlock_unlock(&job->rwlock);
> +
> +    hbitmap_free(job->bitmap);
> +
> +    bdrv_delete(job->target);
> +
> +    DPRINTF("backup_run complete %d\n", ret);
> +    block_job_completed(&job->common, ret);
> +}

What will ret value be when the source block is not aligned to cluster
size?

Fam

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] block: add basic backup support to block driver
  2013-05-30  3:37   ` Fam Zheng
@ 2013-05-30 12:27     ` Stefan Hajnoczi
  0 siblings, 0 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2013-05-30 12:27 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel, Kevin Wolf, imain, Dietmar Maurer,
	Paolo Bonzini, Eric Blake, Wenchao Xia

On Thu, May 30, 2013 at 5:37 AM, Fam Zheng <famz@redhat.com> wrote:
> On Mon, 04/29 09:42, Stefan Hajnoczi wrote:
>> +
>> +static void coroutine_fn backup_run(void *opaque)
>> +{
>> +    BackupBlockJob *job = opaque;
>> +    BlockDriverState *bs = job->common.bs;
>> +    assert(bs);
>> +
>> +    int64_t start, end;
>> +
>> +    start = 0;
>> +    end = DIV_ROUND_UP(bdrv_getlength(bs) / BDRV_SECTOR_SIZE,
>> +                       BACKUP_BLOCKS_PER_CLUSTER);
>> +
>> +    job->bitmap = hbitmap_alloc(end, 0);
>> +
>> +    DPRINTF("backup_run start %s %" PRId64 " %" PRId64 "\n",
>> +            bdrv_get_device_name(bs), start, end);
>> +
>> +    int ret = 0;
>> +
>> +    for (; start < end; start++) {
>> +        if (block_job_is_cancelled(&job->common)) {
>> +            ret = -1;
>> +            break;
>> +        }
>> +
>> +        /* we need to yield so that qemu_aio_flush() returns.
>> +         * (without, VM does not reboot)
>> +         */
>> +        if (job->common.speed) {
>> +            uint64_t delay_ns = ratelimit_calculate_delay(
>> +                &job->limit, job->sectors_read);
>> +            job->sectors_read = 0;
>> +            block_job_sleep_ns(&job->common, rt_clock, delay_ns);
>> +        } else {
>> +            block_job_sleep_ns(&job->common, rt_clock, 0);
>> +        }
>> +
>> +        if (block_job_is_cancelled(&job->common)) {
>> +            ret = -1;
>> +            break;
>> +        }
>> +
>> +        if (hbitmap_get(job->bitmap, start)) {
>> +            continue; /* already copied */
>> +        }
>> +
>> +        DPRINTF("backup_run loop C%" PRId64 "\n", start);
>> +
>> +        /**
>> +         * This triggers a cluster copy
>> +         * Note: avoid direct call to brdv_co_backup_cow, because
>> +         * this does not call tracked_request_begin()
>> +         */
>> +        ret = bdrv_co_backup(bs, start*BACKUP_BLOCKS_PER_CLUSTER, 1);
>> +        if (ret < 0) {
>> +            break;
>> +        }
>> +        /* Publish progress */
>> +        job->common.offset += BACKUP_CLUSTER_SIZE;
>> +    }
>> +
>> +    /* wait until pending backup_do_cow()calls have completed */
>> +    qemu_co_rwlock_wrlock(&job->rwlock);
>> +    qemu_co_rwlock_unlock(&job->rwlock);
>> +
>> +    hbitmap_free(job->bitmap);
>> +
>> +    bdrv_delete(job->target);
>> +
>> +    DPRINTF("backup_run complete %d\n", ret);
>> +    block_job_completed(&job->common, ret);
>> +}
>
> What will ret value be when the source block is not aligned to cluster
> size?

Good catch, will fix.

Stefan

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2013-05-30 12:27 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-29  7:42 [Qemu-devel] [PATCH v2 0/3] block: block-backup live backup command Stefan Hajnoczi
2013-04-29  7:42 ` [Qemu-devel] [PATCH v2 1/3] block: add basic backup support to block driver Stefan Hajnoczi
2013-05-08 12:39   ` Kevin Wolf
2013-05-08 15:43     ` Paolo Bonzini
2013-05-14  8:51     ` Stefan Hajnoczi
2013-05-14 13:24     ` Stefan Hajnoczi
2013-05-14 13:43       ` Kevin Wolf
2013-05-14 15:12         ` Stefan Hajnoczi
2013-05-30  3:37   ` Fam Zheng
2013-05-30 12:27     ` Stefan Hajnoczi
2013-04-29  7:42 ` [Qemu-devel] [PATCH v2 2/3] block: add block-backup QMP command Stefan Hajnoczi
2013-05-08 12:49   ` Kevin Wolf
2013-05-11  3:34     ` Eric Blake
2013-05-11  8:02       ` Paolo Bonzini
2013-05-13  8:23       ` Kevin Wolf
2013-05-14  8:44     ` Stefan Hajnoczi
2013-05-11  4:02   ` Eric Blake
2013-05-13  8:28     ` Kevin Wolf
2013-05-13 12:56       ` Eric Blake
2013-05-13 13:09         ` Kevin Wolf
2013-05-13 13:18           ` Luiz Capitulino
2013-05-13 14:14             ` Eric Blake
2013-05-13 14:27               ` Kevin Wolf
2013-05-13 14:50                 ` Eric Blake
2013-05-14  2:18                   ` Wenchao Xia
2013-05-14  8:48     ` Stefan Hajnoczi
2013-04-29  7:42 ` [Qemu-devel] [PATCH v2 3/3] qemu-iotests: add 054 block-backup test case Stefan Hajnoczi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).