* [Qemu-devel] [PATCH v2 1/6] RFC: Efficient VM backup for qemu
@ 2012-11-30 9:22 Dietmar Maurer
2012-11-30 9:22 ` [Qemu-devel] [PATCH v2 2/6] add basic backup support to block driver Dietmar Maurer
` (4 more replies)
0 siblings, 5 replies; 9+ messages in thread
From: Dietmar Maurer @ 2012-11-30 9:22 UTC (permalink / raw)
To: qemu-devel; +Cc: Dietmar Maurer
This series provides a way to efficiently backup VMs.
* Backup to a single archive file
* Backup contain all data to restore VM (full backup)
* Do not depend on storage type or image format
* Avoid use of temporary storage
* store sparse images efficiently
The file docs/backup-rfc.txt contains more details.
Changes since v1:
* fix spelling errors
* move BackupInfo from BDS to BackupBlockJob
* introduce BackupDriver to allow more than one backup format
* vma: add suport to store vmstate (size is not known in advance)
* add ability to store VM state
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
---
docs/backup-rfc.txt | 119 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 119 insertions(+), 0 deletions(-)
create mode 100644 docs/backup-rfc.txt
diff --git a/docs/backup-rfc.txt b/docs/backup-rfc.txt
new file mode 100644
index 0000000..5b4b3df
--- /dev/null
+++ b/docs/backup-rfc.txt
@@ -0,0 +1,119 @@
+RFC: Efficient VM backup for qemu
+
+=Requirements=
+
+* Backup to a single archive file
+* Backup needs to contain all data to restore VM (full backup)
+* Do not depend on storage type or image format
+* Avoid use of temporary storage
+* store sparse images efficiently
+
+=Introduction=
+
+Most VM backup solutions use some kind of snapshot to get a consistent
+VM view at a specific point in time. For example, we previously used
+LVM to create a snapshot of all used VM images, which are then copied
+into a tar file.
+
+That basically means that any data written during backup involve
+considerable overhead. For LVM we get the following steps:
+
+1.) read original data (VM write)
+2.) write original data into snapshot (VM write)
+3.) write new data (VM write)
+4.) read data from snapshot (backup)
+5.) write data from snapshot into tar file (backup)
+
+Another approach to backup VM images is to create a new qcow2 image
+which use the old image as base. During backup, writes are redirected
+to the new image, so the old image represents a 'snapshot'. After
+backup, data need to be copied back from new image into the old
+one (commit). So a simple write during backup triggers the following
+steps:
+
+1.) write new data to new image (VM write)
+2.) read data from old image (backup)
+3.) write data from old image into tar file (backup)
+
+4.) read data from new image (commit)
+5.) write data to old image (commit)
+
+This is in fact the same overhead as before. Other tools like qemu
+livebackup produces similar overhead (2 reads, 3 writes).
+
+Some storage types/formats supports internal snapshots using some kind
+of reference counting (rados, sheepdog, dm-thin, qcow2). It would be possible
+to use that for backups, but for now we want to be storage-independent.
+
+Note: It turned out that taking a qcow2 snapshot can take a very long
+time on larger files.
+
+=Make it more efficient=
+
+The be more efficient, we simply need to avoid unnecessary steps. The
+following steps are always required:
+
+1.) read old data before it gets overwritten
+2.) write that data into the backup archive
+3.) write new data (VM write)
+
+As you can see, this involves only one read, an two writes.
+
+To make that work, our backup archive need to be able to store image
+data 'out of order'. It is important to notice that this will not work
+with traditional archive formats like tar.
+
+During backup we simply intercept writes, then read existing data and
+store that directly into the archive. After that we can continue the
+write.
+
+==Advantages==
+
+* very good performance (1 read, 2 writes)
+* works on any storage type and image format.
+* avoid usage of temporary storage
+* we can define a new and simple archive format, which is able to
+ store sparse files efficiently.
+
+Note: Storing sparse files is a mess with existing archive
+formats. For example, tar requires information about holes at the
+beginning of the archive.
+
+==Disadvantages==
+
+* we need to define a new archive format
+
+Note: Most existing archive formats are optimized to store small files
+including file attributes. We simply do not need that for VM archives.
+
+* archive contains data 'out of order'
+
+If you want to access image data in sequential order, you need to
+re-order archive data. It would be possible to to that on the fly,
+using temporary files.
+
+Fortunately, a normal restore/extract works perfectly with 'out of
+order' data, because the target files are seekable.
+
+* slow backup storage can slow down VM during backup
+
+It is important to note that we only do sequential writes to the
+backup storage. Furthermore one can compress the backup stream. IMHO,
+it is better to slow down the VM a bit. All other solutions creates
+large amounts of temporary data during backup.
+
+=Archive format requirements=
+
+The basic requirement for such new format is that we can store image
+date 'out of order'. It is also very likely that we have less than 256
+drives/images per VM, and we want to be able to store VM configuration
+files.
+
+We have defined a very simply format with those properties, see:
+
+docs/specs/vma_spec.txt
+
+Please let us know if you know an existing format which provides the
+same functionality.
+
+
--
1.7.2.5
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [Qemu-devel] [PATCH v2 2/6] add basic backup support to block driver
2012-11-30 9:22 [Qemu-devel] [PATCH v2 1/6] RFC: Efficient VM backup for qemu Dietmar Maurer
@ 2012-11-30 9:22 ` Dietmar Maurer
2012-11-30 9:22 ` [Qemu-devel] [PATCH v2 3/6] add backup related monitor commands Dietmar Maurer
` (3 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: Dietmar Maurer @ 2012-11-30 9:22 UTC (permalink / raw)
To: qemu-devel; +Cc: Dietmar Maurer
Function backup_job_start() creates a block job to backup a block device.
We call backup_do_cow() for each write during backup. That function
reads the original data and pass it to backup_dump_cb().
The tracked_request infrastructure is used to serialize access.
Currently backup cluster size is hardcoded to 65536 bytes.
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
---
Makefile.objs | 1 +
backup.c | 302 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
backup.h | 30 ++++++
block.c | 71 ++++++++++++-
block.h | 2 +
blockjob.h | 10 ++
6 files changed, 410 insertions(+), 6 deletions(-)
create mode 100644 backup.c
create mode 100644 backup.h
diff --git a/Makefile.objs b/Makefile.objs
index 3c7abca..cb46be5 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -48,6 +48,7 @@ coroutine-obj-$(CONFIG_WIN32) += coroutine-win32.o
block-obj-y = iov.o cache-utils.o qemu-option.o module.o async.o
block-obj-y += nbd.o block.o blockjob.o aes.o qemu-config.o
block-obj-y += thread-pool.o qemu-progress.o qemu-sockets.o uri.o notify.o
+block-obj-y += backup.o
block-obj-y += $(coroutine-obj-y) $(qobject-obj-y) $(version-obj-y)
block-obj-$(CONFIG_POSIX) += event_notifier-posix.o aio-posix.o
block-obj-$(CONFIG_WIN32) += event_notifier-win32.o aio-win32.o
diff --git a/backup.c b/backup.c
new file mode 100644
index 0000000..6a44974
--- /dev/null
+++ b/backup.c
@@ -0,0 +1,302 @@
+/*
+ * QEMU backup
+ *
+ * Copyright (C) 2012 Proxmox Server Solutions
+ *
+ * Authors:
+ * Dietmar Maurer (dietmar@proxmox.com)
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include <stdio.h>
+#include <errno.h>
+#include <unistd.h>
+
+#include "block.h"
+#include "block_int.h"
+#include "blockjob.h"
+#include "backup.h"
+
+#define DEBUG_BACKUP 0
+
+#define DPRINTF(fmt, ...) \
+ do { if (DEBUG_BACKUP) { printf("backup: " fmt, ## __VA_ARGS__); } } \
+ while (0)
+
+
+#define BITS_PER_LONG (sizeof(unsigned long) * 8)
+
+typedef struct BackupBlockJob {
+ BlockJob common;
+ unsigned long *bitmap;
+ int bitmap_size;
+ BackupDumpFunc *backup_dump_cb;
+ BlockDriverCompletionFunc *backup_complete_cb;
+ void *opaque;
+} BackupBlockJob;
+
+static int backup_get_bitmap(BlockDriverState *bs, int64_t cluster_num)
+{
+ assert(bs);
+ BackupBlockJob *job = (BackupBlockJob *)bs->job;
+ assert(job);
+ assert(job->bitmap);
+
+ unsigned long val, idx, bit;
+
+ idx = cluster_num / BITS_PER_LONG;
+
+ assert(job->bitmap_size > idx);
+
+ bit = cluster_num % BITS_PER_LONG;
+ val = job->bitmap[idx];
+
+ return !!(val & (1UL << bit));
+}
+
+static void backup_set_bitmap(BlockDriverState *bs, int64_t cluster_num,
+ int dirty)
+{
+ assert(bs);
+ BackupBlockJob *job = (BackupBlockJob *)bs->job;
+ assert(job);
+ assert(job->bitmap);
+
+ unsigned long val, idx, bit;
+
+ idx = cluster_num / BITS_PER_LONG;
+
+ assert(job->bitmap_size > idx);
+
+ bit = cluster_num % BITS_PER_LONG;
+ val = job->bitmap[idx];
+ if (dirty) {
+ if (!(val & (1UL << bit))) {
+ val |= 1UL << bit;
+ }
+ } else {
+ if (val & (1UL << bit)) {
+ val &= ~(1UL << bit);
+ }
+ }
+ job->bitmap[idx] = val;
+}
+
+static int backup_in_progress_count;
+
+static int coroutine_fn backup_do_cow(BlockDriverState *bs,
+ int64_t sector_num, int nb_sectors)
+{
+ assert(bs);
+ BackupBlockJob *job = (BackupBlockJob *)bs->job;
+ assert(job);
+
+ BlockDriver *drv = bs->drv;
+ struct iovec iov;
+ QEMUIOVector bounce_qiov;
+ void *bounce_buffer = NULL;
+ int ret = 0;
+
+ backup_in_progress_count++;
+
+ int64_t start, end;
+
+ start = sector_num / BACKUP_BLOCKS_PER_CLUSTER;
+ end = (sector_num + nb_sectors + BACKUP_BLOCKS_PER_CLUSTER - 1) /
+ BACKUP_BLOCKS_PER_CLUSTER;
+
+ DPRINTF("brdv_co_backup_cow enter %s C%zd %zd %d\n",
+ bdrv_get_device_name(bs), start, sector_num, nb_sectors);
+
+ for (; start < end; start++) {
+ if (backup_get_bitmap(bs, start)) {
+ DPRINTF("brdv_co_backup_cow skip C%zd\n", start);
+ continue; /* already copied */
+ }
+
+ /* immediately set bitmap (avoid coroutine race) */
+ backup_set_bitmap(bs, start, 1);
+
+ DPRINTF("brdv_co_backup_cow C%zd\n", start);
+
+ if (!bounce_buffer) {
+ iov.iov_len = BACKUP_CLUSTER_SIZE;
+ iov.iov_base = bounce_buffer = qemu_blockalign(bs, iov.iov_len);
+ qemu_iovec_init_external(&bounce_qiov, &iov, 1);
+ }
+
+ ret = drv->bdrv_co_readv(bs, start * BACKUP_BLOCKS_PER_CLUSTER,
+ BACKUP_BLOCKS_PER_CLUSTER,
+ &bounce_qiov);
+ if (ret < 0) {
+ DPRINTF("brdv_co_backup_cow bdrv_read C%zd failed\n", start);
+ goto out;
+ }
+
+ ret = job->backup_dump_cb(job->opaque, bs, start, bounce_buffer);
+ if (ret < 0) {
+ DPRINTF("brdv_co_backup_cow dump_cluster_cb C%zd failed\n", start);
+ goto out;
+ }
+
+ DPRINTF("brdv_co_backup_cow done C%zd\n", start);
+ }
+
+out:
+ if (bounce_buffer) {
+ qemu_vfree(bounce_buffer);
+ }
+
+ backup_in_progress_count--;
+
+ return ret;
+}
+
+static int coroutine_fn backup_before_read(BlockDriverState *bs,
+ int64_t sector_num,
+ int nb_sectors, QEMUIOVector *qiov)
+{
+ return backup_do_cow(bs, sector_num, nb_sectors);
+}
+
+static int coroutine_fn backup_before_write(BlockDriverState *bs,
+ int64_t sector_num,
+ int nb_sectors, QEMUIOVector *qiov)
+{
+ return backup_do_cow(bs, sector_num, nb_sectors);
+}
+
+
+static BlockJobType backup_job_type = {
+ .instance_size = sizeof(BackupBlockJob),
+ .before_read = backup_before_read,
+ .before_write = backup_before_write,
+ .job_type = "backup",
+};
+
+static void coroutine_fn backup_run(void *opaque)
+{
+ BackupBlockJob *job = opaque;
+ BlockDriverState *bs = job->common.bs;
+ assert(bs);
+
+ int64_t start, end;
+
+ start = 0;
+ end = (bs->total_sectors + BACKUP_BLOCKS_PER_CLUSTER - 1) /
+ BACKUP_BLOCKS_PER_CLUSTER;
+
+ DPRINTF("backup_run start %s %zd %zd\n", bdrv_get_device_name(bs),
+ start, end);
+
+ int ret = 0;
+
+ for (; start < end; start++) {
+ if (block_job_is_cancelled(&job->common)) {
+ ret = -1;
+ break;
+ }
+
+ if (backup_get_bitmap(bs, start)) {
+ continue; /* already copied */
+ }
+
+ /* we need to yield so that qemu_aio_flush() returns.
+ * (without, VM does not reboot)
+ * todo: can we avoid that?
+ */
+ co_sleep_ns(rt_clock, 0);
+ if (block_job_is_cancelled(&job->common)) {
+ ret = -1;
+ break;
+ }
+ DPRINTF("backup_run loop C%zd\n", start);
+
+ /**
+ * This triggers a cluster copy
+ * Note: avoid direct call to brdv_co_backup_cow, because
+ * this does not call tracked_request_begin()
+ */
+ ret = bdrv_co_backup(bs, start*BACKUP_BLOCKS_PER_CLUSTER, 1);
+ if (ret < 0) {
+ break;
+ }
+ /* Publish progress */
+ job->common.offset += BACKUP_CLUSTER_SIZE;
+ }
+
+ while (backup_in_progress_count > 0) {
+ DPRINTF("backup_run backup_in_progress_count != 0 (%d)",
+ backup_in_progress_count);
+ co_sleep_ns(rt_clock, 10000);
+ }
+
+ DPRINTF("backup_run complete %d\n", ret);
+ block_job_completed(&job->common, ret);
+}
+
+static void backup_job_cleanup_cb(void *opaque, int ret)
+{
+ BlockDriverState *bs = opaque;
+ assert(bs);
+ BackupBlockJob *job = (BackupBlockJob *)bs->job;
+ assert(job);
+
+ DPRINTF("backup_job_cleanup_cb start %d\n", ret);
+
+ job->backup_complete_cb(job->opaque, ret);
+
+ DPRINTF("backup_job_cleanup_cb end\n");
+
+ g_free(job->bitmap);
+}
+
+int
+backup_job_start(BlockDriverState *bs, BackupDumpFunc *backup_dump_cb,
+ BlockDriverCompletionFunc *backup_complete_cb,
+ void *opaque)
+{
+ assert(bs);
+ assert(backup_dump_cb);
+ assert(backup_complete_cb);
+
+ if (bs->job) {
+ DPRINTF("bdrv_backup_init failed - running job on %s\n",
+ bdrv_get_device_name(bs));
+ return -1;
+ }
+
+ int64_t bitmap_size;
+ const char *devname = bdrv_get_device_name(bs);
+
+ if (!devname || !devname[0]) {
+ return -1;
+ }
+
+ DPRINTF("bdrv_backup_init %s\n", bdrv_get_device_name(bs));
+
+ Error *errp;
+ BackupBlockJob *job = block_job_create(&backup_job_type, bs, 0,
+ backup_job_cleanup_cb, bs, &errp);
+
+ job->common.cluster_size = BACKUP_CLUSTER_SIZE;
+
+ bitmap_size = bs->total_sectors +
+ BACKUP_BLOCKS_PER_CLUSTER * BITS_PER_LONG - 1;
+ bitmap_size /= BACKUP_BLOCKS_PER_CLUSTER * BITS_PER_LONG;
+
+ job->backup_dump_cb = backup_dump_cb;
+ job->backup_complete_cb = backup_complete_cb;
+ job->opaque = opaque;
+ job->bitmap_size = bitmap_size;
+ job->bitmap = g_new0(unsigned long, bitmap_size);
+
+ job->common.len = bs->total_sectors*BDRV_SECTOR_SIZE;
+ job->common.co = qemu_coroutine_create(backup_run);
+ qemu_coroutine_enter(job->common.co, job);
+
+ return 0;
+}
diff --git a/backup.h b/backup.h
new file mode 100644
index 0000000..e1f0290
--- /dev/null
+++ b/backup.h
@@ -0,0 +1,30 @@
+/*
+ * QEMU backup related definitions
+ *
+ * Copyright (C) Proxmox Server Solutions
+ *
+ * Authors:
+ * Dietmar Maurer (dietmar@proxmox.com)
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_BACKUP_H
+#define QEMU_BACKUP_H
+
+#include <uuid/uuid.h>
+
+#define BACKUP_CLUSTER_BITS 16
+#define BACKUP_CLUSTER_SIZE (1<<BACKUP_CLUSTER_BITS)
+#define BACKUP_BLOCKS_PER_CLUSTER (BACKUP_CLUSTER_SIZE/BDRV_SECTOR_SIZE)
+
+typedef int BackupDumpFunc(void *opaque, BlockDriverState *bs,
+ int64_t cluster_num, unsigned char *buf);
+
+int backup_job_start(BlockDriverState *bs, BackupDumpFunc *backup_dump_cb,
+ BlockDriverCompletionFunc *backup_complete_cb,
+ void *opaque);
+
+#endif /* QEMU_BACKUP_H */
diff --git a/block.c b/block.c
index c05875f..2f7c2eb 100644
--- a/block.c
+++ b/block.c
@@ -54,6 +54,7 @@
typedef enum {
BDRV_REQ_COPY_ON_READ = 0x1,
BDRV_REQ_ZERO_WRITE = 0x2,
+ BDRV_REQ_BACKUP_ONLY = 0x4,
} BdrvRequestFlags;
static void bdrv_dev_change_media_cb(BlockDriverState *bs, bool load);
@@ -1542,7 +1543,7 @@ int bdrv_commit(BlockDriverState *bs)
if (!drv)
return -ENOMEDIUM;
-
+
if (!bs->backing_hd) {
return -ENOTSUP;
}
@@ -1679,6 +1680,22 @@ static void round_to_clusters(BlockDriverState *bs,
}
}
+/**
+ * Round a region to job cluster boundaries
+ */
+static void round_to_job_clusters(BlockDriverState *bs,
+ int64_t sector_num, int nb_sectors,
+ int job_cluster_size,
+ int64_t *cluster_sector_num,
+ int *cluster_nb_sectors)
+{
+ int64_t c = job_cluster_size/BDRV_SECTOR_SIZE;
+
+ *cluster_sector_num = QEMU_ALIGN_DOWN(sector_num, c);
+ *cluster_nb_sectors = QEMU_ALIGN_UP(sector_num - *cluster_sector_num +
+ nb_sectors, c);
+}
+
static bool tracked_request_overlaps(BdrvTrackedRequest *req,
int64_t sector_num, int nb_sectors) {
/* aaaa bbbb */
@@ -1693,7 +1710,9 @@ static bool tracked_request_overlaps(BdrvTrackedRequest *req,
}
static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
- int64_t sector_num, int nb_sectors)
+ int64_t sector_num,
+ int nb_sectors,
+ int job_cluster_size)
{
BdrvTrackedRequest *req;
int64_t cluster_sector_num;
@@ -1709,6 +1728,11 @@ static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
round_to_clusters(bs, sector_num, nb_sectors,
&cluster_sector_num, &cluster_nb_sectors);
+ if (job_cluster_size) {
+ round_to_job_clusters(bs, sector_num, nb_sectors, job_cluster_size,
+ &cluster_sector_num, &cluster_nb_sectors);
+ }
+
do {
retry = false;
QLIST_FOREACH(req, &bs->tracked_requests, list) {
@@ -2278,12 +2302,24 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
bs->copy_on_read_in_flight++;
}
- if (bs->copy_on_read_in_flight) {
- wait_for_overlapping_requests(bs, sector_num, nb_sectors);
+ int job_cluster_size = bs->job && bs->job->cluster_size ?
+ bs->job->cluster_size : 0;
+
+ if (bs->copy_on_read_in_flight || job_cluster_size) {
+ wait_for_overlapping_requests(bs, sector_num, nb_sectors,
+ job_cluster_size);
}
tracked_request_begin(&req, bs, sector_num, nb_sectors, false);
+ if (bs->job && bs->job->job_type->before_read) {
+ ret = bs->job->job_type->before_read(bs, sector_num, nb_sectors, qiov);
+ if (flags & BDRV_REQ_BACKUP_ONLY) {
+ /* Note: We do not return any data to the caller */
+ goto out;
+ }
+ }
+
if (flags & BDRV_REQ_COPY_ON_READ) {
int pnum;
@@ -2327,6 +2363,17 @@ int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
BDRV_REQ_COPY_ON_READ);
}
+int coroutine_fn bdrv_co_backup(BlockDriverState *bs,
+ int64_t sector_num, int nb_sectors)
+{
+ if (!bs->job) {
+ return -ENOTSUP;
+ }
+
+ return bdrv_co_do_readv(bs, sector_num, nb_sectors, NULL,
+ BDRV_REQ_BACKUP_ONLY);
+}
+
static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
int64_t sector_num, int nb_sectors)
{
@@ -2384,12 +2431,23 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
bdrv_io_limits_intercept(bs, true, nb_sectors);
}
- if (bs->copy_on_read_in_flight) {
- wait_for_overlapping_requests(bs, sector_num, nb_sectors);
+ int job_cluster_size = bs->job && bs->job->cluster_size ?
+ bs->job->cluster_size : 0;
+
+ if (bs->copy_on_read_in_flight || job_cluster_size) {
+ wait_for_overlapping_requests(bs, sector_num, nb_sectors,
+ job_cluster_size);
}
tracked_request_begin(&req, bs, sector_num, nb_sectors, true);
+ if (bs->job && bs->job->job_type->before_write) {
+ ret = bs->job->job_type->before_write(bs, sector_num, nb_sectors, qiov);
+ if (ret < 0) {
+ goto out;
+ }
+ }
+
if (flags & BDRV_REQ_ZERO_WRITE) {
ret = bdrv_co_do_write_zeroes(bs, sector_num, nb_sectors);
} else {
@@ -2408,6 +2466,7 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
bs->wr_highest_sector = sector_num + nb_sectors - 1;
}
+out:
tracked_request_end(&req);
return ret;
diff --git a/block.h b/block.h
index 722c620..94e5903 100644
--- a/block.h
+++ b/block.h
@@ -172,6 +172,8 @@ int coroutine_fn bdrv_co_readv(BlockDriverState *bs, int64_t sector_num,
int nb_sectors, QEMUIOVector *qiov);
int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
+int coroutine_fn bdrv_co_backup(BlockDriverState *bs,
+ int64_t sector_num, int nb_sectors);
int coroutine_fn bdrv_co_writev(BlockDriverState *bs, int64_t sector_num,
int nb_sectors, QEMUIOVector *qiov);
/*
diff --git a/blockjob.h b/blockjob.h
index 3792b73..6621173 100644
--- a/blockjob.h
+++ b/blockjob.h
@@ -50,6 +50,13 @@ typedef struct BlockJobType {
* manually.
*/
void (*complete)(BlockJob *job, Error **errp);
+
+ /** tracked requests */
+ int coroutine_fn (*before_read)(BlockDriverState *bs, int64_t sector_num,
+ int nb_sectors, QEMUIOVector *qiov);
+ int coroutine_fn (*before_write)(BlockDriverState *bs, int64_t sector_num,
+ int nb_sectors, QEMUIOVector *qiov);
+
} BlockJobType;
/**
@@ -103,6 +110,9 @@ struct BlockJob {
/** Speed that was set with @block_job_set_speed. */
int64_t speed;
+ /** tracked requests */
+ int cluster_size;
+
/** The completion function that will be called when the job completes. */
BlockDriverCompletionFunc *cb;
--
1.7.2.5
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [Qemu-devel] [PATCH v2 3/6] add backup related monitor commands
2012-11-30 9:22 [Qemu-devel] [PATCH v2 1/6] RFC: Efficient VM backup for qemu Dietmar Maurer
2012-11-30 9:22 ` [Qemu-devel] [PATCH v2 2/6] add basic backup support to block driver Dietmar Maurer
@ 2012-11-30 9:22 ` Dietmar Maurer
2012-12-12 17:47 ` Eric Blake
2012-11-30 9:22 ` [Qemu-devel] [PATCH v2 4/6] introduce new vma archive format Dietmar Maurer
` (2 subsequent siblings)
4 siblings, 1 reply; 9+ messages in thread
From: Dietmar Maurer @ 2012-11-30 9:22 UTC (permalink / raw)
To: qemu-devel; +Cc: Dietmar Maurer
We use a generic BackupDriver struct to encaplulated all archive format
related function.
Another option would be to simply dump <devid,cluster_num,cluster_data> to
the output fh (pipe), and an external binary saves the data. That way we
could move the whole archive format related code out of qemu.
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
---
backup.h | 14 +++
blockdev.c | 337 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
hmp-commands.hx | 31 +++++
hmp.c | 63 ++++++++++
hmp.h | 3 +
monitor.c | 7 +
qapi-schema.json | 81 +++++++++++++
qmp-commands.hx | 27 +++++
8 files changed, 563 insertions(+), 0 deletions(-)
diff --git a/backup.h b/backup.h
index e1f0290..881353f 100644
--- a/backup.h
+++ b/backup.h
@@ -27,4 +27,18 @@ int backup_job_start(BlockDriverState *bs, BackupDumpFunc *backup_dump_cb,
BlockDriverCompletionFunc *backup_complete_cb,
void *opaque);
+typedef struct BackupDriver {
+ const char *format;
+ void *(*open_cb)(const char *filename, uuid_t uuid, int64_t speed,
+ Error **errp);
+ int (*close_cb)(void *opaque, Error **errp);
+ int (*register_config_cb)(void *opaque, const char *name, gpointer data,
+ size_t data_len);
+ int (*register_stream_cb)(void *opaque, const char *devname, size_t size);
+ int (*dump_cb)(void *opaque, uint8_t dev_id, int64_t cluster_num,
+ unsigned char *buf, size_t *zero_bytes);
+ int (*complete_cb)(void *opaque, uint8_t dev_id, int ret);
+ void (*cancel_cb)(void *opaque);
+} BackupDriver;
+
#endif /* QEMU_BACKUP_H */
diff --git a/blockdev.c b/blockdev.c
index e73fd6e..a64490e 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -20,6 +20,7 @@
#include "qmp-commands.h"
#include "trace.h"
#include "arch_init.h"
+#include "backup.h"
static QTAILQ_HEAD(drivelist, DriveInfo) drives = QTAILQ_HEAD_INITIALIZER(drives);
@@ -1321,6 +1322,342 @@ void qmp_drive_mirror(const char *device, const char *target,
drive_get_ref(drive_get_by_blockdev(bs));
}
+/* Backup related function */
+
+static struct GenericBackupState {
+ Error *error;
+ uuid_t uuid;
+ char uuid_str[37];
+ time_t start_time;
+ time_t end_time;
+ char *backupfile;
+ const BackupDriver *driver;
+ void *writer;
+ GList *bcb_list;
+ size_t total;
+ size_t transferred;
+ size_t zero_bytes;
+} backup_state;
+
+typedef struct BackupCB {
+ BlockDriverState *bs;
+ uint8_t dev_id;
+ size_t size;
+ size_t transferred;
+ size_t zero_bytes;
+} BackupCB;
+
+static int backup_dump_cb(void *opaque, BlockDriverState *bs,
+ int64_t cluster_num, unsigned char *buf)
+{
+ BackupCB *bcb = opaque;
+
+ assert(backup_state.driver);
+ assert(backup_state.writer);
+ assert(backup_state.driver->dump_cb);
+
+ size_t zero_bytes = 0;
+ int bytes = backup_state.driver->dump_cb(backup_state.writer,
+ bcb->dev_id, cluster_num,
+ buf, &zero_bytes);
+
+ if (bytes > 0) {
+ bcb->transferred += bytes;
+ backup_state.transferred += bytes;
+ if (zero_bytes) {
+ bcb->zero_bytes += bytes;
+ backup_state.zero_bytes += zero_bytes;
+ }
+ }
+
+ return bytes;
+}
+
+static void backup_complete_cb(void *opaque, int ret)
+{
+ BackupCB *bcb = opaque;
+
+ assert(backup_state.driver);
+ assert(backup_state.writer);
+ assert(backup_state.driver->complete_cb);
+ assert(backup_state.driver->close_cb);
+
+ drive_put_ref_bh_schedule(drive_get_by_blockdev(bcb->bs));
+
+ backup_state.bcb_list = g_list_remove(backup_state.bcb_list, bcb);
+
+ backup_state.driver->complete_cb(backup_state.writer, bcb->dev_id, ret);
+
+ if (g_list_length(backup_state.bcb_list) == 0) {
+
+ backup_state.end_time = time(NULL);
+
+ backup_state.driver->close_cb(backup_state.writer, &backup_state.error);
+
+ backup_state.writer = NULL;
+ }
+
+ g_free(bcb);
+}
+
+void qmp_backup_cancel(Error **errp)
+{
+ if (backup_state.writer) {
+ assert(backup_state.driver);
+ assert(backup_state.driver->cancel_cb);
+ backup_state.driver->cancel_cb(backup_state.writer);
+ }
+}
+
+static void backup_start_jobs(void)
+{
+ /* start all jobs (one for each device) */
+ GList *l = backup_state.bcb_list;
+ while (l) {
+ BackupCB *bcb = l->data;
+ l = g_list_next(l);
+
+ if (backup_job_start(bcb->bs, backup_dump_cb, backup_complete_cb,
+ bcb) == 0) {
+ /* Grab a reference so hotplug does not delete the
+ * BlockDriverState from underneath us.
+ */
+ drive_get_ref(drive_get_by_blockdev(bcb->bs));
+ } else {
+ backup_state.driver->cancel_cb(backup_state.writer);
+ }
+ }
+}
+
+char *qmp_backup(const char *backupfile, bool has_format, const char *format,
+ bool has_config_filename, const char *config_filename,
+ bool has_devlist, const char *devlist,
+ bool has_speed, int64_t speed, Error **errp)
+{
+ BlockDriverState *bs;
+ Error *local_err = NULL;
+ uuid_t uuid;
+ void *writer = NULL;
+ gchar **devs = NULL;
+ GList *bcblist = NULL;
+
+ /* Todo: try to auto-detect format based on file name */
+ format = has_format ? format : "vma";
+
+ /* fixme: find driver for specifued format */
+ const BackupDriver *driver = NULL;
+
+ if (!driver) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+ "no backup driver for format '%s'", format);
+ return NULL;
+ }
+
+ if (has_devlist) {
+ devs = g_strsplit(devlist, ",;:", -1);
+
+ gchar **d = devs;
+ while (d && *d) {
+ bs = bdrv_find(*d);
+ if (bs) {
+ if (bdrv_is_read_only(bs)) {
+ error_set(errp, QERR_DEVICE_IS_READ_ONLY, *d);
+ goto err;
+ }
+ if (!bdrv_is_inserted(bs)) {
+ error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, *d);
+ goto err;
+ }
+ BackupCB *bcb = g_new0(BackupCB, 1);
+ bcb->bs = bs;
+ bcblist = g_list_append(bcblist, bcb);
+ } else {
+ error_set(errp, QERR_DEVICE_NOT_FOUND, *d);
+ goto err;
+ }
+ d++;
+ }
+
+ } else {
+
+ bs = NULL;
+ while ((bs = bdrv_next(bs))) {
+
+ if (!bdrv_is_inserted(bs) || bdrv_is_read_only(bs)) {
+ continue;
+ }
+
+ BackupCB *bcb = g_new0(BackupCB, 1);
+ bcb->bs = bs;
+ bcblist = g_list_append(bcblist, bcb);
+ }
+ }
+
+ if (!bcblist) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, "empty device list");
+ goto err;
+ }
+
+ GList *l = bcblist;
+ while (l) {
+ BackupCB *bcb = l->data;
+ l = g_list_next(l);
+ if (bcb->bs->job) {
+ error_set(errp, QERR_DEVICE_IN_USE, bdrv_get_device_name(bcb->bs));
+ goto err;
+ }
+ }
+
+ uuid_generate(uuid);
+
+ writer = driver->open_cb(backupfile, uuid, speed, &local_err);
+ if (!writer) {
+ if (error_is_set(&local_err)) {
+ error_propagate(errp, local_err);
+ }
+ goto err;
+ }
+
+ size_t total = 0;
+
+ /* register all devices for vma writer */
+ l = bcblist;
+ while (l) {
+ BackupCB *bcb = l->data;
+ l = g_list_next(l);
+
+ int64_t size = bdrv_getlength(bcb->bs);
+ const char *devname = bdrv_get_device_name(bcb->bs);
+ bcb->dev_id = driver->register_stream_cb(writer, devname, size);
+ if (bcb->dev_id <= 0) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+ "register_stream failed");
+ goto err;
+ }
+ bcb->size = size;
+ total += size;
+ }
+
+ /* add configuration file to archive */
+ if (has_config_filename) {
+ char *cdata = NULL;
+ gsize clen = 0;
+ GError *err = NULL;
+ if (!g_file_get_contents(config_filename, &cdata, &clen, &err)) {
+ error_setg(errp, "unable to read file '%s'", config_filename);
+ goto err;
+ }
+
+ const char *basename = g_path_get_basename(config_filename);
+ if (driver->register_config_cb(writer, basename, cdata, clen) <= 0) {
+ error_setg(errp, "register_config failed");
+ g_free(cdata);
+ goto err;
+ }
+ g_free(cdata);
+ }
+
+ /* initialize global backup_state now */
+
+ if (backup_state.error) {
+ error_free(backup_state.error);
+ backup_state.error = NULL;
+ }
+
+ backup_state.driver = driver;
+
+ backup_state.start_time = time(NULL);
+ backup_state.end_time = 0;
+
+ if (backup_state.backupfile) {
+ g_free(backup_state.backupfile);
+ }
+ backup_state.backupfile = g_strdup(backupfile);
+
+ backup_state.writer = writer;
+
+ uuid_copy(backup_state.uuid, uuid);
+ uuid_unparse_lower(uuid, backup_state.uuid_str);
+
+ backup_state.bcb_list = bcblist;
+
+ backup_state.total = total;
+ backup_state.transferred = 0;
+ backup_state.zero_bytes = 0;
+
+ backup_start_jobs();
+
+ return g_strdup(backup_state.uuid_str);
+
+err:
+
+ l = bcblist;
+ while (l) {
+ g_free(l->data);
+ l = g_list_next(l);
+ }
+ g_list_free(bcblist);
+
+ if (devs) {
+ g_strfreev(devs);
+ }
+
+ if (writer) {
+ unlink(backupfile);
+ if (driver) {
+ Error *err = NULL;
+ driver->close_cb(writer, &err);
+ }
+ }
+
+ return NULL;
+}
+
+BackupStatus *qmp_query_backup(Error **errp)
+{
+ BackupStatus *info = g_malloc0(sizeof(*info));
+
+ if (!backup_state.start_time) {
+ /* not started, return {} */
+ return info;
+ }
+
+ info->has_status = true;
+ info->has_start_time = true;
+ info->start_time = backup_state.start_time;
+
+ if (backup_state.backupfile) {
+ info->has_backupfile = true;
+ info->backupfile = g_strdup(backup_state.backupfile);
+ }
+
+ info->has_uuid = true;
+ info->uuid = g_strdup(backup_state.uuid_str);
+
+ if (backup_state.end_time) {
+ if (backup_state.error) {
+ info->status = g_strdup("error");
+ info->has_errmsg = true;
+ info->errmsg = g_strdup(error_get_pretty(backup_state.error));
+ } else {
+ info->status = g_strdup("done");
+ }
+ info->has_end_time = true;
+ info->end_time = backup_state.end_time;
+ } else {
+ info->status = g_strdup("active");
+ }
+
+ info->has_total = true;
+ info->total = backup_state.total;
+ info->has_zero_bytes = true;
+ info->zero_bytes = backup_state.zero_bytes;
+ info->has_transferred = true;
+ info->transferred = backup_state.transferred;
+
+ return info;
+}
+
static BlockJob *find_block_job(const char *device)
{
BlockDriverState *bs;
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 010b8c9..57be357 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -83,6 +83,35 @@ STEXI
Copy data from a backing file into a block device.
ETEXI
+ {
+ .name = "backup",
+ .args_type = "backupfile:s,speed:o?,devlist:s?",
+ .params = "backupfile [speed [devlist]]",
+ .help = "create a VM Backup.",
+ .mhandler.cmd = hmp_backup,
+ },
+
+STEXI
+@item backup
+@findex backup
+Create a VM backup.
+ETEXI
+
+ {
+ .name = "backup_cancel",
+ .args_type = "",
+ .params = "",
+ .help = "cancel the current VM backup",
+ .mhandler.cmd = hmp_backup_cancel,
+ },
+
+STEXI
+@item backup_cancel
+@findex backup_cancel
+Cancel the current VM backup.
+
+ETEXI
+
{
.name = "block_job_set_speed",
.args_type = "device:B,speed:o",
@@ -1558,6 +1587,8 @@ show CPU statistics
show user network stack connection states
@item info migrate
show migration status
+@item info backup
+show backup status
@item info migrate_capabilities
show current migration capabilities
@item info migrate_cache_size
diff --git a/hmp.c b/hmp.c
index 180ba2b..600792f 100644
--- a/hmp.c
+++ b/hmp.c
@@ -130,6 +130,38 @@ void hmp_info_mice(Monitor *mon)
qapi_free_MouseInfoList(mice_list);
}
+void hmp_info_backup(Monitor *mon)
+{
+ BackupStatus *info;
+
+ info = qmp_query_backup(NULL);
+ if (info->has_status) {
+ if (info->has_errmsg) {
+ monitor_printf(mon, "Backup status: %s - %s\n",
+ info->status, info->errmsg);
+ } else {
+ monitor_printf(mon, "Backup status: %s\n", info->status);
+ }
+ }
+ if (info->has_backupfile) {
+ int per = (info->has_total && info->total &&
+ info->has_transferred && info->transferred) ?
+ (info->transferred * 100)/info->total : 0;
+ int zero_per = (info->has_total && info->total &&
+ info->has_zero_bytes && info->zero_bytes) ?
+ (info->zero_bytes * 100)/info->total : 0;
+ monitor_printf(mon, "Backup file: %s\n", info->backupfile);
+ monitor_printf(mon, "Backup uuid: %s\n", info->uuid);
+ monitor_printf(mon, "Total size: %zd\n", info->total);
+ monitor_printf(mon, "Transferred bytes: %zd (%d%%)\n",
+ info->transferred, per);
+ monitor_printf(mon, "Zero bytes: %zd (%d%%)\n",
+ info->zero_bytes, zero_per);
+ }
+
+ qapi_free_BackupStatus(info);
+}
+
void hmp_info_migrate(Monitor *mon)
{
MigrationInfo *info;
@@ -977,6 +1009,37 @@ void hmp_block_stream(Monitor *mon, const QDict *qdict)
hmp_handle_error(mon, &error);
}
+void hmp_backup_cancel(Monitor *mon, const QDict *qdict)
+{
+ Error *errp = NULL;
+
+ qmp_backup_cancel(&errp);
+
+ if (error_is_set(&errp)) {
+ monitor_printf(mon, "%s\n", error_get_pretty(errp));
+ error_free(errp);
+ return;
+ }
+}
+
+void hmp_backup(Monitor *mon, const QDict *qdict)
+{
+ const char *backupfile = qdict_get_str(qdict, "backupfile");
+ const char *devlist = qdict_get_try_str(qdict, "devlist");
+ int64_t speed = qdict_get_try_int(qdict, "speed", 0);
+
+ Error *errp = NULL;
+
+ qmp_backup(backupfile, false, NULL, false, NULL, !!devlist, devlist,
+ qdict_haskey(qdict, "speed"), speed, &errp);
+
+ if (error_is_set(&errp)) {
+ monitor_printf(mon, "%s\n", error_get_pretty(errp));
+ error_free(errp);
+ return;
+ }
+}
+
void hmp_block_job_set_speed(Monitor *mon, const QDict *qdict)
{
Error *error = NULL;
diff --git a/hmp.h b/hmp.h
index 0ab03be..20c9a62 100644
--- a/hmp.h
+++ b/hmp.h
@@ -28,6 +28,7 @@ void hmp_info_mice(Monitor *mon);
void hmp_info_migrate(Monitor *mon);
void hmp_info_migrate_capabilities(Monitor *mon);
void hmp_info_migrate_cache_size(Monitor *mon);
+void hmp_info_backup(Monitor *mon);
void hmp_info_cpus(Monitor *mon);
void hmp_info_block(Monitor *mon);
void hmp_info_blockstats(Monitor *mon);
@@ -63,6 +64,8 @@ void hmp_eject(Monitor *mon, const QDict *qdict);
void hmp_change(Monitor *mon, const QDict *qdict);
void hmp_block_set_io_throttle(Monitor *mon, const QDict *qdict);
void hmp_block_stream(Monitor *mon, const QDict *qdict);
+void hmp_backup(Monitor *mon, const QDict *qdict);
+void hmp_backup_cancel(Monitor *mon, const QDict *qdict);
void hmp_block_job_set_speed(Monitor *mon, const QDict *qdict);
void hmp_block_job_cancel(Monitor *mon, const QDict *qdict);
void hmp_block_job_pause(Monitor *mon, const QDict *qdict);
diff --git a/monitor.c b/monitor.c
index c0e32d6..85cf47e 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2680,6 +2680,13 @@ static mon_cmd_t info_cmds[] = {
},
#endif
{
+ .name = "backup",
+ .args_type = "",
+ .params = "",
+ .help = "show backup status",
+ .mhandler.info = hmp_info_backup,
+ },
+ {
.name = "migrate",
.args_type = "",
.params = "",
diff --git a/qapi-schema.json b/qapi-schema.json
index 5dfa052..71e0ed9 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -358,6 +358,39 @@
{ 'type': 'EventInfo', 'data': {'name': 'str'} }
##
+# @BackupStatus:
+#
+# Detailed backup status.
+#
+# @status: #optional string describing the current backup status.
+# Tthis can be 'active', 'done', 'error'. If this field is not
+# returned, no backup process has been initiated
+#
+# @errmsg: #optional error message (only returned if status is 'error')
+#
+# @total: #optional total amount of bytes involved in the backup process
+#
+# @transferred: #optional amount of bytes already backed up.
+#
+# @zero-bytes: #optional amount of 'zero' bytes detected.
+#
+# @start-time: #optional time (epoch) when backup job started.
+#
+# @end-time: #optional time (epoch) when backup job finished.
+#
+# @backupfile: #optional backup file name
+#
+# @uuid: #optional uuid for this backup job
+#
+# Since: 1.4.0
+##
+{ 'type': 'BackupStatus',
+ 'data': {'*status': 'str', '*errmsg': 'str', '*total': 'int',
+ '*transferred': 'int', '*zero-bytes': 'int',
+ '*start-time': 'int', '*end-time': 'int',
+ '*backupfile': 'str', '*uuid': 'str' } }
+
+##
# @query-events:
#
# Return a list of supported QMP events by this server
@@ -1764,6 +1797,54 @@
'data': { 'path': 'str' },
'returns': [ 'ObjectPropertyInfo' ] }
+
+##
+# @backup:
+#
+# Starts a VM backup.
+#
+# @backupfile: the backup file name
+#
+# @format: format of the backup file
+#
+# @config-filename: #optional name of a configuration file to include into
+# the backup archive.
+#
+# @speed: #optional the maximum speed, in bytes per second
+#
+# Returns: the uuid of the backup job
+#
+# Since: 1.4.0
+##
+{ 'command': 'backup', 'data': { 'backupfile': 'str', '*format': 'str',
+ '*config-filename': 'str',
+ '*devlist': 'str', '*speed': 'int' },
+ 'returns': 'str' }
+
+##
+# @query-backup
+#
+# Returns information about current/last backup task.
+#
+# Returns: @BackupStatus
+#
+# Since: 1.4.0
+##
+{ 'command': 'query-backup', 'returns': 'BackupStatus' }
+
+##
+# @backup-cancel
+#
+# Cancel the current executing backup process.
+#
+# Returns: nothing on success
+#
+# Notes: This command succeeds even if there is no backup process running.
+#
+# Since: 1.4.0
+##
+{ 'command': 'backup-cancel' }
+
##
# @qom-get:
#
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 5c692d0..c46fdc4 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -822,6 +822,18 @@ EQMP
},
{
+ .name = "backup",
+ .args_type = "backupfile:s,format:s?,config-filename:F?,speed:o?,devlist:s?",
+ .mhandler.cmd_new = qmp_marshal_input_backup,
+ },
+
+ {
+ .name = "backup_cancel",
+ .args_type = "",
+ .mhandler.cmd_new = qmp_marshal_input_backup_cancel,
+ },
+
+ {
.name = "block-job-set-speed",
.args_type = "device:B,speed:o",
.mhandler.cmd_new = qmp_marshal_input_block_job_set_speed,
@@ -2491,6 +2503,21 @@ EQMP
},
SQMP
+
+query-backup
+-------------
+
+Backup status.
+
+EQMP
+
+ {
+ .name = "query-backup",
+ .args_type = "",
+ .mhandler.cmd_new = qmp_marshal_input_query_backup,
+ },
+
+SQMP
migrate-set-capabilities
-------
--
1.7.2.5
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [Qemu-devel] [PATCH v2 4/6] introduce new vma archive format
2012-11-30 9:22 [Qemu-devel] [PATCH v2 1/6] RFC: Efficient VM backup for qemu Dietmar Maurer
2012-11-30 9:22 ` [Qemu-devel] [PATCH v2 2/6] add basic backup support to block driver Dietmar Maurer
2012-11-30 9:22 ` [Qemu-devel] [PATCH v2 3/6] add backup related monitor commands Dietmar Maurer
@ 2012-11-30 9:22 ` Dietmar Maurer
2012-11-30 9:22 ` [Qemu-devel] [PATCH v2 5/6] add regression tests for backup Dietmar Maurer
2012-11-30 9:22 ` [Qemu-devel] [PATCH v2 6/6] add vm state to backups Dietmar Maurer
4 siblings, 0 replies; 9+ messages in thread
From: Dietmar Maurer @ 2012-11-30 9:22 UTC (permalink / raw)
To: qemu-devel; +Cc: Dietmar Maurer
This is a very simple archive format, see docs/specs/vma_spec.txt
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
---
Makefile | 3 +-
Makefile.objs | 2 +-
blockdev.c | 6 +-
docs/specs/vma_spec.txt | 24 ++
vma-reader.c | 772 ++++++++++++++++++++++++++++++++++++++++
vma-writer.c | 908 +++++++++++++++++++++++++++++++++++++++++++++++
vma.c | 550 ++++++++++++++++++++++++++++
vma.h | 145 ++++++++
8 files changed, 2406 insertions(+), 4 deletions(-)
create mode 100644 docs/specs/vma_spec.txt
create mode 100644 vma-reader.c
create mode 100644 vma-writer.c
create mode 100644 vma.c
create mode 100644 vma.h
diff --git a/Makefile b/Makefile
index 9ecbcbb..30a9268 100644
--- a/Makefile
+++ b/Makefile
@@ -100,7 +100,7 @@ defconfig:
-include config-all-devices.mak
-all: $(DOCS) $(TOOLS) $(HELPERS-y) recurse-all
+all: $(DOCS) $(TOOLS) vma$(EXESUF) $(HELPERS-y) recurse-all
config-host.h: config-host.h-timestamp
config-host.h-timestamp: config-host.mak
@@ -194,6 +194,7 @@ tools-obj-$(CONFIG_POSIX) += compatfd.o
qemu-img$(EXESUF): qemu-img.o $(tools-obj-y) $(block-obj-y) libqemustub.a
qemu-nbd$(EXESUF): qemu-nbd.o $(tools-obj-y) $(block-obj-y) libqemustub.a
qemu-io$(EXESUF): qemu-io.o cmd.o $(tools-obj-y) $(block-obj-y) libqemustub.a
+vma$(EXESUF): vma.o vma-writer.o vma-reader.o $(tools-obj-y) $(block-obj-y) libqemustub.a
qemu-bridge-helper$(EXESUF): qemu-bridge-helper.o
diff --git a/Makefile.objs b/Makefile.objs
index cb46be5..b5732e2 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -48,7 +48,7 @@ coroutine-obj-$(CONFIG_WIN32) += coroutine-win32.o
block-obj-y = iov.o cache-utils.o qemu-option.o module.o async.o
block-obj-y += nbd.o block.o blockjob.o aes.o qemu-config.o
block-obj-y += thread-pool.o qemu-progress.o qemu-sockets.o uri.o notify.o
-block-obj-y += backup.o
+block-obj-y += vma-writer.o backup.o
block-obj-y += $(coroutine-obj-y) $(qobject-obj-y) $(version-obj-y)
block-obj-$(CONFIG_POSIX) += event_notifier-posix.o aio-posix.o
block-obj-$(CONFIG_WIN32) += event_notifier-win32.o aio-win32.o
diff --git a/blockdev.c b/blockdev.c
index a64490e..07db561 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -21,6 +21,7 @@
#include "trace.h"
#include "arch_init.h"
#include "backup.h"
+#include "vma.h"
static QTAILQ_HEAD(drivelist, DriveInfo) drives = QTAILQ_HEAD_INITIALIZER(drives);
@@ -1444,10 +1445,11 @@ char *qmp_backup(const char *backupfile, bool has_format, const char *format,
/* Todo: try to auto-detect format based on file name */
format = has_format ? format : "vma";
- /* fixme: find driver for specifued format */
const BackupDriver *driver = NULL;
- if (!driver) {
+ if (strcmp(format, "vma") == 0) {
+ driver = &backup_vma_driver;
+ } else {
error_set(errp, ERROR_CLASS_GENERIC_ERROR,
"no backup driver for format '%s'", format);
return NULL;
diff --git a/docs/specs/vma_spec.txt b/docs/specs/vma_spec.txt
new file mode 100644
index 0000000..052c629
--- /dev/null
+++ b/docs/specs/vma_spec.txt
@@ -0,0 +1,24 @@
+=Virtual Machine Archive format (VMA)=
+
+This format contains a header which includes the VM configuration as
+binary blobs, and a list of devices (dev_id, name).
+
+The actual VM image data is stored inside extents. An extent contains
+up to 64 clusters, and start with a 512 byte header containing
+additional information for those clusters.
+
+We use a cluster size of 65536, and use 8 bytes for each
+cluster in the header to store the following information:
+
+* 1 byte dev_id (to identity the drive)
+* 2 bytes zero indicator (mark zero regions (16x4096))
+* 4 bytes cluster number
+* 1 byte not used (reserved)
+
+We only store non-zero blocks (such block is 4096 bytes).
+
+Each archive is marked with a uuid. The archive header and all
+extent headers includes that uuid and a MD5 checksum (over header
+data).
+
+
diff --git a/vma-reader.c b/vma-reader.c
new file mode 100644
index 0000000..154c96b
--- /dev/null
+++ b/vma-reader.c
@@ -0,0 +1,772 @@
+/*
+ * VMA: Virtual Machine Archive
+ *
+ * Copyright (C) 2012 Proxmox Server Solutions
+ *
+ * Authors:
+ * Dietmar Maurer (dietmar@proxmox.com)
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include <stdio.h>
+#include <errno.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <glib.h>
+#include <uuid/uuid.h>
+
+#include "qemu-common.h"
+#include "qemu_socket.h"
+#include "qemu-coroutine.h"
+#include "qemu-aio.h"
+#include "qemu/ratelimit.h"
+#include "vma.h"
+#include "block.h"
+
+#define BITS_PER_LONG (sizeof(unsigned long) * 8)
+
+static unsigned char zero_vma_block[VMA_BLOCK_SIZE];
+
+typedef struct VmaRestoreState {
+ BlockDriverState *bs;
+ bool write_zeroes;
+ unsigned long *bitmap;
+ int bitmap_size;
+} VmaRestoreState;
+
+struct VmaReader {
+ int fd;
+ GChecksum *md5csum;
+ GHashTable *blob_hash;
+ unsigned char *head_data;
+ VmaDeviceInfo devinfo[256];
+ VmaRestoreState rstate[256];
+ GList *cdata_list;
+ guint8 vmstate_stream;
+ uint32_t vmstate_clusters;
+};
+
+static guint
+g_int32_hash(gconstpointer v)
+{
+ return *(const uint32_t *)v;
+}
+
+static gboolean
+g_int32_equal(gconstpointer v1, gconstpointer v2)
+{
+ return *((const uint32_t *)v1) == *((const uint32_t *)v2);
+}
+
+static int vma_reader_get_bitmap(VmaRestoreState *rstate, int64_t cluster_num)
+{
+ assert(rstate);
+ assert(rstate->bitmap);
+
+ unsigned long val, idx, bit;
+
+ idx = cluster_num / BITS_PER_LONG;
+
+ assert(rstate->bitmap_size > idx);
+
+ bit = cluster_num % BITS_PER_LONG;
+ val = rstate->bitmap[idx];
+
+ return !!(val & (1UL << bit));
+}
+
+static void vma_reader_set_bitmap(VmaRestoreState *rstate, int64_t cluster_num,
+ int dirty)
+{
+ assert(rstate);
+ assert(rstate->bitmap);
+
+ unsigned long val, idx, bit;
+
+ idx = cluster_num / BITS_PER_LONG;
+
+ assert(rstate->bitmap_size > idx);
+
+ bit = cluster_num % BITS_PER_LONG;
+ val = rstate->bitmap[idx];
+ if (dirty) {
+ if (!(val & (1UL << bit))) {
+ val |= 1UL << bit;
+ }
+ } else {
+ if (val & (1UL << bit)) {
+ val &= ~(1UL << bit);
+ }
+ }
+ rstate->bitmap[idx] = val;
+}
+
+typedef struct VmaBlob {
+ uint32_t start;
+ uint32_t len;
+ void *data;
+} VmaBlob;
+
+static const VmaBlob *get_header_blob(VmaReader *vmar, uint32_t pos)
+{
+ assert(vmar);
+ assert(vmar->blob_hash);
+
+ return g_hash_table_lookup(vmar->blob_hash, &pos);
+}
+
+static const char *get_header_str(VmaReader *vmar, uint32_t pos)
+{
+ const VmaBlob *blob = get_header_blob(vmar, pos);
+ if (!blob) {
+ return NULL;
+ }
+ const char *res = (char *)blob->data;
+ if (res[blob->len-1] != '\0') {
+ return NULL;
+ }
+ return res;
+}
+
+static ssize_t
+safe_read(int fd, unsigned char *buf, size_t count)
+{
+ ssize_t n;
+
+ do {
+ n = read(fd, buf, count);
+ } while (n < 0 && errno == EINTR);
+
+ return n;
+}
+
+static ssize_t
+full_read(int fd, unsigned char *buf, size_t len)
+{
+ ssize_t n;
+ size_t total;
+
+ total = 0;
+
+ while (len > 0) {
+ n = safe_read(fd, buf, len);
+
+ if (n == 0) {
+ return total;
+ }
+
+ if (n <= 0) {
+ break;
+ }
+
+ buf += n;
+ total += n;
+ len -= n;
+ }
+
+ if (len) {
+ return -1;
+ }
+
+ return total;
+}
+
+void vma_reader_destroy(VmaReader *vmar)
+{
+ assert(vmar);
+
+ if (vmar->fd >= 0) {
+ close(vmar->fd);
+ }
+
+ if (vmar->cdata_list) {
+ g_list_free(vmar->cdata_list);
+ }
+
+ int i;
+ for (i = 1; i < 256; i++) {
+ if (vmar->rstate[i].bitmap) {
+ g_free(vmar->rstate[i].bitmap);
+ }
+ }
+
+ if (vmar->md5csum) {
+ g_checksum_free(vmar->md5csum);
+ }
+
+ if (vmar->blob_hash) {
+ g_hash_table_destroy(vmar->blob_hash);
+ }
+
+ if (vmar->head_data) {
+ g_free(vmar->head_data);
+ }
+
+ g_free(vmar);
+
+};
+
+static int vma_reader_read_head(VmaReader *vmar, Error **errp)
+{
+ assert(vmar);
+ assert(errp);
+ assert(*errp == NULL);
+
+ unsigned char md5sum[16];
+ int i;
+ int ret = 0;
+
+ vmar->head_data = g_malloc(sizeof(VmaHeader));
+
+ if (full_read(vmar->fd, vmar->head_data, sizeof(VmaHeader)) !=
+ sizeof(VmaHeader)) {
+ error_setg(errp, "can't read vma header - %s",
+ errno ? strerror(errno) : "got EOF");
+ return -1;
+ }
+
+ VmaHeader *h = (VmaHeader *)vmar->head_data;
+
+ if (h->magic != VMA_MAGIC) {
+ error_setg(errp, "not a vma file - wrong magic number");
+ return -1;
+ }
+
+ uint32_t header_size = GUINT32_FROM_BE(h->header_size);
+ int need = header_size - sizeof(VmaHeader);
+ if (need <= 0) {
+ error_setg(errp, "wrong vma header size %d", header_size);
+ return -1;
+ }
+
+ vmar->head_data = g_realloc(vmar->head_data, header_size);
+ h = (VmaHeader *)vmar->head_data;
+
+ if (full_read(vmar->fd, vmar->head_data + sizeof(VmaHeader), need) !=
+ need) {
+ error_setg(errp, "can't read vma header data - %s",
+ errno ? strerror(errno) : "got EOF");
+ return -1;
+ }
+
+ memcpy(md5sum, h->md5sum, 16);
+ memset(h->md5sum, 0, 16);
+
+ g_checksum_reset(vmar->md5csum);
+ g_checksum_update(vmar->md5csum, vmar->head_data, header_size);
+ gsize csize = 16;
+ g_checksum_get_digest(vmar->md5csum, (guint8 *)(h->md5sum), &csize);
+
+ if (memcmp(md5sum, h->md5sum, 16) != 0) {
+ error_setg(errp, "wrong vma header chechsum");
+ return -1;
+ }
+
+ /* we can modify header data after checksum verify */
+ h->header_size = header_size;
+
+ h->version = GUINT32_FROM_BE(h->version);
+ if (h->version != 1) {
+ error_setg(errp, "wrong vma version %d", h->version);
+ return -1;
+ }
+
+ h->ctime = GUINT64_FROM_BE(h->ctime);
+ h->blob_buffer_offset = GUINT32_FROM_BE(h->blob_buffer_offset);
+ h->blob_buffer_size = GUINT32_FROM_BE(h->blob_buffer_size);
+
+ uint32_t bstart = h->blob_buffer_offset + 1;
+ uint32_t bend = h->blob_buffer_offset + h->blob_buffer_size;
+
+ if (bstart <= sizeof(VmaHeader)) {
+ error_setg(errp, "wrong vma blob buffer offset %d",
+ h->blob_buffer_offset);
+ return -1;
+ }
+
+ if (bend > header_size) {
+ error_setg(errp, "wrong vma blob buffer size %d/%d",
+ h->blob_buffer_offset, h->blob_buffer_size);
+ return -1;
+ }
+
+ while ((bstart + 2) <= bend) {
+ uint32_t size = vmar->head_data[bstart] +
+ (vmar->head_data[bstart+1] << 8);
+ if ((bstart + size + 2) <= bend) {
+ VmaBlob *blob = g_new0(VmaBlob, 1);
+ blob->start = bstart - h->blob_buffer_offset;
+ blob->len = size;
+ blob->data = vmar->head_data + bstart + 2;
+ g_hash_table_insert(vmar->blob_hash, &blob->start, blob);
+ }
+ bstart += size + 2;
+ }
+
+
+ int count = 0;
+ for (i = 1; i < 256; i++) {
+ VmaDeviceInfoHeader *dih = &h->dev_info[i];
+ uint32_t devname_ptr = GUINT32_FROM_BE(dih->devname_ptr);
+ uint64_t size = GUINT64_FROM_BE(dih->size);
+ const char *devname = get_header_str(vmar, devname_ptr);
+
+ if (size && devname) {
+ count++;
+ vmar->devinfo[i].size = size;
+ vmar->devinfo[i].devname = devname;
+
+ if (strcmp(devname, "vmstate") == 0) {
+ vmar->vmstate_stream = i;
+ }
+ }
+ }
+
+ if (!count) {
+ error_setg(errp, "vma does not contain data");
+ return -1;
+ }
+
+ for (i = 0; i < VMA_MAX_CONFIGS; i++) {
+ uint32_t name_ptr = GUINT32_FROM_BE(h->config_names[i]);
+ uint32_t data_ptr = GUINT32_FROM_BE(h->config_data[i]);
+
+ if (!(name_ptr && data_ptr)) {
+ continue;
+ }
+ const char *name = get_header_str(vmar, name_ptr);
+ const VmaBlob *blob = get_header_blob(vmar, data_ptr);
+
+ if (!(name && blob)) {
+ error_setg(errp, "vma contains invalid data pointers");
+ return -1;
+ }
+
+ VmaConfigData *cdata = g_new0(VmaConfigData, 1);
+ cdata->name = name;
+ cdata->data = blob->data;
+ cdata->len = blob->len;
+
+ vmar->cdata_list = g_list_append(vmar->cdata_list, cdata);
+ }
+
+ return ret;
+};
+
+VmaReader *vma_reader_create(const char *filename, Error **errp)
+{
+ assert(filename);
+ assert(errp);
+
+ VmaReader *vmar = g_new0(VmaReader, 1);
+
+ vmar->fd = open(filename, O_RDONLY);
+
+ if (vmar->fd < 0) {
+ error_setg(errp, "can't open file %s - %s\n", filename,
+ strerror(errno));
+ goto err;
+ }
+
+ vmar->md5csum = g_checksum_new(G_CHECKSUM_MD5);
+ if (!vmar->md5csum) {
+ error_setg(errp, "can't allocate cmsum\n");
+ goto err;
+ }
+
+ vmar->blob_hash = g_hash_table_new_full(g_int32_hash, g_int32_equal,
+ NULL, g_free);
+
+ if (vma_reader_read_head(vmar, errp) < 0) {
+ goto err;
+ }
+
+ return vmar;
+
+err:
+ if (vmar) {
+ vma_reader_destroy(vmar);
+ }
+
+ return NULL;
+}
+
+VmaHeader *vma_reader_get_header(VmaReader *vmar)
+{
+ assert(vmar);
+ assert(vmar->head_data);
+
+ return (VmaHeader *)(vmar->head_data);
+}
+
+GList *vma_reader_get_config_data(VmaReader *vmar)
+{
+ assert(vmar);
+ assert(vmar->head_data);
+
+ return vmar->cdata_list;
+}
+
+VmaDeviceInfo *vma_reader_get_device_info(VmaReader *vmar, guint8 dev_id)
+{
+ assert(vmar);
+ assert(dev_id);
+
+ if (vmar->devinfo[dev_id].size && vmar->devinfo[dev_id].devname) {
+ return &vmar->devinfo[dev_id];
+ }
+
+ return NULL;
+}
+
+int vma_reader_register_bs(VmaReader *vmar, guint8 dev_id, BlockDriverState *bs,
+ bool write_zeroes, Error **errp)
+{
+ assert(vmar);
+ assert(bs != NULL);
+ assert(dev_id);
+ assert(vmar->rstate[dev_id].bs == NULL);
+
+ int64_t size = bdrv_getlength(bs);
+ if (size != vmar->devinfo[dev_id].size) {
+ error_setg(errp, "vma_reader_register_bs for stream %s failed - "
+ "unexpected size %zd != %zd", vmar->devinfo[dev_id].devname,
+ size, vmar->devinfo[dev_id].size);
+ return -1;
+ }
+
+ vmar->rstate[dev_id].bs = bs;
+ vmar->rstate[dev_id].write_zeroes = write_zeroes;
+
+ int64_t bitmap_size = (size/BDRV_SECTOR_SIZE) +
+ (VMA_CLUSTER_SIZE/BDRV_SECTOR_SIZE) * BITS_PER_LONG - 1;
+ bitmap_size /= (VMA_CLUSTER_SIZE/BDRV_SECTOR_SIZE) * BITS_PER_LONG;
+
+ vmar->rstate[dev_id].bitmap_size = bitmap_size;
+ vmar->rstate[dev_id].bitmap = g_new0(unsigned long, bitmap_size);
+
+ return 0;
+}
+
+static ssize_t safe_write(int fd, void *buf, size_t count)
+{
+ ssize_t n;
+
+ do {
+ n = write(fd, buf, count);
+ } while (n < 0 && errno == EINTR);
+
+ return n;
+}
+
+static size_t full_write(int fd, void *buf, size_t len)
+{
+ ssize_t n;
+ size_t total;
+
+ total = 0;
+
+ while (len > 0) {
+ n = safe_write(fd, buf, len);
+ if (n < 0) {
+ return n;
+ }
+ buf += n;
+ total += n;
+ len -= n;
+ }
+
+ if (len) {
+ /* incomplete write ? */
+ return -1;
+ }
+
+ return total;
+}
+
+static int restore_write_data(VmaReader *vmar, guint8 dev_id,
+ BlockDriverState *bs, int vmstate_fd,
+ unsigned char *buf, int64_t sector_num,
+ int nb_sectors, Error **errp)
+{
+ assert(vmar);
+
+ if (dev_id == vmar->vmstate_stream) {
+ if (vmstate_fd >= 0) {
+ int len = nb_sectors * BDRV_SECTOR_SIZE;
+ int res = full_write(vmstate_fd, buf, len);
+ if (res < 0) {
+ error_setg(errp, "write vmstate failed %d", res);
+ return -1;
+ }
+ }
+ } else {
+ int res = bdrv_write(bs, sector_num, buf, nb_sectors);
+ if (res < 0) {
+ error_setg(errp, "bdrv_write to %s failed (%d)",
+ bdrv_get_device_name(bs), res);
+ return -1;
+ }
+ }
+ return 0;
+}
+static int restore_extent(VmaReader *vmar, unsigned char *buf,
+ int extent_size, int vmstate_fd, Error **errp)
+{
+ assert(vmar);
+ assert(buf);
+
+ VmaExtentHeader *ehead = (VmaExtentHeader *)buf;
+ int start = VMA_EXTENT_HEADER_SIZE;
+ int i;
+
+ for (i = 0; i < VMA_BLOCKS_PER_EXTENT; i++) {
+ uint64_t block_info = GUINT64_FROM_BE(ehead->blockinfo[i]);
+ uint32_t cluster_num = block_info & 0xffffffff;
+ uint8_t dev_id = (block_info >> 32) & 0xff;
+ uint16_t mask = block_info >> (32+16);
+ int64_t max_sector;
+
+ if (!dev_id) {
+ continue;
+ }
+
+ VmaRestoreState *rstate = &vmar->rstate[dev_id];
+ BlockDriverState *bs = NULL;
+
+ if (dev_id != vmar->vmstate_stream) {
+ bs = rstate->bs;
+ if (!bs) {
+ error_setg(errp, "got wrong dev id %d", dev_id);
+ return -1;
+ }
+
+ if (vma_reader_get_bitmap(rstate, cluster_num)) {
+ error_setg(errp, "found duplicated cluster %d for stream %s",
+ cluster_num, vmar->devinfo[dev_id].devname);
+ return -1;
+ }
+ vma_reader_set_bitmap(rstate, cluster_num, 1);
+
+ max_sector = vmar->devinfo[dev_id].size/BDRV_SECTOR_SIZE;
+ } else {
+ max_sector = G_MAXINT64;
+ if (cluster_num != vmar->vmstate_clusters) {
+ error_setg(errp, "found out of order vmstate data");
+ return -1;
+ }
+ vmar->vmstate_clusters++;
+ }
+
+ /* try to write whole clusters to speedup restore */
+ if (mask == 0xffff) {
+ if ((start + VMA_CLUSTER_SIZE) > extent_size) {
+ error_setg(errp, "short vma extent - too many blocks");
+ return -1;
+ }
+ int64_t sector_num = (cluster_num * VMA_CLUSTER_SIZE) /
+ BDRV_SECTOR_SIZE;
+ int64_t end_sector = sector_num +
+ VMA_CLUSTER_SIZE/BDRV_SECTOR_SIZE;
+
+ if (end_sector > max_sector) {
+ end_sector = max_sector;
+ }
+
+ if (end_sector <= sector_num) {
+ error_setg(errp, "got wrong block address - write bejond end");
+ return -1;
+ }
+
+ int nb_sectors = end_sector - sector_num;
+ if (restore_write_data(vmar, dev_id, bs, vmstate_fd, buf + start,
+ sector_num, nb_sectors, errp) < 0) {
+ return -1;
+ }
+
+ start += VMA_CLUSTER_SIZE;
+ } else {
+ int j;
+ int bit = 1;
+
+ for (j = 0; j < 16; j++) {
+ int64_t sector_num = (cluster_num*VMA_CLUSTER_SIZE +
+ j*VMA_BLOCK_SIZE)/BDRV_SECTOR_SIZE;
+
+ int64_t end_sector = sector_num +
+ VMA_BLOCK_SIZE/BDRV_SECTOR_SIZE;
+ if (end_sector > max_sector) {
+ end_sector = max_sector;
+ }
+
+ if (mask & bit) {
+ if ((start + VMA_BLOCK_SIZE) > extent_size) {
+ error_setg(errp, "short vma extent - too many blocks");
+ return -1;
+ }
+
+ if (end_sector <= sector_num) {
+ error_setg(errp, "got wrong block address - "
+ "write bejond end");
+ return -1;
+ }
+
+ int nb_sectors = end_sector - sector_num;
+ if (restore_write_data(vmar, dev_id, bs, vmstate_fd,
+ buf + start, sector_num,
+ nb_sectors, errp) < 0) {
+ return -1;
+ }
+
+ start += VMA_BLOCK_SIZE;
+
+ } else {
+
+ if (rstate->write_zeroes & (end_sector > sector_num)) {
+ /* Todo: use bdrv_co_write_zeroes (but that need to
+ * be run inside coroutine?)
+ */
+ int nb_sectors = end_sector - sector_num;
+ if (restore_write_data(vmar, dev_id, bs, vmstate_fd,
+ zero_vma_block, sector_num,
+ nb_sectors, errp) < 0) {
+ return -1;
+ }
+ }
+ }
+
+ bit = bit << 1;
+ }
+ }
+ }
+
+ if (start != extent_size) {
+ error_setg(errp, "vma extent error - missing blocks");
+ return -1;
+ }
+
+ return 0;
+}
+
+int vma_reader_restore(VmaReader *vmar, int vmstate_fd, Error **errp)
+{
+ assert(vmar);
+ assert(vmar->head_data);
+
+ int ret = 0;
+ unsigned char buf[VMA_MAX_EXTENT_SIZE];
+ int buf_pos = 0;
+ unsigned char md5sum[16];
+ VmaHeader *h = (VmaHeader *)vmar->head_data;
+
+
+ while (1) {
+ int bytes = full_read(vmar->fd, buf + buf_pos, sizeof(buf) - buf_pos);
+ if (bytes < 0) {
+ error_setg(errp, "read failed - %s", strerror(errno));
+ return -1;
+ }
+
+ buf_pos += bytes;
+
+ if (!buf_pos) {
+ break; /* EOF */
+ }
+
+ if (buf_pos < VMA_EXTENT_HEADER_SIZE) {
+ error_setg(errp, "read short extent (%d bytes)", buf_pos);
+ return -1;
+ }
+
+ VmaExtentHeader *ehead = (VmaExtentHeader *)buf;
+
+ /* extract md5sum */
+ memcpy(md5sum, ehead->md5sum, sizeof(ehead->md5sum));
+ memset(ehead->md5sum, 0, sizeof(ehead->md5sum));
+
+ g_checksum_reset(vmar->md5csum);
+ g_checksum_update(vmar->md5csum, buf, VMA_EXTENT_HEADER_SIZE);
+ gsize csize = 16;
+ g_checksum_get_digest(vmar->md5csum, ehead->md5sum, &csize);
+
+ if (memcmp(md5sum, ehead->md5sum, 16) != 0) {
+ error_setg(errp, "wrong vma extent header chechsum");
+ return -1;
+ }
+
+ if (memcmp(h->uuid, ehead->uuid, sizeof(ehead->uuid)) != 0) {
+ error_setg(errp, "wrong vma extent uuid");
+ return -1;
+ }
+
+ if (ehead->magic != VMA_EXTENT_MAGIC || ehead->reserved1 != 0) {
+ error_setg(errp, "wrong vma extent header magic");
+ return -1;
+ }
+
+ int block_count = GUINT16_FROM_BE(ehead->block_count);
+ int extent_size = VMA_EXTENT_HEADER_SIZE + block_count*VMA_BLOCK_SIZE;
+
+ if (buf_pos < extent_size) {
+ error_setg(errp, "short vma extent (%d < %d)", buf_pos,
+ extent_size);
+ return -1;
+ }
+
+ if (restore_extent(vmar, buf, extent_size, vmstate_fd, errp) < 0) {
+ return -1;
+ }
+
+ if (buf_pos > extent_size) {
+ memmove(buf, buf + extent_size, buf_pos - extent_size);
+ buf_pos = buf_pos - extent_size;
+ } else {
+ buf_pos = 0;
+ }
+ }
+
+ bdrv_drain_all();
+
+ int i;
+ for (i = 1; i < 256; i++) {
+ VmaRestoreState *rstate = &vmar->rstate[i];
+ if (!rstate->bs) {
+ continue;
+ }
+
+ if (bdrv_flush(rstate->bs) < 0) {
+ error_setg(errp, "vma bdrv_flush %s failed",
+ vmar->devinfo[i].devname);
+ return -1;
+ }
+
+ if (vmar->devinfo[i].size &&
+ (strcmp(vmar->devinfo[i].devname, "vmstate") != 0)) {
+ assert(rstate->bitmap);
+
+ int64_t cluster_num, end;
+
+ end = (vmar->devinfo[i].size + VMA_CLUSTER_SIZE - 1) /
+ VMA_CLUSTER_SIZE;
+
+ for (cluster_num = 0; cluster_num < end; cluster_num++) {
+ if (!vma_reader_get_bitmap(rstate, cluster_num)) {
+ error_setg(errp, "detected missing cluster %zd "
+ "for stream %s", cluster_num,
+ vmar->devinfo[i].devname);
+ return -1;
+ }
+ }
+ }
+ }
+
+ return ret;
+}
+
diff --git a/vma-writer.c b/vma-writer.c
new file mode 100644
index 0000000..717e25a
--- /dev/null
+++ b/vma-writer.c
@@ -0,0 +1,908 @@
+/*
+ * VMA: Virtual Machine Archive
+ *
+ * Copyright (C) 2012 Proxmox Server Solutions
+ *
+ * Authors:
+ * Dietmar Maurer (dietmar@proxmox.com)
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include <stdio.h>
+#include <errno.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <glib.h>
+#include <uuid/uuid.h>
+
+#include "qemu-common.h"
+#include "qemu_socket.h"
+#include "qemu-coroutine.h"
+#include "qemu-aio.h"
+#include "qemu/ratelimit.h"
+#include "vma.h"
+#include "block.h"
+
+#define DEBUG_VMA 0
+
+#define DPRINTF(fmt, ...)\
+ do { if (DEBUG_VMA) { printf("vma: " fmt, ## __VA_ARGS__); } } while (0)
+
+#define WRITE_BUFFERS 5
+
+typedef struct VmaAIOCB VmaAIOCB;
+struct VmaAIOCB {
+ VmaWriter *vmaw;
+ unsigned char buffer[VMA_MAX_EXTENT_SIZE];
+ size_t bytes;
+ Coroutine *co;
+};
+
+struct VmaWriter {
+ int fd;
+ FILE *cmd;
+ int status;
+ char errmsg[8192];
+ uuid_t uuid;
+ bool header_written;
+ bool closed;
+
+ /* we always write extents */
+ unsigned char outbuf[VMA_MAX_EXTENT_SIZE];
+ int outbuf_pos; /* in bytes */
+ int outbuf_count; /* in VMA_BLOCKS */
+ uint64_t outbuf_block_info[VMA_BLOCKS_PER_EXTENT];
+
+ VmaAIOCB aiocbs[WRITE_BUFFERS];
+ CoQueue wqueue;
+
+ GChecksum *md5csum;
+ CoMutex writer_lock;
+ CoMutex flush_lock;
+ Coroutine *co_writer;
+ RateLimit limit;
+ uint64_t delay_ns;
+
+ /* drive informations */
+ VmaStreamInfo stream_info[256];
+ guint stream_count;
+
+ guint8 vmstate_stream;
+ uint32_t vmstate_clusters;
+
+ /* header blob table */
+ char *header_blob_table;
+ uint32_t header_blob_table_size;
+ uint32_t header_blob_table_pos;
+
+ /* store for config blobs */
+ uint32_t config_names[VMA_MAX_CONFIGS]; /* offset into blob_buffer table */
+ uint32_t config_data[VMA_MAX_CONFIGS]; /* offset into blob_buffer table */
+ uint32_t config_count;
+};
+
+void vma_writer_set_error(VmaWriter *vmaw, const char *fmt, ...)
+{
+ va_list ap;
+
+ if (vmaw->status < 0) {
+ return;
+ }
+
+ vmaw->status = -1;
+
+ va_start(ap, fmt);
+ g_vsnprintf(vmaw->errmsg, sizeof(vmaw->errmsg), fmt, ap);
+ va_end(ap);
+
+ DPRINTF("vma_writer_set_error: %s\n", vmaw->errmsg);
+}
+
+static uint32_t allocate_header_blob(VmaWriter *vmaw, const char *data,
+ size_t len)
+{
+ if (len > 65535) {
+ return 0;
+ }
+
+ if (!vmaw->header_blob_table ||
+ (vmaw->header_blob_table_size <
+ (vmaw->header_blob_table_pos + len + 2))) {
+ int newsize = vmaw->header_blob_table_size + ((len + 2 + 511)/512)*512;
+
+ vmaw->header_blob_table = g_realloc(vmaw->header_blob_table, newsize);
+ memset(vmaw->header_blob_table + vmaw->header_blob_table_size,
+ 0, newsize - vmaw->header_blob_table_size);
+ vmaw->header_blob_table_size = newsize;
+ }
+
+ uint32_t cpos = vmaw->header_blob_table_pos;
+ vmaw->header_blob_table[cpos] = len & 255;
+ vmaw->header_blob_table[cpos+1] = (len >> 8) & 255;
+ memcpy(vmaw->header_blob_table + cpos + 2, data, len);
+ vmaw->header_blob_table_pos += len + 2;
+ return cpos;
+}
+
+static uint32_t allocate_header_string(VmaWriter *vmaw, const char *str)
+{
+ assert(vmaw);
+
+ size_t len = strlen(str) + 1;
+
+ return allocate_header_blob(vmaw, str, len);
+}
+
+int vma_writer_add_config(VmaWriter *vmaw, const char *name, gpointer data,
+ gsize len)
+{
+ assert(vmaw);
+ assert(!vmaw->header_written);
+ assert(vmaw->config_count < VMA_MAX_CONFIGS);
+ assert(name);
+ assert(data);
+ assert(len);
+
+ uint32_t name_ptr = allocate_header_string(vmaw, name);
+ if (!name_ptr) {
+ return -1;
+ }
+
+ uint32_t data_ptr = allocate_header_blob(vmaw, data, len);
+ if (!data_ptr) {
+ return -1;
+ }
+
+ vmaw->config_names[vmaw->config_count] = name_ptr;
+ vmaw->config_data[vmaw->config_count] = data_ptr;
+
+ vmaw->config_count++;
+
+ return 0;
+}
+
+int vma_writer_register_stream(VmaWriter *vmaw, const char *devname,
+ size_t size)
+{
+ assert(vmaw);
+ assert(devname);
+ assert(!vmaw->status);
+
+ if (vmaw->header_written) {
+ vma_writer_set_error(vmaw, "vma_writer_register_stream: header "
+ "already written");
+ return -1;
+ }
+
+ guint n = vmaw->stream_count + 1;
+
+ /* we can have dev_ids form 1 to 255 (0 reserved)
+ * 255(-1) reseverd for safety
+ */
+ if (n > 254) {
+ vma_writer_set_error(vmaw, "vma_writer_register_stream: "
+ "too many drives");
+ return -1;
+ }
+
+ if (size <= 0) {
+ vma_writer_set_error(vmaw, "vma_writer_register_stream: "
+ "got strange size %zd", size);
+ return -1;
+ }
+
+ DPRINTF("vma_writer_register_stream %s %zu %d\n", devname, size, n);
+
+ vmaw->stream_info[n].devname = g_strdup(devname);
+ vmaw->stream_info[n].size = size;
+
+ vmaw->stream_info[n].cluster_count = (size + VMA_CLUSTER_SIZE - 1) /
+ VMA_CLUSTER_SIZE;
+
+ vmaw->stream_count = n;
+
+ if (strcmp(devname, "vmstate") == 0) {
+ vmaw->vmstate_stream = n;
+ }
+
+ return n;
+}
+
+static void vma_co_continue_write(void *opaque)
+{
+ VmaWriter *vmaw = opaque;
+
+ qemu_aio_set_fd_handler(vmaw->fd, NULL, NULL, NULL, NULL);
+
+ DPRINTF("vma_co_continue_write\n");
+ qemu_coroutine_enter(vmaw->co_writer, NULL);
+}
+
+static ssize_t coroutine_fn
+vma_co_write(VmaWriter *vmaw, const void *buf, size_t bytes)
+{
+ size_t done = 0;
+ ssize_t ret;
+
+ /* atomic writes (we cannot interleave writes) */
+ qemu_co_mutex_lock(&vmaw->writer_lock);
+
+ DPRINTF("vma_co_write enter %zd\n", bytes);
+
+ while (done < bytes) {
+ ret = write(vmaw->fd, buf + done, bytes - done);
+ if (ret > 0) {
+ done += ret;
+ DPRINTF("vma_co_write written %zd %zd\n", done, ret);
+ } else if (ret < 0) {
+ if (errno == EAGAIN || errno == EWOULDBLOCK) {
+ DPRINTF("vma_co_write yield %zd\n", done);
+
+ vmaw->co_writer = qemu_coroutine_self();
+ qemu_aio_set_fd_handler(vmaw->fd, NULL, vma_co_continue_write,
+ NULL, vmaw);
+
+ qemu_coroutine_yield();
+ DPRINTF("vma_co_write restart %zd\n", done);
+ } else {
+ vma_writer_set_error(vmaw, "vma_co_write write error - %s",
+ strerror(errno));
+ done = -1; /* always return failure for partial writes */
+ break;
+ }
+ } else if (ret == 0) {
+ /* should not happen - simply try again */
+ }
+ }
+
+ qemu_co_mutex_unlock(&vmaw->writer_lock);
+
+ DPRINTF("vma_co_write leave %zd\n", done);
+ return done;
+}
+
+static void coroutine_fn vma_co_writer_task(void *opaque)
+{
+ VmaAIOCB *cb = opaque;
+
+ DPRINTF("vma_co_writer_task start\n");
+
+ int64_t done = vma_co_write(cb->vmaw, cb->buffer, cb->bytes);
+ DPRINTF("vma_co_writer_task write done %zd\n", done);
+
+ if (done != cb->bytes) {
+ DPRINTF("vma_co_writer_task failed write %zd %zd", cb->bytes, done);
+ vma_writer_set_error(cb->vmaw, "vma_co_writer_task failed write %zd",
+ done);
+ }
+
+ cb->bytes = 0;
+
+ qemu_co_queue_next(&cb->vmaw->wqueue);
+
+ DPRINTF("vma_co_writer_task end\n");
+}
+
+static void coroutine_fn vma_queue_flush(VmaWriter *vmaw)
+{
+ DPRINTF("vma_queue_flush enter\n");
+
+ assert(vmaw);
+
+ while (1) {
+ int i;
+ VmaAIOCB *cb = NULL;
+ for (i = 0; i < WRITE_BUFFERS; i++) {
+ if (vmaw->aiocbs[i].bytes) {
+ cb = &vmaw->aiocbs[i];
+ DPRINTF("FOUND USED AIO BUFFER %d %zd\n", i,
+ vmaw->aiocbs[i].bytes);
+ break;
+ }
+ }
+ if (!cb) {
+ break;
+ }
+ qemu_co_queue_wait(&vmaw->wqueue);
+ }
+
+ DPRINTF("vma_queue_flush leave\n");
+}
+
+/**
+ * NOTE: pipe buffer size in only 4096 bytes on linux (see 'ulimit -a')
+ * So we need to create a coroutione to allow 'parallel' execution.
+ */
+static ssize_t coroutine_fn
+vma_queue_write(VmaWriter *vmaw, const void *buf, size_t bytes)
+{
+ DPRINTF("vma_queue_write enter %zd\n", bytes);
+
+ assert(vmaw);
+ assert(buf);
+ assert(bytes <= VMA_MAX_EXTENT_SIZE);
+
+ VmaAIOCB *cb = NULL;
+ while (!cb) {
+ int i;
+ for (i = 0; i < WRITE_BUFFERS; i++) {
+ if (!vmaw->aiocbs[i].bytes) {
+ cb = &vmaw->aiocbs[i];
+ break;
+ }
+ }
+ if (!cb) {
+ qemu_co_queue_wait(&vmaw->wqueue);
+ }
+ }
+
+ memcpy(cb->buffer, buf, bytes);
+ cb->bytes = bytes;
+ cb->vmaw = vmaw;
+
+ DPRINTF("vma_queue_write start %zd\n", bytes);
+ cb->co = qemu_coroutine_create(vma_co_writer_task);
+ qemu_coroutine_enter(cb->co, cb);
+
+ DPRINTF("vma_queue_write leave\n");
+
+ return bytes;
+}
+
+VmaWriter *vma_writer_create(const char *filename, uuid_t uuid, int64_t speed,
+ Error **errp)
+{
+ const char *p;
+
+ assert(sizeof(VmaHeader) == (4096 + 8192));
+ assert(sizeof(VmaExtentHeader) == 512);
+
+ VmaWriter *vmaw = g_new0(VmaWriter, 1);
+ vmaw->fd = -1;
+
+ vmaw->md5csum = g_checksum_new(G_CHECKSUM_MD5);
+ if (!vmaw->md5csum) {
+ error_setg(errp, "can't allocate cmsum\n");
+ goto err;
+ }
+
+ if (strstart(filename, "exec:", &p)) {
+ vmaw->cmd = popen(p, "w");
+ if (vmaw->cmd == NULL) {
+ error_setg(errp, "can't popen command '%s' - %s\n", p,
+ strerror(errno));
+ goto err;
+ }
+ vmaw->fd = fileno(vmaw->cmd);
+ socket_set_nonblock(vmaw->fd);
+
+ } else {
+ vmaw->fd = open(filename, O_NONBLOCK|O_WRONLY|O_CREAT|O_EXCL, 0644);
+ if (vmaw->fd < 0) {
+ error_setg(errp, "can't open file %s - %s\n", filename,
+ strerror(errno));
+ goto err;
+ }
+ }
+
+ vmaw->outbuf_count = 0;
+ vmaw->outbuf_pos = VMA_EXTENT_HEADER_SIZE;
+
+ vmaw->header_blob_table_pos = 1; /* start at pos 1 */
+
+ qemu_co_mutex_init(&vmaw->writer_lock);
+ qemu_co_mutex_init(&vmaw->flush_lock);
+ qemu_co_queue_init(&vmaw->wqueue);
+
+ uuid_copy(vmaw->uuid, uuid);
+
+ if (speed <= 0) {
+ speed = 10*1024*1024*1024LLU; /* default 10GB/s */
+ }
+
+ ratelimit_set_speed(&vmaw->limit, speed, 100000000ULL /* 0.1 sec */);
+
+ return vmaw;
+
+err:
+ if (vmaw) {
+ if (vmaw->cmd) {
+ pclose(vmaw->cmd);
+ } else if (vmaw->fd >= 0) {
+ close(vmaw->fd);
+ }
+
+ if (vmaw->md5csum) {
+ g_checksum_free(vmaw->md5csum);
+ }
+
+ g_free(vmaw);
+ }
+
+ return NULL;
+}
+
+static int coroutine_fn vma_write_header(VmaWriter *vmaw)
+{
+ assert(vmaw);
+ int header_clusters = 8;
+ char buf[65536*header_clusters];
+ VmaHeader *head = (VmaHeader *)buf;
+
+ int i;
+
+ DPRINTF("VMA WRITE HEADER\n");
+
+ if (vmaw->status < 0) {
+ return vmaw->status;
+ }
+
+ memset(buf, 0, sizeof(buf));
+
+ head->magic = VMA_MAGIC;
+ head->version = GUINT32_TO_BE(1); /* v1 */
+ memcpy(head->uuid, vmaw->uuid, 16);
+
+ time_t ctime = time(NULL);
+ head->ctime = GUINT64_TO_BE(ctime);
+
+ if (!vmaw->stream_count) {
+ return -1;
+ }
+
+ for (i = 0; i < VMA_MAX_CONFIGS; i++) {
+ head->config_names[i] = GUINT32_TO_BE(vmaw->config_names[i]);
+ head->config_data[i] = GUINT32_TO_BE(vmaw->config_data[i]);
+ }
+
+ /* 32 bytes per device (12 used currently) = 8192 bytes max */
+ for (i = 1; i <= 254; i++) {
+ VmaStreamInfo *si = &vmaw->stream_info[i];
+ if (si->size) {
+ assert(si->devname);
+ uint32_t devname_ptr = allocate_header_string(vmaw, si->devname);
+ if (!devname_ptr) {
+ return -1;
+ }
+ head->dev_info[i].devname_ptr = GUINT32_TO_BE(devname_ptr);
+ head->dev_info[i].size = GUINT64_TO_BE(si->size);
+ }
+ }
+
+ uint32_t header_size = sizeof(VmaHeader) + vmaw->header_blob_table_size;
+ head->header_size = GUINT32_TO_BE(header_size);
+
+ if (header_size > sizeof(buf)) {
+ return -1; /* just to be sure */
+ }
+
+ uint32_t blob_buffer_offset = sizeof(VmaHeader);
+ memcpy(buf + blob_buffer_offset, vmaw->header_blob_table,
+ vmaw->header_blob_table_size);
+ head->blob_buffer_offset = GUINT32_TO_BE(blob_buffer_offset);
+ head->blob_buffer_size = GUINT32_TO_BE(vmaw->header_blob_table_pos);
+
+ g_checksum_reset(vmaw->md5csum);
+ g_checksum_update(vmaw->md5csum, (const guchar *)buf, header_size);
+ gsize csize = 16;
+ g_checksum_get_digest(vmaw->md5csum, (guint8 *)(head->md5sum), &csize);
+
+ return vma_queue_write(vmaw, buf, header_size);
+}
+
+static int coroutine_fn vma_writer_flush(VmaWriter *vmaw)
+{
+ assert(vmaw);
+
+ int ret;
+ int i;
+
+ if (vmaw->status < 0) {
+ return vmaw->status;
+ }
+
+ if (!vmaw->header_written) {
+ vmaw->header_written = true;
+ ret = vma_write_header(vmaw);
+ if (ret < 0) {
+ vma_writer_set_error(vmaw, "vma_writer_flush: write header failed");
+ return ret;
+ }
+ }
+
+ DPRINTF("VMA WRITE FLUSH %d %d\n", vmaw->outbuf_count, vmaw->outbuf_pos);
+
+
+ VmaExtentHeader *ehead = (VmaExtentHeader *)vmaw->outbuf;
+
+ ehead->magic = VMA_EXTENT_MAGIC;
+ ehead->reserved1 = 0;
+
+ for (i = 0; i < VMA_BLOCKS_PER_EXTENT; i++) {
+ ehead->blockinfo[i] = GUINT64_TO_BE(vmaw->outbuf_block_info[i]);
+ }
+
+ guint16 block_count = (vmaw->outbuf_pos - VMA_EXTENT_HEADER_SIZE) /
+ VMA_BLOCK_SIZE;
+
+ ehead->block_count = GUINT16_TO_BE(block_count);
+
+ memcpy(ehead->uuid, vmaw->uuid, sizeof(ehead->uuid));
+ memset(ehead->md5sum, 0, sizeof(ehead->md5sum));
+
+ g_checksum_reset(vmaw->md5csum);
+ g_checksum_update(vmaw->md5csum, vmaw->outbuf, VMA_EXTENT_HEADER_SIZE);
+ gsize csize = 16;
+ g_checksum_get_digest(vmaw->md5csum, ehead->md5sum, &csize);
+
+ int bytes = vmaw->outbuf_pos;
+ ret = vma_queue_write(vmaw, vmaw->outbuf, bytes);
+ if (ret != bytes) {
+ vma_writer_set_error(vmaw, "vma_writer_flush: failed write");
+ }
+
+ vmaw->outbuf_count = 0;
+ vmaw->outbuf_pos = VMA_EXTENT_HEADER_SIZE;
+
+ for (i = 0; i < VMA_BLOCKS_PER_EXTENT; i++) {
+ vmaw->outbuf_block_info[i] = 0;
+ }
+
+ return vmaw->status;
+}
+
+static int vma_count_open_streams(VmaWriter *vmaw)
+{
+ g_assert(vmaw != NULL);
+
+ int i;
+ int open_drives = 0;
+ for (i = 0; i <= 255; i++) {
+ if (vmaw->stream_info[i].size && !vmaw->stream_info[i].finished) {
+ open_drives++;
+ }
+ }
+
+ return open_drives;
+}
+
+/**
+ * all jobs should call this when there is no more data
+ * Returns: number of remaining stream (0 ==> finished)
+ */
+int coroutine_fn
+vma_writer_close_stream(VmaWriter *vmaw, uint8_t dev_id)
+{
+ g_assert(vmaw != NULL);
+
+ DPRINTF("vma_writer_set_status %d\n", dev_id);
+ if (!vmaw->stream_info[dev_id].size) {
+ vma_writer_set_error(vmaw, "vma_writer_close_stream: "
+ "no such stream %d", dev_id);
+ return -1;
+ }
+ if (vmaw->stream_info[dev_id].finished) {
+ vma_writer_set_error(vmaw, "vma_writer_close_stream: "
+ "stream already closed %d", dev_id);
+ return -1;
+ }
+
+ vmaw->stream_info[dev_id].finished = true;
+
+ int open_drives = vma_count_open_streams(vmaw);
+
+ if (open_drives <= 0) {
+ DPRINTF("vma_writer_set_status all drives completed\n");
+ qemu_co_mutex_lock(&vmaw->flush_lock);
+ int ret = vma_writer_flush(vmaw);
+ qemu_co_mutex_unlock(&vmaw->flush_lock);
+ if (ret < 0) {
+ vma_writer_set_error(vmaw, "vma_writer_close_stream: flush failed");
+ }
+ }
+
+ return open_drives;
+}
+
+int vma_writer_get_status(VmaWriter *vmaw, VmaStatus *status)
+{
+ int i;
+
+ g_assert(vmaw != NULL);
+
+ if (status) {
+ status->status = vmaw->status;
+ g_strlcpy(status->errmsg, vmaw->errmsg, sizeof(status->errmsg));
+ for (i = 0; i <= 255; i++) {
+ status->stream_info[i] = vmaw->stream_info[i];
+ }
+
+ uuid_unparse_lower(vmaw->uuid, status->uuid_str);
+ }
+
+ status->closed = vmaw->closed;
+
+ return vmaw->status;
+}
+
+static int vma_writer_get_buffer(VmaWriter *vmaw)
+{
+
+ /* wait until buffer is available */
+ while (vmaw->outbuf_count >= (VMA_BLOCKS_PER_EXTENT - 1)) {
+ int res = 0;
+
+ qemu_co_mutex_lock(&vmaw->flush_lock);
+ res = vma_writer_flush(vmaw);
+ qemu_co_mutex_unlock(&vmaw->flush_lock);
+
+ if (res < 0) {
+ vma_writer_set_error(vmaw, "vma_writer_get_buffer: flush failed");
+ return -1;
+ }
+ }
+
+ return 0;
+}
+
+
+int64_t coroutine_fn
+vma_writer_write(VmaWriter *vmaw, uint8_t dev_id, int64_t cluster_num,
+ unsigned char *buf, size_t *zero_bytes)
+{
+ g_assert(vmaw != NULL);
+ g_assert(zero_bytes != NULL);
+
+ *zero_bytes = 0;
+
+ if (vmaw->status < 0) {
+ return vmaw->status;
+ }
+
+ if (!dev_id || !vmaw->stream_info[dev_id].size) {
+ vma_writer_set_error(vmaw, "vma_writer_write: "
+ "no such stream %d", dev_id);
+ return -1;
+ }
+
+ if (vmaw->stream_info[dev_id].finished) {
+ vma_writer_set_error(vmaw, "vma_writer_write: "
+ "stream already closed %d", dev_id);
+ return -1;
+ }
+
+
+ if (cluster_num >= (((uint64_t)1)<<32)) {
+ vma_writer_set_error(vmaw, "vma_writer_write: "
+ "cluster number out of range");
+ return -1;
+ }
+
+ if (dev_id == vmaw->vmstate_stream) {
+ if (cluster_num != vmaw->vmstate_clusters) {
+ vma_writer_set_error(vmaw, "vma_writer_write: "
+ "non sequential vmstate write");
+ }
+ vmaw->vmstate_clusters++;
+ } else if (cluster_num >= vmaw->stream_info[dev_id].cluster_count) {
+ vma_writer_set_error(vmaw, "vma_writer_write: cluster number too big");
+ return -1;
+ }
+
+ /* wait until buffer is available */
+ if (vma_writer_get_buffer(vmaw) < 0) {
+ vma_writer_set_error(vmaw, "vma_writer_write: "
+ "vma_writer_get_buffer failed");
+ return -1;
+ }
+
+ DPRINTF("VMA WRITE %zd\n", cluster_num);
+
+ int i;
+ int bit = 1;
+ uint16_t mask = 0;
+ for (i = 0; i < 16; i++) {
+ unsigned char *vmablock = buf + (i*VMA_BLOCK_SIZE);
+ if (buffer_is_zero(vmablock, VMA_BLOCK_SIZE)) {
+ DPRINTF("VMA WRITE %zd ZERO BLOCK %d\n", cluster_num, i);
+ vmaw->stream_info[dev_id].zero_bytes += VMA_BLOCK_SIZE;
+ *zero_bytes += VMA_BLOCK_SIZE;
+ } else {
+ mask |= bit;
+ memcpy(vmaw->outbuf + vmaw->outbuf_pos, vmablock, VMA_BLOCK_SIZE);
+ vmaw->outbuf_pos += VMA_BLOCK_SIZE;
+
+ vmaw->delay_ns = ratelimit_calculate_delay(&vmaw->limit,
+ VMA_BLOCK_SIZE);
+ if (vmaw->delay_ns) {
+ co_sleep_ns(rt_clock, vmaw->delay_ns);
+ }
+ }
+
+ bit = bit << 1;
+ }
+
+ uint64_t block_info = ((uint64_t)mask) << (32+16);
+ block_info |= ((uint64_t)dev_id) << 32;
+ block_info |= (cluster_num & 0xffffffff);
+ vmaw->outbuf_block_info[vmaw->outbuf_count] = block_info;
+
+ DPRINTF("VMA WRITE MASK %zd %zx\n", cluster_num, block_info);
+
+ vmaw->outbuf_count++;
+
+ /** NOTE: We allways write whole clusters, but we correctly set
+ * transferred bytes. So transferred == size when when everything
+ * went OK.
+ */
+ size_t transferred = VMA_CLUSTER_SIZE;
+
+ if (dev_id != vmaw->vmstate_stream) {
+ uint64_t last = (cluster_num + 1) * VMA_CLUSTER_SIZE;
+ if (last > vmaw->stream_info[dev_id].size) {
+ uint64_t diff = last - vmaw->stream_info[dev_id].size;
+ if (diff >= VMA_CLUSTER_SIZE) {
+ vma_writer_set_error(vmaw, "vma_writer_write: "
+ "read after last cluster");
+ return -1;
+ }
+ transferred -= diff;
+ }
+ }
+
+ vmaw->stream_info[dev_id].transferred += transferred;
+
+ return transferred;
+}
+
+int vma_writer_close(VmaWriter *vmaw, Error **errp)
+{
+ g_assert(vmaw != NULL);
+
+ int i;
+
+ vma_queue_flush(vmaw);
+
+ /* this should not happen - just to be sure */
+ while (!qemu_co_queue_empty(&vmaw->wqueue)) {
+ DPRINTF("vma_writer_close wait\n");
+ co_sleep_ns(rt_clock, 1000000);
+ }
+
+ if (vmaw->cmd) {
+ if (pclose(vmaw->cmd) < 0) {
+ vma_writer_set_error(vmaw, "vma_writer_close: "
+ "pclose failed - %s", strerror(errno));
+ }
+ } else {
+ if (close(vmaw->fd) < 0) {
+ vma_writer_set_error(vmaw, "vma_writer_close: "
+ "close failed - %s", strerror(errno));
+ }
+ }
+
+ for (i = 0; i <= 255; i++) {
+ VmaStreamInfo *si = &vmaw->stream_info[i];
+ if (si->size) {
+ if (!si->finished) {
+ vma_writer_set_error(vmaw, "vma_writer_close: "
+ "detected open stream '%s'", si->devname);
+ } else if ((si->transferred != si->size) &&
+ (i != vmaw->vmstate_stream)) {
+ vma_writer_set_error(vmaw, "vma_writer_close: "
+ "incomplete stream '%s' (%zd != %zd)",
+ si->devname, si->transferred, si->size);
+ }
+ }
+ }
+
+ for (i = 0; i <= 255; i++) {
+ vmaw->stream_info[i].finished = 1; /* mark as closed */
+ }
+
+ vmaw->closed = 1;
+
+ if (vmaw->status < 0 && *errp == NULL) {
+ error_setg(errp, "%s", vmaw->errmsg);
+ }
+
+ return vmaw->status;
+}
+
+void vma_writer_destroy(VmaWriter *vmaw)
+{
+ assert(vmaw);
+
+ int i;
+
+ for (i = 0; i <= 255; i++) {
+ if (vmaw->stream_info[i].devname) {
+ g_free(vmaw->stream_info[i].devname);
+ }
+ }
+
+ if (vmaw->md5csum) {
+ g_checksum_free(vmaw->md5csum);
+ }
+
+ g_free(vmaw);
+}
+
+/* backup driver plugin */
+
+static int vma_dump_cb(void *opaque, uint8_t dev_id, int64_t cluster_num,
+ unsigned char *buf, size_t *zero_bytes)
+{
+ VmaWriter *vmaw = opaque;
+
+ return vma_writer_write(vmaw, dev_id, cluster_num, buf, zero_bytes);
+}
+
+static int vma_close_cb(void *opaque, Error **errp)
+{
+ VmaWriter *vmaw = opaque;
+
+ int res = vma_writer_close(vmaw, errp);
+ vma_writer_destroy(vmaw);
+
+ return res;
+}
+
+static int vma_complete_cb(void *opaque, uint8_t dev_id, int ret)
+{
+ VmaWriter *vmaw = opaque;
+
+ if (ret < 0) {
+ vma_writer_set_error(vmaw, "backup_complete_cb %d", ret);
+ }
+
+ return vma_writer_close_stream(vmaw, dev_id);
+}
+
+static void vma_cancel_cb(void *opaque)
+{
+ VmaWriter *vmaw = opaque;
+
+ vma_writer_set_error(vmaw, "backup canceled");
+}
+
+static int vma_register_stream_cb(void *opaque, const char *devname,
+ size_t size)
+{
+ VmaWriter *vmaw = opaque;
+
+ return vma_writer_register_stream(vmaw, devname, size);
+}
+
+static int vma_register_config_cb(void *opaque, const char *name,
+ gpointer data, size_t data_len)
+{
+ VmaWriter *vmaw = opaque;
+
+ return vma_writer_add_config(vmaw, name, data, data_len);
+}
+
+static void *vma_open_cb(const char *filename, uuid_t uuid, int64_t speed,
+ Error **errp)
+{
+ return vma_writer_create(filename, uuid, speed, errp);
+}
+
+const BackupDriver backup_vma_driver = {
+ .format = "vma",
+ .open_cb = vma_open_cb,
+ .close_cb = vma_close_cb,
+ .register_config_cb = vma_register_config_cb,
+ .register_stream_cb = vma_register_stream_cb,
+ .dump_cb = vma_dump_cb,
+ .cancel_cb = vma_cancel_cb,
+ .complete_cb = vma_complete_cb,
+};
+
diff --git a/vma.c b/vma.c
new file mode 100644
index 0000000..69af80c
--- /dev/null
+++ b/vma.c
@@ -0,0 +1,550 @@
+/*
+ * VMA: Virtual Machine Archive
+ *
+ * Copyright (C) 2012 Proxmox Server Solutions
+ *
+ * Authors:
+ * Dietmar Maurer (dietmar@proxmox.com)
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include <stdio.h>
+#include <errno.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <glib.h>
+
+#include "qemu-common.h"
+#include "qemu-option.h"
+#include "qemu-error.h"
+#include "osdep.h"
+#include "sysemu.h"
+#include "block_int.h"
+#include <stdio.h>
+#include "vma.h"
+
+static void help(void)
+{
+ const char *help_msg =
+ "usage: vma command [command options]\n"
+ "\n"
+ "vma list <filename>\n"
+ "vma create <filename> [-c config] <archive> pathname ...\n"
+ "vma extract <filename> [-r] <targetdir>\n"
+ ;
+
+ printf("%s", help_msg);
+ exit(1);
+}
+
+static const char *extract_devname(const char *path, char **devname, int index)
+{
+ assert(path);
+
+ const char *sep = strchr(path, '=');
+
+ if (sep) {
+ *devname = g_strndup(path, sep - path);
+ path = sep + 1;
+ } else {
+ if (index >= 0) {
+ *devname = g_strdup_printf("disk%d", index);
+ } else {
+ *devname = NULL;
+ }
+ }
+
+ return path;
+}
+
+static void print_content(VmaReader *vmar)
+{
+ assert(vmar);
+
+ VmaHeader *head = vma_reader_get_header(vmar);
+
+ printf("CTIME: %s", ctime(&head->ctime));
+
+ GList *l = vma_reader_get_config_data(vmar);
+ while (l && l->data) {
+ VmaConfigData *cdata = (VmaConfigData *)l->data;
+ l = g_list_next(l);
+ printf("CFG: size: %d name: %s\n", cdata->len, cdata->name);
+ }
+
+ int i;
+ VmaDeviceInfo *di;
+ for (i = 1; i < 255; i++) {
+ di = vma_reader_get_device_info(vmar, i);
+ if (di) {
+ if (strcmp(di->devname, "vmstate") == 0) {
+ printf("VMSTATE: dev_id=%d memory: %zd\n", i, di->size);
+ } else {
+ printf("DEV: dev_id=%d size: %zd devname: %s\n",
+ i, di->size, di->devname);
+ }
+ }
+ }
+}
+
+static int list_content(int argc, char **argv)
+{
+ int c, ret = 0;
+ const char *filename;
+
+ for (;;) {
+ c = getopt(argc, argv, "h");
+ if (c == -1) {
+ break;
+ }
+ switch (c) {
+ case '?':
+ case 'h':
+ help();
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ }
+
+ /* Get the filename */
+ if ((optind + 1) != argc) {
+ help();
+ }
+ filename = argv[optind++];
+
+ Error *errp = NULL;
+ VmaReader *vmar = vma_reader_create(filename, &errp);
+
+ if (!vmar) {
+ g_error("%s", error_get_pretty(errp));
+ }
+
+ print_content(vmar);
+
+ vma_reader_destroy(vmar);
+
+ return ret;
+}
+
+typedef struct RestoreMap {
+ char *devname;
+ char *path;
+ bool write_zero;
+} RestoreMap;
+
+static int extract_content(int argc, char **argv)
+{
+ int c, ret = 0;
+ const char *filename;
+ const char *dirname;
+ int readmap = 0;
+
+ for (;;) {
+ c = getopt(argc, argv, "hr");
+ if (c == -1) {
+ break;
+ }
+ switch (c) {
+ case '?':
+ case 'h':
+ help();
+ break;
+ case 'r':
+ readmap = 1;
+ break;
+ default:
+ help();
+ }
+ }
+
+ /* Get the filename */
+ if ((optind + 2) != argc) {
+ help();
+ }
+ filename = argv[optind++];
+ dirname = argv[optind++];
+
+ Error *errp = NULL;
+ VmaReader *vmar = vma_reader_create(filename, &errp);
+
+ if (!vmar) {
+ g_error("%s", error_get_pretty(errp));
+ }
+
+ if (mkdir(dirname, 0777) < 0) {
+ g_error("unable to create target directory %s - %s",
+ dirname, strerror(errno));
+ }
+
+ GList *l = vma_reader_get_config_data(vmar);
+ while (l && l->data) {
+ VmaConfigData *cdata = (VmaConfigData *)l->data;
+ l = g_list_next(l);
+ char *cfgfn = g_strdup_printf("%s/%s", dirname, cdata->name);
+ GError *err = NULL;
+ if (!g_file_set_contents(cfgfn, (gchar *)cdata->data, cdata->len,
+ &err)) {
+ g_error("Unable to write file: %s", err->message);
+ }
+ }
+
+ GHashTable *devmap = g_hash_table_new(g_str_hash, g_str_equal);
+
+ if (readmap) {
+ print_content(vmar);
+
+ while (1) {
+ char inbuf[8192];
+ char *line = fgets(inbuf, sizeof(inbuf), stdin);
+ if (!line || line[0] == '\0' || !strcmp(line, "done\n")) {
+ break;
+ }
+ int len = strlen(line);
+ if (line[len - 1] == '\n') {
+ line[len - 1] = '\0';
+ if (len == 1) {
+ break;
+ }
+ }
+
+ const char *path;
+ bool write_zero;
+ if (line[0] == '0' && line[1] == ':') {
+ path = inbuf + 2;
+ write_zero = false;
+ } else if (line[0] == '1' && line[1] == ':') {
+ path = inbuf + 2;
+ write_zero = true;
+ } else {
+ g_error("read map failed - parse error ('%s')", inbuf);
+ }
+
+ char *devname = NULL;
+ path = extract_devname(path, &devname, -1);
+ if (!devname) {
+ g_error("read map failed - no dev name specified ('%s')",
+ inbuf);
+ }
+
+ printf("TEST %s %s\n", path, devname);
+
+ RestoreMap *map = g_new0(RestoreMap, 1);
+ map->devname = g_strdup(devname);
+ map->path = g_strdup(path);
+ map->write_zero = write_zero;
+
+ g_hash_table_insert(devmap, map->devname, map);
+
+ };
+ }
+
+ int i;
+ int vmstate_fd = -1;
+ guint8 vmstate_stream = 0;
+
+ for (i = 1; i < 255; i++) {
+ VmaDeviceInfo *di = vma_reader_get_device_info(vmar, i);
+ if (di && (strcmp(di->devname, "vmstate") == 0)) {
+ vmstate_stream = i;
+ char *statefn = g_strdup_printf("%s/vmstate.bin", dirname);
+ vmstate_fd = open(statefn, O_WRONLY|O_CREAT|O_EXCL, 0644);
+ if (vmstate_fd < 0) {
+ g_error("create vmstate file '%s' failed - %s", statefn,
+ strerror(errno));
+ }
+ g_free(statefn);
+ } else if (di) {
+ char *devfn = NULL;
+ int flags = BDRV_O_RDWR|BDRV_O_CACHE_WB;
+ bool write_zero = true;
+
+ if (readmap) {
+ RestoreMap *map;
+ map = (RestoreMap *)g_hash_table_lookup(devmap, di->devname);
+ if (map == NULL) {
+ g_error("no device name mapping for %s", di->devname);
+ }
+ devfn = map->path;
+ write_zero = map->write_zero;
+ } else {
+ devfn = g_strdup_printf("%s/tmp-disk-%s.raw",
+ dirname, di->devname);
+ printf("DEVINFO %s %zd\n", devfn, di->size);
+
+ if (bdrv_img_create(devfn, "raw", NULL, NULL, NULL,
+ di->size, flags)) {
+ g_error("can't create file %s", devfn);
+ }
+
+ /* Note: we created an empty file above, so there is no
+ * need to write zeroes (so we generate a sparse file)
+ */
+ write_zero = false;
+ }
+
+ BlockDriverState *bs = NULL;
+ if (bdrv_file_open(&bs, devfn, flags)) {
+ g_error("can't open file %s", devfn);
+ }
+ if (vma_reader_register_bs(vmar, i, bs, write_zero, &errp) < 0) {
+ g_error("%s", error_get_pretty(errp));
+ }
+
+ if (!readmap) {
+ g_free(devfn);
+ }
+ }
+ }
+
+ if (vma_reader_restore(vmar, vmstate_fd, &errp) < 0) {
+ g_error("restore failed - %s", error_get_pretty(errp));
+ }
+
+ if (!readmap) {
+ for (i = 1; i < 255; i++) {
+ VmaDeviceInfo *di = vma_reader_get_device_info(vmar, i);
+ if (di && (i != vmstate_stream)) {
+ char *tmpfn = g_strdup_printf("%s/tmp-disk-%s.raw",
+ dirname, di->devname);
+ char *fn = g_strdup_printf("%s/disk-%s.raw",
+ dirname, di->devname);
+ if (rename(tmpfn, fn) != 0) {
+ g_error("rename %s to %s failed - %s",
+ tmpfn, fn, strerror(errno));
+ }
+ }
+ }
+ }
+
+ vma_reader_destroy(vmar);
+
+ bdrv_close_all();
+
+ return ret;
+}
+
+typedef struct BackupCB {
+ VmaWriter *vmaw;
+ uint8_t dev_id;
+} BackupCB;
+
+static int backup_dump_cb(void *opaque, BlockDriverState *bs,
+ int64_t cluster_num, unsigned char *buf)
+{
+ BackupCB *bcb = opaque;
+ size_t zb = 0;
+ if (vma_writer_write(bcb->vmaw, bcb->dev_id, cluster_num, buf, &zb) < 0) {
+ g_warning("backup_dump_cb vma_writer_write failed");
+ return -1;
+ }
+
+ return 0;
+}
+
+static void backup_complete_cb(void *opaque, int ret)
+{
+ BackupCB *bcb = opaque;
+
+ if (ret < 0) {
+ vma_writer_set_error(bcb->vmaw, "backup_complete_cb %d", ret);
+ }
+
+ if (vma_writer_close_stream(bcb->vmaw, bcb->dev_id) <= 0) {
+ Error *err = NULL;
+ if (vma_writer_close(bcb->vmaw, &err) != 0) {
+ g_warning("vma_writer_close failed %s", error_get_pretty(err));
+ }
+ }
+}
+
+static int create_archive(int argc, char **argv)
+{
+ int i, c, res;
+ int verbose = 0;
+ const char *archivename;
+ GList *config_files = NULL;
+
+ for (;;) {
+ c = getopt(argc, argv, "hvc:");
+ if (c == -1) {
+ break;
+ }
+ switch (c) {
+ case '?':
+ case 'h':
+ help();
+ break;
+ case 'c':
+ config_files = g_list_append(config_files, optarg);
+ break;
+ case 'v':
+ verbose = 1;
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ }
+
+
+ /* make sure we have archive name and at least one path */
+ if ((optind + 2) > argc) {
+ help();
+ }
+
+ archivename = argv[optind++];
+
+ uuid_t uuid;
+ uuid_generate(uuid);
+
+ Error *local_err = NULL;
+ VmaWriter *vmaw = vma_writer_create(archivename, uuid, 0, &local_err);
+
+ if (vmaw == NULL) {
+ g_error("%s", error_get_pretty(local_err));
+ }
+
+ GList *l = config_files;
+ while (l && l->data) {
+ char *name = l->data;
+ char *cdata = NULL;
+ gsize clen = 0;
+ GError *err = NULL;
+ if (!g_file_get_contents(name, &cdata, &clen, &err)) {
+ unlink(archivename);
+ g_error("Unable to read file: %s", err->message);
+ }
+
+ if (vma_writer_add_config(vmaw, name, cdata, clen) != 0) {
+ unlink(archivename);
+ g_error("Unable to append config data %s (len = %zd)",
+ name, clen);
+ }
+ l = g_list_next(l);
+ }
+
+ int ind = 0;
+ while (optind < argc) {
+ const char *path = argv[optind++];
+ char *devname = NULL;
+ path = extract_devname(path, &devname, ind++);
+
+ BlockDriver *drv = NULL;
+ BlockDriverState *bs = bdrv_new(devname);
+
+ res = bdrv_open(bs, path, BDRV_O_CACHE_WB , drv);
+ if (res < 0) {
+ unlink(archivename);
+ g_error("bdrv_open '%s' failed", path);
+ }
+ int64_t size = bdrv_getlength(bs);
+ int dev_id = vma_writer_register_stream(vmaw, devname, size);
+ if (dev_id <= 0) {
+ unlink(archivename);
+ g_error("vma_writer_register_stream '%s' failed", devname);
+ }
+
+ BackupCB *bcb = g_new0(BackupCB, 1);
+ bcb->vmaw = vmaw;
+ bcb->dev_id = dev_id;
+
+ if (backup_job_start(bs, backup_dump_cb, backup_complete_cb, bcb) < 0) {
+ unlink(archivename);
+ g_error("backup_job_start failed");
+ }
+ }
+
+ VmaStatus vmastat;
+ int percent = 0;
+ int last_percent = -1;
+
+ while (1) {
+ main_loop_wait(false);
+ vma_writer_get_status(vmaw, &vmastat);
+
+ if (verbose) {
+
+ uint64_t total = 0;
+ uint64_t transferred = 0;
+ uint64_t zero_bytes = 0;
+
+ int i;
+ for (i = 0; i < 256; i++) {
+ if (vmastat.stream_info[i].size) {
+ total += vmastat.stream_info[i].size;
+ transferred += vmastat.stream_info[i].transferred;
+ zero_bytes += vmastat.stream_info[i].zero_bytes;
+ }
+ }
+ percent = (transferred*100)/total;
+ if (percent != last_percent) {
+ printf("progress %d%% %zd/%zd %zd\n", percent,
+ transferred, total, zero_bytes);
+
+ last_percent = percent;
+ }
+ }
+
+ if (vmastat.closed) {
+ break;
+ }
+ }
+
+ bdrv_drain_all();
+
+ vma_writer_get_status(vmaw, &vmastat);
+
+ if (verbose) {
+ for (i = 0; i < 256; i++) {
+ VmaStreamInfo *si = &vmastat.stream_info[i];
+ if (si->size) {
+ printf("image %s: size=%zd zeros=%zd saved=%zd\n", si->devname,
+ si->size, si->zero_bytes, si->size - si->zero_bytes);
+ }
+ }
+ }
+
+ if (vmastat.status < 0) {
+ unlink(archivename);
+ g_error("creating vma archive failed");
+ }
+
+ return 0;
+}
+
+int main(int argc, char **argv)
+{
+ const char *cmdname;
+
+ error_set_progname(argv[0]);
+
+ qemu_init_main_loop();
+
+ bdrv_init();
+
+ if (argc < 2) {
+ help();
+ }
+
+ cmdname = argv[1];
+ argc--; argv++;
+
+
+ if (!strcmp(cmdname, "list")) {
+ return list_content(argc, argv);
+ } else if (!strcmp(cmdname, "create")) {
+ return create_archive(argc, argv);
+ } else if (!strcmp(cmdname, "extract")) {
+ return extract_content(argc, argv);
+ }
+
+ help();
+ return 0;
+}
diff --git a/vma.h b/vma.h
new file mode 100644
index 0000000..10800a1
--- /dev/null
+++ b/vma.h
@@ -0,0 +1,145 @@
+/*
+ * VMA: Virtual Machine Archive
+ *
+ * Copyright (C) Proxmox Server Solutions
+ *
+ * Authors:
+ * Dietmar Maurer (dietmar@proxmox.com)
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef BACKUP_VMA_H
+#define BACKUP_VMA_H
+
+#include "backup.h"
+#include "error.h"
+
+#define VMA_BLOCK_BITS 12
+#define VMA_BLOCK_SIZE (1<<VMA_BLOCK_BITS)
+#define VMA_CLUSTER_BITS (VMA_BLOCK_BITS+4)
+#define VMA_CLUSTER_SIZE (1<<VMA_CLUSTER_BITS)
+
+#if VMA_CLUSTER_SIZE != 65536
+#error unexpected cluster size
+#endif
+
+#define VMA_EXTENT_HEADER_SIZE 512
+#define VMA_BLOCKS_PER_EXTENT 59
+#define VMA_MAX_CONFIGS 256
+
+#define VMA_MAX_EXTENT_SIZE \
+ (VMA_EXTENT_HEADER_SIZE+VMA_CLUSTER_SIZE*VMA_BLOCKS_PER_EXTENT)
+#if VMA_MAX_EXTENT_SIZE != 3867136
+#error unexpected VMA_EXTENT_SIZE
+#endif
+
+/* File Format Definitions */
+
+#define VMA_MAGIC (GUINT32_TO_BE(('V'<<24)|('M'<<16)|('A'<<8)|0x00))
+#define VMA_EXTENT_MAGIC (GUINT32_TO_BE(('V'<<24)|('M'<<16)|('A'<<8)|'E'))
+
+typedef struct VmaDeviceInfoHeader {
+ uint32_t devname_ptr; /* offset into blob_buffer table */
+ uint32_t reserved0;
+ uint64_t size; /* device size in bytes */
+ uint64_t reserved1;
+ uint64_t reserved2;
+} VmaDeviceInfoHeader;
+
+typedef struct VmaHeader {
+ uint32_t magic;
+ uint32_t version;
+ unsigned char uuid[16];
+ int64_t ctime;
+ unsigned char md5sum[16];
+
+ uint32_t blob_buffer_offset;
+ uint32_t blob_buffer_size;
+ uint32_t header_size;
+
+ unsigned char reserved[1984];
+
+ uint32_t config_names[VMA_MAX_CONFIGS]; /* offset into blob_buffer table */
+ uint32_t config_data[VMA_MAX_CONFIGS]; /* offset into blob_buffer table */
+
+ VmaDeviceInfoHeader dev_info[256];
+} VmaHeader;
+
+typedef struct VmaExtentHeader {
+ uint32_t magic;
+ uint16_t reserved1;
+ uint16_t block_count;
+ unsigned char uuid[16];
+ unsigned char md5sum[16];
+ uint64_t blockinfo[VMA_BLOCKS_PER_EXTENT];
+} VmaExtentHeader;
+
+/* functions/definitions to read/write vma files */
+
+typedef struct VmaReader VmaReader;
+
+typedef struct VmaWriter VmaWriter;
+
+typedef struct VmaConfigData {
+ const char *name;
+ const void *data;
+ uint32_t len;
+} VmaConfigData;
+
+typedef struct VmaStreamInfo {
+ uint64_t size;
+ uint64_t cluster_count;
+ uint64_t transferred;
+ uint64_t zero_bytes;
+ int finished;
+ char *devname;
+} VmaStreamInfo;
+
+typedef struct VmaStatus {
+ int status;
+ bool closed;
+ char errmsg[8192];
+ char uuid_str[37];
+ VmaStreamInfo stream_info[256];
+} VmaStatus;
+
+typedef struct VmaDeviceInfo {
+ uint64_t size; /* device size in bytes */
+ const char *devname;
+} VmaDeviceInfo;
+
+extern const BackupDriver backup_vma_driver;
+
+VmaWriter *vma_writer_create(const char *filename, uuid_t uuid, int64_t speed,
+ Error **errp);
+int vma_writer_close(VmaWriter *vmaw, Error **errp);
+void vma_writer_destroy(VmaWriter *vmaw);
+int vma_writer_add_config(VmaWriter *vmaw, const char *name, gpointer data,
+ size_t len);
+int vma_writer_register_stream(VmaWriter *vmaw, const char *devname,
+ size_t size);
+
+int64_t coroutine_fn vma_writer_write(VmaWriter *vmaw, uint8_t dev_id,
+ int64_t cluster_num, unsigned char *buf,
+ size_t *zero_bytes);
+
+int coroutine_fn vma_writer_close_stream(VmaWriter *vmaw, uint8_t dev_id);
+
+int vma_writer_get_status(VmaWriter *vmaw, VmaStatus *status);
+void vma_writer_set_error(VmaWriter *vmaw, const char *fmt, ...);
+
+
+VmaReader *vma_reader_create(const char *filename, Error **errp);
+void vma_reader_destroy(VmaReader *vmar);
+VmaHeader *vma_reader_get_header(VmaReader *vmar);
+GList *vma_reader_get_config_data(VmaReader *vmar);
+VmaDeviceInfo *vma_reader_get_device_info(VmaReader *vmar, guint8 dev_id);
+int vma_reader_register_bs(VmaReader *vmar, guint8 dev_id,
+ BlockDriverState *bs, bool write_zeroes,
+ Error **errp);
+int vma_reader_restore(VmaReader *vmar, int vmstate_fd, Error **errp);
+
+#endif /* BACKUP_VMA_H */
--
1.7.2.5
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [Qemu-devel] [PATCH v2 5/6] add regression tests for backup
2012-11-30 9:22 [Qemu-devel] [PATCH v2 1/6] RFC: Efficient VM backup for qemu Dietmar Maurer
` (2 preceding siblings ...)
2012-11-30 9:22 ` [Qemu-devel] [PATCH v2 4/6] introduce new vma archive format Dietmar Maurer
@ 2012-11-30 9:22 ` Dietmar Maurer
2012-11-30 9:22 ` [Qemu-devel] [PATCH v2 6/6] add vm state to backups Dietmar Maurer
4 siblings, 0 replies; 9+ messages in thread
From: Dietmar Maurer @ 2012-11-30 9:22 UTC (permalink / raw)
To: qemu-devel; +Cc: Dietmar Maurer
Also added the ratelimit.h fix here, because it is still not upstream.
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
---
include/qemu/ratelimit.h | 2 +-
tests/Makefile | 11 +-
tests/backup-test.c | 511 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 521 insertions(+), 3 deletions(-)
create mode 100644 tests/backup-test.c
diff --git a/include/qemu/ratelimit.h b/include/qemu/ratelimit.h
index c6ac281..d1610f1 100644
--- a/include/qemu/ratelimit.h
+++ b/include/qemu/ratelimit.h
@@ -42,7 +42,7 @@ static inline void ratelimit_set_speed(RateLimit *limit, uint64_t speed,
uint64_t slice_ns)
{
limit->slice_ns = slice_ns;
- limit->slice_quota = ((double)speed * 1000000000ULL) / slice_ns;
+ limit->slice_quota = ((double)speed * slice_ns)/1000000000ULL;
}
#endif
diff --git a/tests/Makefile b/tests/Makefile
index b60f0fb..cffbd22 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -20,6 +20,8 @@ check-unit-y += tests/test-thread-pool$(EXESUF)
check-block-$(CONFIG_POSIX) += tests/qemu-iotests-quick.sh
+check-backup-y = tests/backup-test$(EXESUF)
+
# All QTests for now are POSIX-only, but the dependencies are
# really in libqtest, not in the testcases themselves.
check-qtest-i386-y = tests/fdc-test$(EXESUF)
@@ -54,6 +56,7 @@ tests/test-coroutine$(EXESUF): tests/test-coroutine.o $(coroutine-obj-y) $(tools
tests/test-aio$(EXESUF): tests/test-aio.o $(coroutine-obj-y) $(tools-obj-y) $(block-obj-y) libqemustub.a
tests/test-thread-pool$(EXESUF): tests/test-thread-pool.o $(coroutine-obj-y) $(tools-obj-y) $(block-obj-y) libqemustub.a
tests/test-iov$(EXESUF): tests/test-iov.o iov.o
+tests/backup-test$(EXESUF): tests/backup-test.o vma-reader.o $(tools-obj-y) $(block-obj-y) libqemustub.a
tests/test-qapi-types.c tests/test-qapi-types.h :\
$(SRC_PATH)/qapi-schema-test.json $(SRC_PATH)/scripts/qapi-types.py
@@ -146,10 +149,14 @@ check-tests/qemu-iotests-quick.sh: tests/qemu-iotests-quick.sh qemu-img$(EXESUF)
# Consolidated targets
-.PHONY: check-qtest check-unit check
+.PHONY: check-backup check-qtest check-unit check
check-qtest: $(patsubst %,check-qtest-%, $(QTEST_TARGETS))
check-unit: $(patsubst %,check-%, $(check-unit-y))
check-block: $(patsubst %,check-%, $(check-block-y))
-check: check-unit check-qtest
+
+check-backup: tests/backup-test$(EXESUF)
+ $<
+
+check: check-unit check-qtest check-backup
-include $(wildcard tests/*.d)
diff --git a/tests/backup-test.c b/tests/backup-test.c
new file mode 100644
index 0000000..c434c5f
--- /dev/null
+++ b/tests/backup-test.c
@@ -0,0 +1,511 @@
+/*
+ * QEMU backup test suit
+ *
+ * Copyright (C) Proxmox Server Solutions
+ *
+ * Authors:
+ * Dietmar Maurer (dietmar@proxmox.com)
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ *
+ * Fixme: running 'backup-test -l' trigger a bug in g_slice_alloc()
+ * Note: 'G_SLICE=always-malloc ./tests/backup-test -l' works
+ *
+ */
+
+#include <sys/time.h>
+#include <sys/types.h>
+#include <stdarg.h>
+#include <stdio.h>
+#include <getopt.h>
+#include <libgen.h>
+
+#include "qemu-common.h"
+#include "main-loop.h"
+#include "block_int.h"
+
+#include "vma.h"
+
+static int opt_debug;
+static int opt_loop;
+
+#define DPRINTF(fmt, ...) \
+ do { if (opt_debug) { printf(fmt, ## __VA_ARGS__); } } while (0)
+
+#define CLUSTER(x) (x*BACKUP_CLUSTER_SIZE)
+
+#define RUN_TEST(testfunc, speed) \
+ backup_test(#testfunc " speed " #speed, speed, testfunc);
+
+
+static unsigned char buf_sec_pattern_cd[BDRV_SECTOR_SIZE];
+static unsigned char buf_sec_pattern_32[BDRV_SECTOR_SIZE];
+
+#define TEST_IMG_SIZE (6*1024*1024+BDRV_SECTOR_SIZE)
+#define TEST_IMG_NAME "backuptest.raw"
+#define TEST_IMG_RESTORE_NAME "backuptest.raw.restore"
+#define TEST_VMA_NAME "backuptest.vma"
+
+typedef struct BackupCB {
+ VmaWriter *vmaw;
+ uint8_t dev_id;
+} BackupCB;
+
+static int backup_dump_cb(void *opaque, BlockDriverState *bs,
+ int64_t cluster_num, unsigned char *buf)
+{
+ BackupCB *bcb = opaque;
+
+ DPRINTF("backup_dump_cb C%zd %d\n", cluster_num, bcb->dev_id);
+
+ size_t zb = 0;
+ if (vma_writer_write(bcb->vmaw, bcb->dev_id, cluster_num, buf, &zb) < 0) {
+ printf("backup_dump_cb vma_writer_write failed\n");
+ return -1;
+ }
+
+ return 0;
+}
+
+static void backup_complete_cb(void *opaque, int ret)
+{
+ BackupCB *bcb = opaque;
+
+ DPRINTF("backup_complete_cb %d %d\n", bcb->dev_id, ret);
+
+ if (ret < 0) {
+ vma_writer_set_error(bcb->vmaw, "backup_complete_cb %d", ret);
+ }
+
+ if (vma_writer_close_stream(bcb->vmaw, bcb->dev_id) <= 0) {
+ Error *err = NULL;
+ if (vma_writer_close(bcb->vmaw, &err) != 0) {
+ g_error("vma_writer_close failed %s", error_get_pretty(err));
+ }
+ }
+ DPRINTF("backup_complete_cb finish\n");
+}
+
+static void write_sec_pattern_cd(BlockDriverState *bs, int64_t offset)
+{
+ int ret;
+
+ DPRINTF("write_sec_pattern_cd %zd\n", offset);
+
+ if (offset & 0x1ff) {
+ g_error("write_sec_pattern_cd offset %zd is not sector aligned\n",
+ offset);
+ }
+
+ ret = bdrv_write(bs, offset >> 9, buf_sec_pattern_cd, 1);
+ if (ret < 0) {
+ g_error("write_sec_pattern_cd %zd failed", offset);
+ }
+
+}
+
+static void read_sec(BlockDriverState *bs, int64_t offset, unsigned char *buf)
+{
+ DPRINTF("read_sec C%zd start %zd\n", offset>>VMA_CLUSTER_BITS, offset);
+
+ if (offset & 0x1ff) {
+ g_error("read_sec offset %zd is not sector aligned\n", offset);
+ }
+
+ if (bdrv_read(bs, offset >> 9, buf, 1) < 0) {
+ g_error("bdrv_read failed");
+ }
+}
+
+static bool request_term;
+
+typedef struct TestCB {
+ Coroutine *co;
+ BlockDriverState *bs;
+ bool finished;
+} TestCB;
+
+static TestCB *enter_test_co(BlockDriverState *bs, CoroutineEntry *entry)
+{
+ TestCB *cb = g_new0(TestCB, 1);
+ cb->bs = bs;
+ cb->co = qemu_coroutine_create(entry);
+ qemu_coroutine_enter(cb->co, cb);
+ return cb;
+}
+
+static void test_co_sleep(double sec)
+{
+ co_sleep_ns(rt_clock, (int64_t)(sec*1000000000));
+};
+
+static void test_co_yield(void)
+{
+ co_sleep_ns(rt_clock, (int64_t)(1000));
+};
+
+static void coroutine_fn run_co_test1(void *opaque)
+{
+ assert(opaque);
+ TestCB *cb = (TestCB *)opaque;
+
+ test_co_sleep(0.2);
+ write_sec_pattern_cd(cb->bs, 5*BACKUP_CLUSTER_SIZE);
+ test_co_sleep(0.2);
+ write_sec_pattern_cd(cb->bs, 10*BACKUP_CLUSTER_SIZE);
+ test_co_sleep(0.2);
+ write_sec_pattern_cd(cb->bs, 10*BACKUP_CLUSTER_SIZE);
+
+ cb->finished = true;
+}
+
+static void coroutine_fn run_co_test2(void *opaque)
+{
+ assert(opaque);
+ TestCB *cb = (TestCB *)opaque;
+ unsigned char buf[512];
+
+ test_co_sleep(0.2);
+ read_sec(cb->bs, 5*BACKUP_CLUSTER_SIZE, buf);
+ write_sec_pattern_cd(cb->bs, 6*BACKUP_CLUSTER_SIZE);
+
+ cb->finished = true;
+}
+
+static void coroutine_fn run_co_random_read(void *opaque)
+{
+ assert(opaque);
+ TestCB *cb = (TestCB *)opaque;
+ int64_t sectors = bdrv_getlength(cb->bs)/BDRV_SECTOR_SIZE - 1;
+ unsigned char buf[512];
+
+ while (1) {
+ test_co_yield();
+ if (request_term) {
+ DPRINTF("finish run_co_random_read\n");
+ break;
+ }
+ int64_t s = (rand()*sectors)/RAND_MAX;
+ read_sec(cb->bs, s*BDRV_SECTOR_SIZE, buf);
+ }
+
+ cb->finished = true;
+}
+
+static void coroutine_fn run_co_random_write(void *opaque)
+{
+ assert(opaque);
+ TestCB *cb = (TestCB *)opaque;
+ int64_t sectors = bdrv_getlength(cb->bs)/BDRV_SECTOR_SIZE;
+
+ while (1) {
+ test_co_yield();
+ if (request_term) {
+ DPRINTF("finish run_co_random_write\n");
+ break;
+ }
+ int64_t s = (rand()*sectors)/RAND_MAX;
+ write_sec_pattern_cd(cb->bs, s*BDRV_SECTOR_SIZE);
+ }
+
+ cb->finished = true;
+}
+
+static void fill_test_sector(void *buf, size_t sector_num)
+{
+ int64_t *i64buf = (int64_t *)buf;
+ int i;
+
+ int data = sector_num;
+ if (sector_num >= 8 && sector_num < 8*(2*16+2)) {
+ data = 0; /* add zero region for testing */
+ }
+
+ for (i = 0; i < (512/sizeof(int64_t)); i++) {
+ i64buf[i] = data;
+ }
+}
+
+static void verify_archive(const char *archive, size_t size)
+{
+ Error *errp = NULL;
+
+ VmaReader *vmar = vma_reader_create(archive, &errp);
+
+ if (!vmar) {
+ g_error("%s", error_get_pretty(errp));
+ }
+
+ VmaDeviceInfo *di = vma_reader_get_device_info(vmar, 1);
+ if (!di || strcmp((char *)di->devname, "hda") || di->size != size) {
+ g_error("got wrong device info");
+ }
+
+ unlink(TEST_IMG_RESTORE_NAME);
+
+ int flags = BDRV_O_NATIVE_AIO|BDRV_O_RDWR|BDRV_O_CACHE_WB;
+
+ if (bdrv_img_create(TEST_IMG_RESTORE_NAME, "raw", NULL, NULL, NULL,
+ size, flags)) {
+ g_error("can't create file %s", TEST_IMG_RESTORE_NAME);
+ }
+
+ BlockDriverState *bs = NULL;
+ if (bdrv_file_open(&bs, TEST_IMG_RESTORE_NAME, flags)) {
+ g_error("can't open file %s", TEST_IMG_RESTORE_NAME);
+ }
+ if (vma_reader_register_bs(vmar, 1, bs, false, &errp) < 0) {
+ g_error("%s", error_get_pretty(errp));
+ }
+
+ if (vma_reader_restore(vmar, -1, &errp) < 0) {
+ g_error("restore failed - %s", error_get_pretty(errp));
+ }
+
+ size_t i;
+ size_t sectors = size/BDRV_SECTOR_SIZE;
+ int64_t buf[512/sizeof(int64_t)];
+ int64_t buf2[512/sizeof(int64_t)];
+
+ for (i = 0; i < sectors; i++) {
+ if (bdrv_read(bs, i, (uint8_t *)buf, 1) != 0) {
+ g_error("bdrv_read failed");
+ }
+ fill_test_sector(buf2, i);
+ if (bcmp(buf, buf2, sizeof(buf))) {
+ g_error("data is different at sector %zd", i);
+ }
+ }
+
+ vma_reader_destroy(vmar);
+
+ unlink(TEST_IMG_RESTORE_NAME);
+}
+
+static void prepare_vm_image(const char *filename, size_t sectors)
+{
+ int fd = open(filename, O_RDWR|O_CREAT|O_TRUNC, 0644);
+ if (fd < 0) {
+ g_error("can't open file %s\n", filename);
+ }
+
+ size_t i;
+ int64_t buf[512/sizeof(int64_t)];
+
+ for (i = 0; i < sectors; i++) {
+ fill_test_sector(buf, i);
+
+ int res = 0;
+ while (1) {
+ res = write(fd, buf, sizeof(buf));
+ if (!(res < 0 && errno == EINTR)) {
+ break;
+ }
+ }
+ if (res != sizeof(buf)) {
+ g_error("can't initialize file %s - %s %d\n",
+ filename, strerror(errno), res);
+ }
+ }
+
+ if (close(fd) != 0) {
+ g_error("close failed");
+ }
+}
+
+static GList *simple_test(BlockDriverState *bs)
+{
+ GList *cb_list = NULL;
+
+ cb_list = g_list_append(cb_list, enter_test_co(bs, run_co_test1));
+ cb_list = g_list_append(cb_list, enter_test_co(bs, run_co_test2));
+
+ return cb_list;
+}
+
+static GList *random_read_write_test(BlockDriverState *bs)
+{
+ GList *cb_list = NULL;
+
+ cb_list = g_list_append(cb_list, enter_test_co(bs, run_co_random_read));
+ cb_list = g_list_append(cb_list, enter_test_co(bs, run_co_random_read));
+ cb_list = g_list_append(cb_list, enter_test_co(bs, run_co_random_write));
+ cb_list = g_list_append(cb_list, enter_test_co(bs, run_co_random_write));
+
+ return cb_list;
+}
+
+static void backup_test(const char *testname, int64_t speed,
+ GList *(*start_test_cb)(BlockDriverState *bs))
+{
+ BlockDriverState *bs = bdrv_new("hda");
+
+ static int test_counter;
+
+ test_counter++;
+
+ printf("starting test #%d '%s'\n", test_counter, testname);
+
+ const char *filename = TEST_IMG_NAME;
+
+ prepare_vm_image(TEST_IMG_NAME, TEST_IMG_SIZE/BDRV_SECTOR_SIZE);
+
+ int flags = BDRV_O_NATIVE_AIO|BDRV_O_RDWR|BDRV_O_CACHE_WB;
+
+ if (bdrv_open(bs, filename, flags, NULL) < 0) {
+ g_error("can't open device %s\n", filename);
+ }
+
+ Error *err = NULL;
+ uuid_t uuid;
+ uuid_generate(uuid);
+
+ unlink(TEST_VMA_NAME);
+
+ VmaWriter *vmaw = vma_writer_create(TEST_VMA_NAME, uuid, speed, &err);
+ if (!vmaw) {
+ g_error("%s", error_get_pretty(err));
+ }
+
+ BackupCB bcb;
+ bcb.vmaw = vmaw;
+ bcb.dev_id = vma_writer_register_stream(vmaw, bdrv_get_device_name(bs),
+ bdrv_getlength(bs));
+ if (backup_job_start(bs, backup_dump_cb, backup_complete_cb, &bcb) < 0) {
+ g_error("backup_job_start failed");
+ }
+
+ request_term = false;
+
+ GList *cb_list = start_test_cb(bs);
+
+ while (1) {
+ main_loop_wait(false);
+
+ VmaStatus vmastat;
+ vma_writer_get_status(vmaw, &vmastat);
+ if (vmastat.closed) {
+ break;
+ }
+ }
+
+ request_term = true;
+
+ while (1) {
+ GList *l = cb_list;
+ bool active = 0;
+ while (l && l->data) {
+ TestCB *cb = (TestCB *)l->data;
+ l = g_list_next(l);
+ if (!cb->finished) {
+ active = true;
+ break;
+ }
+ }
+ if (!active) {
+ DPRINTF("All test coroutines finished\n");
+ break;
+ }
+ main_loop_wait(false);
+ }
+
+ /* Make sure all outstanding requests complete */
+ bdrv_drain_all();
+
+ VmaStatus vmastat;
+ vma_writer_get_status(vmaw, &vmastat);
+ DPRINTF("statistic %zd %zd\n", vmastat.stream_info[1].size,
+ vmastat.stream_info[1].transferred);
+ assert(vmastat.stream_info[1].size == vmastat.stream_info[1].transferred);
+
+ vma_writer_destroy(vmaw);
+
+ bdrv_delete(bs);
+
+ /* start verification */
+ verify_archive(TEST_VMA_NAME, TEST_IMG_SIZE);
+
+ bdrv_close_all();
+
+ unlink(TEST_IMG_NAME);
+ unlink(TEST_VMA_NAME);
+
+ printf("finish test #%d '%s' OK\n", test_counter, testname);
+}
+
+static void help(void)
+{
+ const char *help_msg =
+ "usage: backup-test [options]\n"
+ "\n"
+ "backup-test run default regression test (fast)\n"
+ "backup-test -l run long running test loop (endless)\n"
+ "\n"
+ "use option -d to turn on verbose debug output\n"
+ ;
+
+ printf("%s", help_msg);
+ exit(1);
+}
+
+int main(int argc, char **argv)
+{
+ int c;
+
+ for (;;) {
+ c = getopt(argc, argv, "hdl");
+ if (c == -1) {
+ break;
+ }
+ switch (c) {
+ case '?':
+ case 'h':
+ help();
+ break;
+ case 'd':
+ opt_debug = 1;
+ break;
+ case 'l':
+ opt_loop = 1;
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ }
+
+ memset(buf_sec_pattern_cd, 0xcd, sizeof(buf_sec_pattern_cd));
+ memset(buf_sec_pattern_32, 0x32, sizeof(buf_sec_pattern_32));
+
+ srand(1234);
+
+ /* ignore SIGPIPE */
+ struct sigaction act;
+ sigfillset(&act.sa_mask);
+ act.sa_flags = 0;
+ act.sa_handler = SIG_IGN;
+ sigaction(SIGPIPE, &act, NULL);
+
+ qemu_init_main_loop();
+
+ bdrv_init();
+
+ if (opt_loop) { /* endless test loop */
+ while (1) {
+ RUN_TEST(random_read_write_test, 0);
+ }
+ return 0;
+ }
+
+ if (opt_debug) { /* run simple test (rate limited) */
+ RUN_TEST(simple_test, 1024*1024);
+ return 0;
+ }
+
+ /* run default regression tests at full speed */
+
+ RUN_TEST(simple_test, 0);
+ RUN_TEST(random_read_write_test, 0);
+
+ return 0;
+}
--
1.7.2.5
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [Qemu-devel] [PATCH v2 6/6] add vm state to backups
2012-11-30 9:22 [Qemu-devel] [PATCH v2 1/6] RFC: Efficient VM backup for qemu Dietmar Maurer
` (3 preceding siblings ...)
2012-11-30 9:22 ` [Qemu-devel] [PATCH v2 5/6] add regression tests for backup Dietmar Maurer
@ 2012-11-30 9:22 ` Dietmar Maurer
2012-12-03 10:13 ` Wenchao Xia
4 siblings, 1 reply; 9+ messages in thread
From: Dietmar Maurer @ 2012-11-30 9:22 UTC (permalink / raw)
To: qemu-devel; +Cc: Dietmar Maurer
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
---
blockdev.c | 196 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
hmp.c | 2 +-
qapi-schema.json | 6 +-
3 files changed, 199 insertions(+), 5 deletions(-)
diff --git a/blockdev.c b/blockdev.c
index 07db561..2e2050a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -22,6 +22,8 @@
#include "arch_init.h"
#include "backup.h"
#include "vma.h"
+#include "qemu-file.h"
+#include "migration.h"
static QTAILQ_HEAD(drivelist, DriveInfo) drives = QTAILQ_HEAD_INITIALIZER(drives);
@@ -1338,6 +1340,10 @@ static struct GenericBackupState {
size_t total;
size_t transferred;
size_t zero_bytes;
+ unsigned char buf[BACKUP_CLUSTER_SIZE];
+ int buf_index;
+ size_t buf_cluster_num;
+ guint8 vmstate_dev_id;
} backup_state;
typedef struct BackupCB {
@@ -1430,10 +1436,170 @@ static void backup_start_jobs(void)
}
}
+static int backup_state_close(void *opaque)
+{
+ if (!backup_state.buf_index) {
+ return 0;
+ }
+
+ size_t zero_bytes = 0;
+ if (backup_state.buf_index < BACKUP_CLUSTER_SIZE) {
+ memset(backup_state.buf + backup_state.buf_index, 0,
+ BACKUP_CLUSTER_SIZE - backup_state.buf_index);
+ }
+ int bytes = backup_state.driver->dump_cb(
+ backup_state.writer, backup_state.vmstate_dev_id,
+ backup_state.buf_cluster_num,
+ backup_state.buf, &zero_bytes);
+ backup_state.buf_index = 0;
+
+ return bytes < 0 ? -1 : 0;
+}
+
+static int backup_state_put_buffer(void *opaque, const uint8_t *buf,
+ int64_t pos, int size)
+{
+ assert(backup_state.driver);
+ assert(backup_state.writer);
+ assert(backup_state.driver->dump_cb);
+
+ /* Note: our backup driver expects to get whole clusters (64KB) */
+
+ int ret = size;
+
+ while (size > 0) {
+ int l = BACKUP_CLUSTER_SIZE - backup_state.buf_index;
+ l = l > size ? size : l;
+ memcpy(backup_state.buf + backup_state.buf_index, buf, l);
+ backup_state.buf_index += l;
+ buf += l;
+ size -= l;
+ if (backup_state.buf_index == BACKUP_CLUSTER_SIZE) {
+ size_t zero_bytes = 0;
+ int bytes = backup_state.driver->dump_cb(
+ backup_state.writer, backup_state.vmstate_dev_id,
+ backup_state.buf_cluster_num++,
+ backup_state.buf, &zero_bytes);
+ backup_state.buf_index = 0;
+ if (bytes < 0) {
+ return -1;
+ }
+ }
+ }
+
+ return ret;
+}
+
+static const QEMUFileOps backup_file_ops = {
+ .put_buffer = backup_state_put_buffer,
+ .close = backup_state_close,
+};
+
+static void coroutine_fn backup_start_savevm(void *opaque)
+{
+ assert(backup_state.driver);
+ assert(backup_state.writer);
+ int ret;
+ char *err = NULL;
+ uint64_t remaining;
+ int64_t maxlen;
+ MigrationParams params = {
+ .blk = 0,
+ .shared = 0
+ };
+
+ int restart = 0;
+
+ QEMUFile *file = qemu_fopen_ops(NULL, &backup_file_ops);
+
+ ret = qemu_savevm_state_begin(file, ¶ms);
+ if (ret < 0) {
+ qemu_fclose(file);
+ err = g_strdup("qemu_savevm_state_begin failed");
+ goto abort;
+ }
+
+ while (1) {
+ ret = qemu_savevm_state_iterate(file);
+ remaining = ram_bytes_remaining();
+
+ if (ret < 0) {
+ qemu_fclose(file);
+ err = g_strdup_printf("qemu_savevm_state_iterate error %d", ret);
+ goto abort;
+ }
+
+ /* stop the VM if we use too much space,
+ * or if remaining is just a few MB
+ */
+ maxlen = ram_bytes_total();
+ size_t cpos = backup_state.buf_cluster_num * BACKUP_CLUSTER_SIZE;
+ if ((remaining < 100000) || ((cpos + remaining) >= maxlen)) {
+ if (runstate_is_running()) {
+ restart = 1;
+ vm_stop(RUN_STATE_SAVE_VM);
+ }
+ }
+
+ if (ret == 1) { /* finished */
+ if (runstate_is_running()) {
+ restart = 1;
+ vm_stop(RUN_STATE_SAVE_VM);
+ }
+
+ ret = qemu_savevm_state_complete(file);
+ if (ret < 0) {
+ qemu_fclose(file);
+ err = g_strdup("qemu_savevm_state_complete error");
+ goto abort;
+
+ } else {
+ if (qemu_fclose(file) < 0) {
+ error_setg(&backup_state.error,
+ "backup_start_savevm: qemu_fclose failed");
+ goto abort;
+ }
+ if (backup_state.driver->complete_cb(backup_state.writer,
+ backup_state.vmstate_dev_id, 0) < 0) {
+ err = g_strdup("backup_start_savevm: complete_cb failed");
+ goto abort;
+ }
+ backup_start_jobs();
+ goto out;
+ }
+ }
+ }
+
+out:
+ if (restart) {
+ vm_start();
+ }
+ return;
+
+abort:
+ backup_state.end_time = time(NULL);
+
+ Error *local_err = NULL;
+ backup_state.driver->close_cb(backup_state.writer, &local_err);
+ backup_state.writer = NULL;
+
+ error_propagate(&backup_state.error, local_err);
+
+ if (err) {
+ if (!backup_state.error) {
+ error_setg(&backup_state.error, "%s", err);
+ }
+ g_free(err);
+ }
+
+ goto out;
+}
+
char *qmp_backup(const char *backupfile, bool has_format, const char *format,
bool has_config_filename, const char *config_filename,
bool has_devlist, const char *devlist,
- bool has_speed, int64_t speed, Error **errp)
+ bool has_speed, int64_t speed,
+ bool has_state, bool state, Error **errp)
{
BlockDriverState *bs;
Error *local_err = NULL;
@@ -1442,6 +1608,8 @@ char *qmp_backup(const char *backupfile, bool has_format, const char *format,
gchar **devs = NULL;
GList *bcblist = NULL;
+ bool save_state = has_state ? state : false;
+
/* Todo: try to auto-detect format based on file name */
format = has_format ? format : "vma";
@@ -1523,6 +1691,22 @@ char *qmp_backup(const char *backupfile, bool has_format, const char *format,
size_t total = 0;
/* register all devices for vma writer */
+
+ guint8 vmstate_dev_id = 0;
+ if (save_state) {
+ /* Note: we pass ram_bytes_total() for vmstate size
+ * The backup driver needs to be aware of the fact
+ * that the real stream size can be different (we do
+ * not know that size in advance).
+ */
+ size_t ramsize = ram_bytes_total();
+ vmstate_dev_id = driver->register_stream_cb(writer, "vmstate", ramsize);
+ if (vmstate_dev_id <= 0) {
+ error_setg(errp, "register vmstate stream failed");
+ goto err;
+ }
+ }
+
l = bcblist;
while (l) {
BackupCB *bcb = l->data;
@@ -1586,8 +1770,16 @@ char *qmp_backup(const char *backupfile, bool has_format, const char *format,
backup_state.total = total;
backup_state.transferred = 0;
backup_state.zero_bytes = 0;
+ backup_state.buf_index = 0;
+ backup_state.buf_cluster_num = 0;
+ backup_state.vmstate_dev_id = vmstate_dev_id;
- backup_start_jobs();
+ if (save_state) {
+ Coroutine *co = qemu_coroutine_create(backup_start_savevm);
+ qemu_coroutine_enter(co, NULL);
+ } else {
+ backup_start_jobs();
+ }
return g_strdup(backup_state.uuid_str);
diff --git a/hmp.c b/hmp.c
index 600792f..e92bba9 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1031,7 +1031,7 @@ void hmp_backup(Monitor *mon, const QDict *qdict)
Error *errp = NULL;
qmp_backup(backupfile, false, NULL, false, NULL, !!devlist, devlist,
- qdict_haskey(qdict, "speed"), speed, &errp);
+ qdict_haskey(qdict, "speed"), speed, false, false, &errp);
if (error_is_set(&errp)) {
monitor_printf(mon, "%s\n", error_get_pretty(errp));
diff --git a/qapi-schema.json b/qapi-schema.json
index 71e0ed9..2d7ecaa 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1812,13 +1812,15 @@
#
# @speed: #optional the maximum speed, in bytes per second
#
+# @state: #optional flag to include vm state
+#
# Returns: the uuid of the backup job
#
# Since: 1.4.0
##
{ 'command': 'backup', 'data': { 'backupfile': 'str', '*format': 'str',
- '*config-filename': 'str',
- '*devlist': 'str', '*speed': 'int' },
+ '*config-filename': 'str', '*devlist': 'str',
+ '*speed': 'int', '*state': 'bool' },
'returns': 'str' }
##
--
1.7.2.5
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH v2 6/6] add vm state to backups
2012-11-30 9:22 ` [Qemu-devel] [PATCH v2 6/6] add vm state to backups Dietmar Maurer
@ 2012-12-03 10:13 ` Wenchao Xia
0 siblings, 0 replies; 9+ messages in thread
From: Wenchao Xia @ 2012-12-03 10:13 UTC (permalink / raw)
To: Dietmar Maurer; +Cc: qemu-devel
> +static void coroutine_fn backup_start_savevm(void *opaque)
> +{
> + assert(backup_state.driver);
> + assert(backup_state.writer);
> + int ret;
> + char *err = NULL;
> + uint64_t remaining;
> + int64_t maxlen;
> + MigrationParams params = {
> + .blk = 0,
> + .shared = 0
> + };
> +
> + int restart = 0;
> +
> + QEMUFile *file = qemu_fopen_ops(NULL, &backup_file_ops);
> +
> + ret = qemu_savevm_state_begin(file, ¶ms);
> + if (ret < 0) {
> + qemu_fclose(file);
> + err = g_strdup("qemu_savevm_state_begin failed");
> + goto abort;
> + }
> +
> + while (1) {
> + ret = qemu_savevm_state_iterate(file);
> + remaining = ram_bytes_remaining();
> +
> + if (ret < 0) {
> + qemu_fclose(file);
> + err = g_strdup_printf("qemu_savevm_state_iterate error %d", ret);
> + goto abort;
> + }
> +
> + /* stop the VM if we use too much space,
> + * or if remaining is just a few MB
> + */
> + maxlen = ram_bytes_total();
> + size_t cpos = backup_state.buf_cluster_num * BACKUP_CLUSTER_SIZE;
> + if ((remaining < 100000) || ((cpos + remaining) >= maxlen)) {
> + if (runstate_is_running()) {
> + restart = 1;
> + vm_stop(RUN_STATE_SAVE_VM);
> + }
> + }
> +
> + if (ret == 1) { /* finished */
> + if (runstate_is_running()) {
> + restart = 1;
> + vm_stop(RUN_STATE_SAVE_VM);
> + }
> +
> + ret = qemu_savevm_state_complete(file);
> + if (ret < 0) {
> + qemu_fclose(file);
> + err = g_strdup("qemu_savevm_state_complete error");
> + goto abort;
> +
> + } else {
> + if (qemu_fclose(file) < 0) {
> + error_setg(&backup_state.error,
> + "backup_start_savevm: qemu_fclose failed");
> + goto abort;
> + }
> + if (backup_state.driver->complete_cb(backup_state.writer,
> + backup_state.vmstate_dev_id, 0) < 0) {
> + err = g_strdup("backup_start_savevm: complete_cb failed");
> + goto abort;
> + }
> + backup_start_jobs();
backup_start_jobs() was called after qemu_savevm_state_complete(),
and then VM got resumed with block backup jobs alone, would this cause
saved vm state not sync with the saved block contents, if there is
vmstate changing after backup_start_jobs()?
> + goto out;
> + }
> + }
> + }
--
Best Regards
Wenchao Xia
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/6] add backup related monitor commands
2012-11-30 9:22 ` [Qemu-devel] [PATCH v2 3/6] add backup related monitor commands Dietmar Maurer
@ 2012-12-12 17:47 ` Eric Blake
2012-12-12 19:33 ` Dietmar Maurer
0 siblings, 1 reply; 9+ messages in thread
From: Eric Blake @ 2012-12-12 17:47 UTC (permalink / raw)
To: Dietmar Maurer; +Cc: qemu-devel
[-- Attachment #1: Type: text/plain, Size: 3821 bytes --]
On 11/30/2012 02:22 AM, Dietmar Maurer wrote:
> We use a generic BackupDriver struct to encaplulated all archive format
s/encaplulated/encapsulated/
> related function.
>
> Another option would be to simply dump <devid,cluster_num,cluster_data> to
> the output fh (pipe), and an external binary saves the data. That way we
> could move the whole archive format related code out of qemu.
I'm coming into this review late, so I'm not sure what your series is
adding that cannot already be done with external snapshots and migration
to file. But any time someone proposes new QMP commands that libvirt
might have to interact with, I try to chime in:
>
> Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
> ---
> backup.h | 14 +++
> blockdev.c | 337 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> hmp-commands.hx | 31 +++++
> hmp.c | 63 ++++++++++
> hmp.h | 3 +
> monitor.c | 7 +
> qapi-schema.json | 81 +++++++++++++
> qmp-commands.hx | 27 +++++
> 8 files changed, 563 insertions(+), 0 deletions(-)
>
> +++ b/qapi-schema.json
> @@ -358,6 +358,39 @@
> { 'type': 'EventInfo', 'data': {'name': 'str'} }
>
> ##
> +# @BackupStatus:
> +#
> +# Detailed backup status.
> +#
> +# @status: #optional string describing the current backup status.
> +# Tthis can be 'active', 'done', 'error'. If this field is not
s/Tthis/This/
> +# returned, no backup process has been initiated
> +#
> +# @errmsg: #optional error message (only returned if status is 'error')
> +#
> +# @total: #optional total amount of bytes involved in the backup process
> +#
> +# @transferred: #optional amount of bytes already backed up.
> +#
> +# @zero-bytes: #optional amount of 'zero' bytes detected.
> +#
> +# @start-time: #optional time (epoch) when backup job started.
Should you account for sub-second resolution here
> +#
> +# @end-time: #optional time (epoch) when backup job finished.
and here?
> +#
> +# @backupfile: #optional backup file name
> +#
> +# @uuid: #optional uuid for this backup job
> +#
> +# Since: 1.4.0
> +##
> +{ 'type': 'BackupStatus',
> + 'data': {'*status': 'str', '*errmsg': 'str', '*total': 'int',
> + '*transferred': 'int', '*zero-bytes': 'int',
> + '*start-time': 'int', '*end-time': 'int',
> + '*backupfile': 'str', '*uuid': 'str' } }
> +
> +##
> # @query-events:
> #
> # Return a list of supported QMP events by this server
> @@ -1764,6 +1797,54 @@
> 'data': { 'path': 'str' },
> 'returns': [ 'ObjectPropertyInfo' ] }
>
> +
> +##
> +# @backup:
> +#
> +# Starts a VM backup.
> +#
> +# @backupfile: the backup file name
I guess the fdset mechanism is how libvirt would pass in a pipe fd
rather than supplying an actual file name. Does your implementation
allow for pipes, or must it be seekable?
> +#
> +# @format: format of the backup file
Rather than letting this be a free-form string, it would be nicer as an
enum of actually supported formats.
> +#
> +# @config-filename: #optional name of a configuration file to include into
> +# the backup archive.
> +#
> +# @speed: #optional the maximum speed, in bytes per second
> +#
> +# Returns: the uuid of the backup job
UUID in raw byte format, or in ASCII format? (Assuming the latter)
> +#
> +# Since: 1.4.0
> +##
> +{ 'command': 'backup', 'data': { 'backupfile': 'str', '*format': 'str',
backupfile sounds like a run-on; is it any better to name it backup-file?
> + '*config-filename': 'str',
especially when compared with config-filename
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/6] add backup related monitor commands
2012-12-12 17:47 ` Eric Blake
@ 2012-12-12 19:33 ` Dietmar Maurer
0 siblings, 0 replies; 9+ messages in thread
From: Dietmar Maurer @ 2012-12-12 19:33 UTC (permalink / raw)
To: Eric Blake; +Cc: qemu-devel@nongnu.org
> I'm coming into this review late, so I'm not sure what your series is adding
> that cannot already be done with external snapshots and migration to file.
This series try to do backup more efficient (see docu in [PATCH v2 1/6]).
> But any time someone proposes new QMP commands that libvirt might have
> to interact with, I try to chime in:
libvirt does not need to use any of those commands.
> > +# returned, no backup process has been initiated
> > +#
> > +# @errmsg: #optional error message (only returned if status is
> > +'error') # # @total: #optional total amount of bytes involved in the
> > +backup process # # @transferred: #optional amount of bytes already
> > +backed up.
> > +#
> > +# @zero-bytes: #optional amount of 'zero' bytes detected.
> > +#
> > +# @start-time: #optional time (epoch) when backup job started.
>
> Should you account for sub-second resolution here
Why (I doubt we can backup within a second).
> > +# @backupfile: the backup file name
>
> I guess the fdset mechanism is how libvirt would pass in a pipe fd rather than
> supplying an actual file name. Does your implementation allow for pipes, or
> must it be seekable?
You can pass /dev/fdset/XXX.
My implementation allows pipes.
(I will post the necessary changes for /dev/fdset/ next time).
> > +# @format: format of the backup file
>
> Rather than letting this be a free-form string, it would be nicer as an enum of
> actually supported formats.
ok
> > +# @config-filename: #optional name of a configuration file to include
> > +into # the backup archive.
> > +#
> > +# @speed: #optional the maximum speed, in bytes per second # #
> > +Returns: the uuid of the backup job
>
> UUID in raw byte format, or in ASCII format? (Assuming the latter)
ASCII
> > +#
> > +# Since: 1.4.0
> > +##
> > +{ 'command': 'backup', 'data': { 'backupfile': 'str', '*format':
> > +'str',
>
> backupfile sounds like a run-on; is it any better to name it backup-file?
>
> > + '*config-filename': 'str',
>
> especially when compared with config-filename
ok. Many thanks for the review.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2012-12-12 19:33 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-11-30 9:22 [Qemu-devel] [PATCH v2 1/6] RFC: Efficient VM backup for qemu Dietmar Maurer
2012-11-30 9:22 ` [Qemu-devel] [PATCH v2 2/6] add basic backup support to block driver Dietmar Maurer
2012-11-30 9:22 ` [Qemu-devel] [PATCH v2 3/6] add backup related monitor commands Dietmar Maurer
2012-12-12 17:47 ` Eric Blake
2012-12-12 19:33 ` Dietmar Maurer
2012-11-30 9:22 ` [Qemu-devel] [PATCH v2 4/6] introduce new vma archive format Dietmar Maurer
2012-11-30 9:22 ` [Qemu-devel] [PATCH v2 5/6] add regression tests for backup Dietmar Maurer
2012-11-30 9:22 ` [Qemu-devel] [PATCH v2 6/6] add vm state to backups Dietmar Maurer
2012-12-03 10:13 ` Wenchao Xia
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).