From: Jeff Cody <jcody@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: pbonzini@redhat.com, eblake@redhat.com, qemu-devel@nongnu.org,
armbru@redhat.com, lcapitulino@redhat.com
Subject: Re: [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command
Date: Mon, 27 Feb 2012 12:02:10 -0500 [thread overview]
Message-ID: <4F4BB712.8090005@redhat.com> (raw)
In-Reply-To: <4F4BA56B.7040608@redhat.com>
On 02/27/2012 10:46 AM, Kevin Wolf wrote:
> Am 26.02.2012 20:31, schrieb Jeff Cody:
>> This is a QAPI/QMP only command to take a snapshot of a group of
>> devices. This is similar to the blockdev-snapshot-sync command, except
>> blockdev-group-snapshot-sync accepts a list devices, filenames, and
>> formats.
>>
>> It is attempted to keep the snapshot of the group atomic; if the
>> creation or open of any of the new snapshots fails, then all of
>> the new snapshots are abandoned, and the name of the snapshot image
>> that failed is returned. The failure case should not interrupt
>> any operations.
>>
>> Rather than use bdrv_close() along with a subsequent bdrv_open() to
>> perform the pivot, the original image is never closed and the new
>> image is placed 'in front' of the original image via manipulation
>> of the BlockDriverState fields. Thus, once the new snapshot image
>> has been successfully created, there are no more failure points
>> before pivoting to the new snapshot.
>>
>> This allows the group of disks to remain consistent with each other,
>> even across snapshot failures.
>>
>> Signed-off-by: Jeff Cody <jcody@redhat.com>
>> ---
>> block.c | 47 ++++++++++++++++++++
>> block.h | 1 +
>> blockdev.c | 128 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> qapi-schema.json | 38 ++++++++++++++++
>> 4 files changed, 214 insertions(+), 0 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 3621d11..0045ab1 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -880,6 +880,53 @@ void bdrv_make_anon(BlockDriverState *bs)
>> bs->device_name[0] = '\0';
>> }
>>
>> +/*
>> + * Add new bs contents at the top of an image chain while the chain is live,
>> + * while keeping required fields on the top layer.
>> + *
>> + * It is assumed that bs_new already points to an existing image,
>> + * with the correct backing filename of top->backing_file
>> + */
>> +void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top)
>> +{
>> + BlockDriverState tmp;
>> +
>> + /* the new bs must not be in bdrv_states */
>> + bdrv_make_anon(bs_new);
>> +
>> + tmp = *bs_new;
>> + tmp.backing_hd = bs_new;
>> +
>> + /* there are some fields that need to stay on the top layer: */
>> +
>> + /* dev info */
>> + tmp.dev_ops = bs_top->dev_ops;
>> + tmp.dev_opaque = bs_top->dev_opaque;
>> + tmp.dev = bs_top->dev;
>> + tmp.buffer_alignment = bs_top->buffer_alignment;
>> + tmp.copy_on_read = bs_top->copy_on_read;
>> +
>> + /* i/o timing parameters */
>> + tmp.slice_time = bs_top->slice_time;
>> + tmp.slice_start = bs_top->slice_start;
>> + tmp.slice_end = bs_top->slice_end;
>> + tmp.io_limits = bs_top->io_limits;
>> + tmp.io_base = bs_top->io_base;
>> + tmp.throttled_reqs = bs_top->throttled_reqs;
>> + tmp.block_timer = bs_top->block_timer;
>> + tmp.io_limits_enabled = bs_top->io_limits_enabled;
>
> What about iostatus_enabled/iostatus? Possibly on_read/write_error and
> geometry, too. (Maybe the correct answer is that you're doing the right
> thing, but I wanted to bring it up)
I think you are right, those should probably be preserved as well; I
don't know if it hurts to not preserve them, but since these are all set
from drive_init, I think it makes sense I think to keep them on the top
layer.
>
>> +
>> + /* keep the same entry in bdrv_states */
>> + pstrcpy(tmp.device_name, sizeof(tmp.device_name), bs_top->device_name);
>> + tmp.list = bs_top->list;
>> +
>> + /* swap contents of the fixed new bs and the current top */
>> + *bs_new = *bs_top;
>> + *bs_top = tmp;
>> +
>> + bdrv_detach_dev(bs_new, bs_new->dev);
>> +}
>
> The last line would actually deserve a comment /* clear the copied
> fields in the new backing file */, which makes clear that there are
> probably some more fields missing in this section.
OK, added.
>
>> +
>> void bdrv_delete(BlockDriverState *bs)
>> {
>> assert(!bs->dev);
>> diff --git a/block.h b/block.h
>> index cae289b..190a780 100644
>> --- a/block.h
>> +++ b/block.h
>> @@ -114,6 +114,7 @@ int bdrv_create(BlockDriver *drv, const char* filename,
>> int bdrv_create_file(const char* filename, QEMUOptionParameter *options);
>> BlockDriverState *bdrv_new(const char *device_name);
>> void bdrv_make_anon(BlockDriverState *bs);
>> +void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top);
>> void bdrv_delete(BlockDriverState *bs);
>> int bdrv_parse_cache_flags(const char *mode, int *flags);
>> int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags);
>> diff --git a/blockdev.c b/blockdev.c
>> index 05e7c5e..560f7e8 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -714,6 +714,134 @@ void qmp_blockdev_snapshot_sync(const char *device, const char *snapshot_file,
>> }
>> }
>>
>> +
>> +/* New and old BlockDriverState structs for group snapshots */
>> +typedef struct BlkGroupSnapshotStates {
>> + BlockDriverState *old_bs;
>> + BlockDriverState *new_bs;
>> + bool is_open;
>> + QSIMPLEQ_ENTRY(BlkGroupSnapshotStates) entry;
>> +} BlkGroupSnapshotStates;
>> +
>> +/*
>> + * 'Atomic' group snapshots. The snapshots are taken as a set, and if any fail
>> + * then we do not pivot any of the devices in the group, and abandon the
>> + * snapshots
>> + */
>> +void qmp_blockdev_group_snapshot_sync(SnapshotDevList *dev_list,
>> + Error **errp)
>
> Too much indentation?
Sure :) Fixed.
>
>> +{
>> + int ret = 0;
>> + SnapshotDevList *dev_entry = dev_list;
>> + SnapshotDev *dev_info = NULL;
>> + BlkGroupSnapshotStates *states;
>> + BlockDriver *proto_drv;
>> + BlockDriver *drv;
>> + int flags;
>> + const char *format;
>> + const char *snapshot_file;
>> +
>> + QSIMPLEQ_HEAD(snap_bdrv_states, BlkGroupSnapshotStates) snap_bdrv_states;
>> + QSIMPLEQ_INIT(&snap_bdrv_states);
>> +
>> + /* We don't do anything in this loop that commits us to the snapshot */
>> + while (NULL != dev_entry) {
>> + dev_info = dev_entry->value;
>> + dev_entry = dev_entry->next;
>> +
>> + states = g_malloc0(sizeof(BlkGroupSnapshotStates));
>> + states->is_open = false;
>> +
>> + states->old_bs = bdrv_find(dev_info->device);
>> +
>> + if (!states->old_bs) {
>> + error_set(errp, QERR_DEVICE_NOT_FOUND, dev_info->device);
>> + goto fail;
>> + }
>> +
>> + if (bdrv_in_use(states->old_bs)) {
>> + error_set(errp, QERR_DEVICE_IN_USE, dev_info->device);
>> + goto fail;
>> + }
>> +
>> + snapshot_file = dev_info->snapshot_file;
>> +
>> + flags = states->old_bs->open_flags;
>> +
>> + if (!dev_info->has_format) {
>> + format = "qcow2";
>> + } else {
>> + format = dev_info->format;
>> + }
>> +
>> + drv = bdrv_find_format(format);
>> + if (!drv) {
>> + error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
>> + goto fail;
>> + }
>> +
>> + proto_drv = bdrv_find_protocol(snapshot_file);
>> + if (!proto_drv) {
>> + error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
>> + goto fail;
>> + }
>> +
>> + /* create new image w/backing file */
>> + ret = bdrv_img_create(snapshot_file, format,
>> + states->old_bs->filename,
>> + drv->format_name, NULL, -1, flags);
>> + if (ret) {
>> + error_set(errp, QERR_OPEN_FILE_FAILED, snapshot_file);
>> + goto fail;
>> + }
>> +
>> + /* We will manually add the backing_hd field to the bs later */
>> + states->new_bs = bdrv_new("");
>> + ret = bdrv_open(states->new_bs, snapshot_file,
>> + flags | BDRV_O_NO_BACKING, drv);
>> + states->is_open = true;
>> +
>> + if (ret != 0) {
>> + error_set(errp, QERR_OPEN_FILE_FAILED, snapshot_file);
>> + goto close_and_fail;
>> + }
>> + QSIMPLEQ_INSERT_TAIL(&snap_bdrv_states, states, entry);
>
> Inserting it only here means that all the error cases above leak states.
Yes. Will move it up to immediately after the bdrv_open().
>
>> + }
>> +
>> + /*
>> + * Now we will drain, flush, & pivot everything - we are committed at this
>> + * point.
>> + */
>> + bdrv_drain_all();
>
> I would feel more comfortable if we could do the bdrv_drain_all() at the
> very beginning of the function. Not that I know of a specific scenario
> that would go wrong, but you have a nested main loop here that could do
> more or less anything.
I can move it up to the beginning if desired... My thought was that it
was best to drain closer to the point of commit.
>
>> + QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
>> + bdrv_flush(states->old_bs);
>
> This can return an error which must be checked. And of course, we must
> do it before committing to the snapshot (but after bdrv_drain_all).
Hmm. If the flush returns error, do we abandon at this point? Perhaps it
would be best to loop through and flush each device first, and if no
flushes fail, _then_ loop through and perform the bdrv_append(). Once we
are calling bdrv_append(), we want no possible failure points.
>
>> + /* This removes our old bs from the bdrv_states, and adds the new bs */
>> + bdrv_append(states->new_bs, states->old_bs);
>> + }
>> +
>> + /* success */
>> + goto exit;
>> +
>> +close_and_fail:
>> + /*
>> + * failure, and it is all-or-none; abandon each new bs, and keep using
>> + * the original bs for all images
>> + */
>> + QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
>> + if (states->is_open) {
>> + bdrv_close(states->new_bs);
>
> bdrv_delete, as Stefan already mentioned.
>
>> + }
>> + }
>> +fail:
>> +exit:
>> + QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
>> + g_free(states);
>> + }
>> + return;
>> +}
>
> Kevin
next prev parent reply other threads:[~2012-02-27 17:02 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-02-26 19:31 [Qemu-devel] [PATCH v2 0/2] Group Live Snapshots Jeff Cody
2012-02-26 19:31 ` [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command Jeff Cody
2012-02-27 11:13 ` Stefan Hajnoczi
2012-02-27 14:26 ` Jeff Cody
2012-02-27 14:40 ` Stefan Hajnoczi
2012-02-27 15:23 ` Jeff Cody
2012-02-27 15:31 ` Stefan Hajnoczi
2012-02-27 15:46 ` Kevin Wolf
2012-02-27 16:51 ` Paolo Bonzini
2012-02-27 17:02 ` Jeff Cody [this message]
2012-02-27 17:22 ` Kevin Wolf
2012-02-27 17:31 ` Jeff Cody
2012-02-26 19:31 ` [Qemu-devel] [PATCH v2 2/2] QMP: Add qmp command for blockdev-group-snapshot-sync Jeff Cody
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4F4BB712.8090005@redhat.com \
--to=jcody@redhat.com \
--cc=armbru@redhat.com \
--cc=eblake@redhat.com \
--cc=kwolf@redhat.com \
--cc=lcapitulino@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.