From: Eric Blake <eblake@redhat.com>
To: Max Reitz <mreitz@redhat.com>, qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, "open list:qcow2" <qemu-block@nongnu.org>
Subject: Re: [PATCH] qcow2: Allow resize of images with internal snapshots
Date: Thu, 23 Apr 2020 09:35:35 -0500 [thread overview]
Message-ID: <cee813a7-1540-aaa3-b8e1-98dbc84e90ff@redhat.com> (raw)
In-Reply-To: <af50e0f6-3d78-ee51-5540-97fb0fc08f9b@redhat.com>
On 4/23/20 8:55 AM, Max Reitz wrote:
> On 22.04.20 22:53, Eric Blake wrote:
>> We originally refused to allow resize of images with internal
>> snapshots because the v2 image format did not require the tracking of
>> snapshot size, making it impossible to safely revert to a snapshot
>> with a different size than the current view of the image. But the
>> snapshot size tracking was rectified in v3, and our recent fixes to
>> qemu-img amend (see 0a85af35) guarantee that we always have a valid
>> snapshot size. Thus, we no longer need to artificially limit image
>> resizes, but it does become one more thing that would prevent a
>> downgrade back to v2. And now that we support different-sized
>> snapshots, it's also easy to fix reverting to a snapshot to apply the
>> new size.
>>
>> Upgrade iotest 61 to cover this (we previously had NO coverage of
>> refusal to resize while snapshots exist). Note that the amend process
>> can fail but still have effects: in particular, since we break things
>> into upgrade, resize, downgrade, if a failure does not happen until a
>> later phase (such as the downgrade attempt), earlier steps are still
>> visible (a truncation and downgrade attempt will fail, but only after
>> truncating data). But even before this patch, an attempt to upgrade
>> and resize would fail but only after changing the image to v3. In
>> some sense, partial image changes on failure are inevitible, since we
>> can't avoid a mid-change EIO (if you are trying to amend more than one
>> thing at once, but something fails, I hope you have a backup image).
>>
>> @@ -775,10 +776,22 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char *snapshot_id)
>> }
>>
>> if (sn->disk_size != bs->total_sectors * BDRV_SECTOR_SIZE) {
>> - error_report("qcow2: Loading snapshots with different disk "
>> - "size is not implemented");
>> - ret = -ENOTSUP;
>> - goto fail;
>> + BlockBackend *blk = blk_new(bdrv_get_aio_context(bs),
>> + BLK_PERM_RESIZE, BLK_PERM_ALL);
>> + ret = blk_insert_bs(blk, bs, &local_err);
>
> I wonder whether maybe we should reintroduce blk_new_with_bs().
This code segment is copied from what 'qemu-img amend' does, so adding a
helper function would indeed make life a bit easier for more than one
spot in the code base. Separate patch, obviously.
>> +++ b/block/qcow2.c
>> @@ -3988,14 +3988,21 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
>>
>> qemu_co_mutex_lock(&s->lock);
>>
>> - /* cannot proceed if image has snapshots */
>> - if (s->nb_snapshots) {
>> - error_setg(errp, "Can't resize an image which has snapshots");
>> + /*
>> + * Even though we store snapshot size for all images, it was not
>> + * required until v3, so it is not safe to proceed for v2.
>> + */
>> + if (s->nb_snapshots && s->qcow_version < 3) {
>> + error_setg(errp, "Can't resize a v2 image which has snapshots");
>> ret = -ENOTSUP;
>> goto fail;
>> }
>>
>> - /* cannot proceed if image has bitmaps */
>> + /*
>> + * For now, it's easier to not proceed if image has bitmaps, even
>> + * though we could resize bitmaps, because it is not obvious
>> + * whether new bits should be set or clear.
>
> The previous comment was incorrect as well, but actually
> qcow2_truncate_bitmaps_check() doesn’t return an error when there is any
> bitmap, but only if there are non-active bitmaps, or active bitmaps that
> cannot be modified. So for non-disabled bitmaps, we generally do
> happily proceed.
The comment change is collateral (only because I noticed it in the
diff); but I could indeed reword it slightly more accurately as:
Check if bitmaps prevent a resize. Although bitmaps can be resized,
there are situations where we don't know whether to set or clear new
bits, so for now it's easiest to just prevent resize in those cases.
And since it is a collateral change, it may even be worth splitting into
a separate patch.
>> +++ b/tests/qemu-iotests/061
>> @@ -111,6 +111,29 @@ $PYTHON qcow2.py "$TEST_IMG" dump-header
>> $QEMU_IO -c "read -P 0x2a 42M 64k" "$TEST_IMG" | _filter_qemu_io
>> _check_test_img
>>
>> +echo
>> +echo "=== Testing resize with snapshots ==="
>> +echo
>> +_make_test_img -o "compat=0.10" 32M
>> +$QEMU_IO -c "write -P 0x2a 24M 64k" "$TEST_IMG" | _filter_qemu_io
>> +$QEMU_IMG snapshot -c foo "$TEST_IMG"
>> +$QEMU_IMG resize "$TEST_IMG" 64M # fails
>> +$PYTHON qcow2.py "$TEST_IMG" dump-header
>
> What am I looking for in the header dump?
I was looking primarily for version (2 vs. 3), size (did it change), and
number of snapshots. You're right that grepping for what changes will
make this easier to maintain.
> Also, I personally prefer self-testing tests, because I don’t trust
> myself when I have to interpret the reference output on my own... As
> such, I think it would make sense to not just put those “# fails”
> comments here, but an explicit test on $? instead. (E.g. by
> “|| echo ERROR”, although I can see that would be weird in the
> expected-failure case as “&& echo ERROR”.)
Good idea.
>
>> +$QEMU_IMG snapshot -c bar "$TEST_IMG"
>> +$QEMU_IMG resize --shrink "$TEST_IMG" 64M # succeeds
>> +$PYTHON qcow2.py "$TEST_IMG" dump-header
>> +$QEMU_IMG amend -o "compat=0.10,size=32M" "$TEST_IMG" # fails, image left v3
>> +$PYTHON qcow2.py "$TEST_IMG" dump-header
>
> Again, a grep for the image size would help focus the reference output.
>
> (In addition, _img_info would give us snapshot information. Might be
> interesting. So maybe the best thing would be to grep the image
> version, image size, and snapshot list from the image info every time.)
Yep, that's the same list I was noticing when writing the patch.
>
> Speaking of the image size. Is it intentional that the size is changed
> to 32 MB? Should amend work more like a transaction, in that we should
> at least do a loose check on whether the options can be changed before
> we touch the image?
Yes, the commit message tried to call it out. It's a pre-existing
problem - during amend, we DO make changes to the disk in one step, with
no way to roll back those changes, even if a later step fails.
Pre-patch, if you request an upgrade to v3 as well as a resize, but
resize fails, you still end up with the image being changed to v3.
That's no different from post-patch where if you request a resize and a
downgrade to v2, the resize happens but not the downgrade. On the
bright side, our current failure scenarios at least leave the resulting
image viable, even if it is not the same as it was pre-attempt.
>
> Also, there’s a problem of ordering here. The command as you’ve written
> doesn’t have this specific problem, but imagine the size was still 128
> MB before (just like the snapshot). Then what the command does depends
> on the order in which the operations are executed: If we change the
> version first, the size cannot be changed because of the internal
> snapshot. If we change the size first, the version can no longer be
> changed because the internal snapshot has a different size than the image.
Yes, it was a pain for me while writing the tests. At one point I even
considered swapping things to do the resize after the downgrade, but
that introduces a different problem: the downgrade depends on knowing
the post-transaction size (because downgrading is safe only when all
internal snapshots match the post-resize length), but we aren't passing
the desired size through to the upgrade and downgrade functions. Worse,
if we swap the downgrade first, and it succeeds but then resize fails,
the image is no longer viable; at least with the current ordering, even
though user data has been truncated, it remains v3 so the size
differences in snapshots don't break the image (and the user DID request
an image resize, so it's not like we are discarding data accidentally).
If we want to avoid truncating an image at all costs on any potential
failure path, we have to add a lot more plumbing (not to say it's a bad
thing, just that it's more work). And no matter how much plumbing we
add, or transaction-like rollback capabilities we add, there's still the
risk that we will hit a late EIO error leaving the image in a
half-changed state, even if none of our sanity checks failed. Or put
another way, without some sort of journaling, the best we can do is
defer all writes until we know the full set of changes, which limits the
likelihood of a half-baked change to a write failure. But since we can
only reduce, and not eliminate, the possibility of a half-baked change,
the question then becomes whether it is worth the engineering effort of
additional complexity for even more corner cases and less risk, or just
leave things as they are (if qemu-img amend fails, we make no guarantees
about the state of your image).
>
>
> (Hypothetical problems that turn out not to be any in practice:
>
> Or, maybe more interesting: What if we try to amend to
> compat=0.10,size=128M here?
>
> If the size is changed first, the command will succeed, because then the
> snapshot size matches the image size again. If qemu-img attempts to
> change the version first, the whole command will fail.
>
> As I noted above, the size is indeed changed before the version (hence
> us getting a 32 MB v3 image here), so the compat=0.10,size=128M amend
> would indeed succeed. But that’s luck.
>
> OTOH, that means that if you have a v2 image with a snapshot and want to
> make it a v3 image and resize it at the same time, that would fail by
> the same logic, because the size cannot be changed for v2 images. So
> for that case it’d be bad luck.
>
> But we always upgrade an image first in the amend process and downgrade
> it last, to address specifically such cases: Allow adding new features
> along with the upgrade, and strip unsupported features right before the
> downgrade. So that’s all good. But I think it still shows that the
> dependencies are getting a bit hairy.)
I'm not sure the work in making amend more transaction-like is worth it
- yes, we can add more validation code up front before making the first
write change, but as some of the later changes depend on information
that would be changed earlier, that becomes a lot more state we have to
collect during our initial probes (my example being that the downgrade
attempt depends on knowing the final image size, and that's a lot easier
when the image has already been resized rather than having to pass the
new size through).
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
next prev parent reply other threads:[~2020-04-23 14:36 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-04-22 20:53 [PATCH] qcow2: Allow resize of images with internal snapshots Eric Blake
2020-04-23 13:55 ` Max Reitz
2020-04-23 14:35 ` Eric Blake [this message]
2020-04-23 17:21 ` Max Reitz
2020-04-23 17:41 ` Eric Blake
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cee813a7-1540-aaa3-b8e1-98dbc84e90ff@redhat.com \
--to=eblake@redhat.com \
--cc=kwolf@redhat.com \
--cc=mreitz@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).