From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.4 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3C7B6C54FCB for ; Thu, 23 Apr 2020 14:36:34 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E148820728 for ; Thu, 23 Apr 2020 14:36:33 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="WoRDmFww" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E148820728 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:44652 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jRcxt-0007Ks-1Z for qemu-devel@archiver.kernel.org; Thu, 23 Apr 2020 10:36:33 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:33520) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jRcxE-0006sj-9x for qemu-devel@nongnu.org; Thu, 23 Apr 2020 10:35:53 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.90_1) (envelope-from ) id 1jRcxD-00027c-Do for qemu-devel@nongnu.org; Thu, 23 Apr 2020 10:35:52 -0400 Received: from us-smtp-delivery-1.mimecast.com ([207.211.31.120]:24205 helo=us-smtp-1.mimecast.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.90_1) (envelope-from ) id 1jRcxC-00026d-UC for qemu-devel@nongnu.org; Thu, 23 Apr 2020 10:35:51 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1587652550; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=rrSqBKxFDwvB2vvxAxdnOBMTLp6cExFIJPx3WdyHTJQ=; b=WoRDmFww6TJd80SD3ELSz/PT6ypGhGyYImsOJTnpXvgAvwTT2AZ0RAalp3XuI15VirKKWn FnyssKdIMGTPZKAJcstVRmHAg1On0HVrehMGwLIEmVZoMwQZqHhhxjWjeOgxwPdVfD7Ipw XxtTaIjPKkfVpkv/nISIlaYDZ2omQFg= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-443-87wQ3HdZOrOk-vvlTnFvyA-1; Thu, 23 Apr 2020 10:35:38 -0400 X-MC-Unique: 87wQ3HdZOrOk-vvlTnFvyA-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id F16B08014D8; Thu, 23 Apr 2020 14:35:36 +0000 (UTC) Received: from [10.10.116.80] (ovpn-116-80.rdu2.redhat.com [10.10.116.80]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 4275D5D70A; Thu, 23 Apr 2020 14:35:36 +0000 (UTC) Subject: Re: [PATCH] qcow2: Allow resize of images with internal snapshots To: Max Reitz , qemu-devel@nongnu.org References: <20200422205355.274706-1-eblake@redhat.com> From: Eric Blake Organization: Red Hat, Inc. Message-ID: Date: Thu, 23 Apr 2020 09:35:35 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0 MIME-Version: 1.0 In-Reply-To: Content-Language: en-US X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable Received-SPF: pass client-ip=207.211.31.120; envelope-from=eblake@redhat.com; helo=us-smtp-1.mimecast.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/04/23 06:43:51 X-ACL-Warn: Detected OS = Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 207.211.31.120 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Kevin Wolf , "open list:qcow2" Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" On 4/23/20 8:55 AM, Max Reitz wrote: > On 22.04.20 22:53, Eric Blake wrote: >> We originally refused to allow resize of images with internal >> snapshots because the v2 image format did not require the tracking of >> snapshot size, making it impossible to safely revert to a snapshot >> with a different size than the current view of the image. But the >> snapshot size tracking was rectified in v3, and our recent fixes to >> qemu-img amend (see 0a85af35) guarantee that we always have a valid >> snapshot size. Thus, we no longer need to artificially limit image >> resizes, but it does become one more thing that would prevent a >> downgrade back to v2. And now that we support different-sized >> snapshots, it's also easy to fix reverting to a snapshot to apply the >> new size. >> >> Upgrade iotest 61 to cover this (we previously had NO coverage of >> refusal to resize while snapshots exist). Note that the amend process >> can fail but still have effects: in particular, since we break things >> into upgrade, resize, downgrade, if a failure does not happen until a >> later phase (such as the downgrade attempt), earlier steps are still >> visible (a truncation and downgrade attempt will fail, but only after >> truncating data). But even before this patch, an attempt to upgrade >> and resize would fail but only after changing the image to v3. In >> some sense, partial image changes on failure are inevitible, since we >> can't avoid a mid-change EIO (if you are trying to amend more than one >> thing at once, but something fails, I hope you have a backup image). >> >> @@ -775,10 +776,22 @@ int qcow2_snapshot_goto(BlockDriverState *bs, cons= t char *snapshot_id) >> } >> >> if (sn->disk_size !=3D bs->total_sectors * BDRV_SECTOR_SIZE) { >> - error_report("qcow2: Loading snapshots with different disk " >> - "size is not implemented"); >> - ret =3D -ENOTSUP; >> - goto fail; >> + BlockBackend *blk =3D blk_new(bdrv_get_aio_context(bs), >> + BLK_PERM_RESIZE, BLK_PERM_ALL); >> + ret =3D blk_insert_bs(blk, bs, &local_err); >=20 > I wonder whether maybe we should reintroduce blk_new_with_bs(). This code segment is copied from what 'qemu-img amend' does, so adding a=20 helper function would indeed make life a bit easier for more than one=20 spot in the code base. Separate patch, obviously. >> +++ b/block/qcow2.c >> @@ -3988,14 +3988,21 @@ static int coroutine_fn qcow2_co_truncate(BlockD= riverState *bs, int64_t offset, >> >> qemu_co_mutex_lock(&s->lock); >> >> - /* cannot proceed if image has snapshots */ >> - if (s->nb_snapshots) { >> - error_setg(errp, "Can't resize an image which has snapshots"); >> + /* >> + * Even though we store snapshot size for all images, it was not >> + * required until v3, so it is not safe to proceed for v2. >> + */ >> + if (s->nb_snapshots && s->qcow_version < 3) { >> + error_setg(errp, "Can't resize a v2 image which has snapshots")= ; >> ret =3D -ENOTSUP; >> goto fail; >> } >> >> - /* cannot proceed if image has bitmaps */ >> + /* >> + * For now, it's easier to not proceed if image has bitmaps, even >> + * though we could resize bitmaps, because it is not obvious >> + * whether new bits should be set or clear. >=20 > The previous comment was incorrect as well, but actually > qcow2_truncate_bitmaps_check() doesn=E2=80=99t return an error when there= is any > bitmap, but only if there are non-active bitmaps, or active bitmaps that > cannot be modified. So for non-disabled bitmaps, we generally do > happily proceed. The comment change is collateral (only because I noticed it in the=20 diff); but I could indeed reword it slightly more accurately as: Check if bitmaps prevent a resize. Although bitmaps can be resized,=20 there are situations where we don't know whether to set or clear new=20 bits, so for now it's easiest to just prevent resize in those cases. And since it is a collateral change, it may even be worth splitting into=20 a separate patch. >> +++ b/tests/qemu-iotests/061 >> @@ -111,6 +111,29 @@ $PYTHON qcow2.py "$TEST_IMG" dump-header >> $QEMU_IO -c "read -P 0x2a 42M 64k" "$TEST_IMG" | _filter_qemu_io >> _check_test_img >> >> +echo >> +echo "=3D=3D=3D Testing resize with snapshots =3D=3D=3D" >> +echo >> +_make_test_img -o "compat=3D0.10" 32M >> +$QEMU_IO -c "write -P 0x2a 24M 64k" "$TEST_IMG" | _filter_qemu_io >> +$QEMU_IMG snapshot -c foo "$TEST_IMG" >> +$QEMU_IMG resize "$TEST_IMG" 64M # fails >> +$PYTHON qcow2.py "$TEST_IMG" dump-header >=20 > What am I looking for in the header dump? I was looking primarily for version (2 vs. 3), size (did it change), and=20 number of snapshots. You're right that grepping for what changes will=20 make this easier to maintain. > Also, I personally prefer self-testing tests, because I don=E2=80=99t tru= st > myself when I have to interpret the reference output on my own... As > such, I think it would make sense to not just put those =E2=80=9C# fails= =E2=80=9D > comments here, but an explicit test on $? instead. (E.g. by > =E2=80=9C|| echo ERROR=E2=80=9D, although I can see that would be weird i= n the > expected-failure case as =E2=80=9C&& echo ERROR=E2=80=9D.) Good idea. >=20 >> +$QEMU_IMG snapshot -c bar "$TEST_IMG" >> +$QEMU_IMG resize --shrink "$TEST_IMG" 64M # succeeds >> +$PYTHON qcow2.py "$TEST_IMG" dump-header >> +$QEMU_IMG amend -o "compat=3D0.10,size=3D32M" "$TEST_IMG" # fails, i= mage left v3 >> +$PYTHON qcow2.py "$TEST_IMG" dump-header >=20 > Again, a grep for the image size would help focus the reference output. >=20 > (In addition, _img_info would give us snapshot information. Might be > interesting. So maybe the best thing would be to grep the image > version, image size, and snapshot list from the image info every time.) Yep, that's the same list I was noticing when writing the patch. >=20 > Speaking of the image size. Is it intentional that the size is changed > to 32 MB? Should amend work more like a transaction, in that we should > at least do a loose check on whether the options can be changed before > we touch the image? Yes, the commit message tried to call it out. It's a pre-existing=20 problem - during amend, we DO make changes to the disk in one step, with=20 no way to roll back those changes, even if a later step fails. Pre-patch, if you request an upgrade to v3 as well as a resize, but=20 resize fails, you still end up with the image being changed to v3.=20 That's no different from post-patch where if you request a resize and a=20 downgrade to v2, the resize happens but not the downgrade. On the=20 bright side, our current failure scenarios at least leave the resulting=20 image viable, even if it is not the same as it was pre-attempt. >=20 > Also, there=E2=80=99s a problem of ordering here. The command as you=E2= =80=99ve written > doesn=E2=80=99t have this specific problem, but imagine the size was stil= l 128 > MB before (just like the snapshot). Then what the command does depends > on the order in which the operations are executed: If we change the > version first, the size cannot be changed because of the internal > snapshot. If we change the size first, the version can no longer be > changed because the internal snapshot has a different size than the image= . Yes, it was a pain for me while writing the tests. At one point I even=20 considered swapping things to do the resize after the downgrade, but=20 that introduces a different problem: the downgrade depends on knowing=20 the post-transaction size (because downgrading is safe only when all=20 internal snapshots match the post-resize length), but we aren't passing=20 the desired size through to the upgrade and downgrade functions. Worse,=20 if we swap the downgrade first, and it succeeds but then resize fails,=20 the image is no longer viable; at least with the current ordering, even=20 though user data has been truncated, it remains v3 so the size=20 differences in snapshots don't break the image (and the user DID request=20 an image resize, so it's not like we are discarding data accidentally). If we want to avoid truncating an image at all costs on any potential=20 failure path, we have to add a lot more plumbing (not to say it's a bad=20 thing, just that it's more work). And no matter how much plumbing we=20 add, or transaction-like rollback capabilities we add, there's still the=20 risk that we will hit a late EIO error leaving the image in a=20 half-changed state, even if none of our sanity checks failed. Or put=20 another way, without some sort of journaling, the best we can do is=20 defer all writes until we know the full set of changes, which limits the=20 likelihood of a half-baked change to a write failure. But since we can=20 only reduce, and not eliminate, the possibility of a half-baked change,=20 the question then becomes whether it is worth the engineering effort of=20 additional complexity for even more corner cases and less risk, or just=20 leave things as they are (if qemu-img amend fails, we make no guarantees=20 about the state of your image). >=20 >=20 > (Hypothetical problems that turn out not to be any in practice: >=20 > Or, maybe more interesting: What if we try to amend to > compat=3D0.10,size=3D128M here? >=20 > If the size is changed first, the command will succeed, because then the > snapshot size matches the image size again. If qemu-img attempts to > change the version first, the whole command will fail. >=20 > As I noted above, the size is indeed changed before the version (hence > us getting a 32 MB v3 image here), so the compat=3D0.10,size=3D128M amend > would indeed succeed. But that=E2=80=99s luck. >=20 > OTOH, that means that if you have a v2 image with a snapshot and want to > make it a v3 image and resize it at the same time, that would fail by > the same logic, because the size cannot be changed for v2 images. So > for that case it=E2=80=99d be bad luck. >=20 > But we always upgrade an image first in the amend process and downgrade > it last, to address specifically such cases: Allow adding new features > along with the upgrade, and strip unsupported features right before the > downgrade. So that=E2=80=99s all good. But I think it still shows that = the > dependencies are getting a bit hairy.) I'm not sure the work in making amend more transaction-like is worth it=20 - yes, we can add more validation code up front before making the first=20 write change, but as some of the later changes depend on information=20 that would be changed earlier, that becomes a lot more state we have to=20 collect during our initial probes (my example being that the downgrade=20 attempt depends on knowing the final image size, and that's a lot easier=20 when the image has already been resized rather than having to pass the=20 new size through). --=20 Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org