* [Qemu-devel] block-stream/commit and mixing internal and external snapshots
@ 2018-04-06 22:16 Eric Blake
From: Eric Blake @ 2018-04-06 22:16 UTC (permalink / raw)
To: Qemu-devel@nongnu.org, qemu block
Perhaps others already knew this, but I just realized that if you
mix internal and external snapshots, you can set yourself up for massive
failures when trying to use block-stream or block-commit to consolidate
data across the external backing chain, without also thinking about the
internal snapshots.
Here's a quick demonstration:
$ # create the backing file, with all 1's; 1M clusters for convenience
$ qemu-img create -f qcow2 -o cluster_size=1m base.qcow2 4M
Formatting 'base.qcow2', fmt=qcow2 size=4194304 cluster_size=1048576
lazy_refcounts=off refcount_bits=16
$ qemu-io -c 'w -P 1 0 4m' -f qcow2 base.qcow2
wrote 4194304/4194304 bytes at offset 0
4 MiB, 1 ops; 0.0050 sec (791.139 MiB/sec and 197.7848 ops/sec)
$ # create the wrapper file, write 2 to the first 2 clusters
$ qemu-img create -f qcow2 -o backing_file=base.qcow2,backing_fmt=qcow2 top.qcow2
Formatting 'top.qcow2', fmt=qcow2 size=4194304 backing_file=base.qcow2
backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
$ qemu-io -c 'w -P 2 0 2m' -f qcow2 top.qcow2
wrote 2097152/2097152 bytes at offset 0
2 MiB, 1 ops; 0.0009 sec (2.144 GiB/sec and 1097.6948 ops/sec)
$ # create an internal snapshot, then write 3 to the middle 2 clusters
$ qemu-img snapshot -c snap1 top.qcow2
$ qemu-io -c 'w -P 3 1m 2m' -f qcow2 top.qcow2
wrote 2097152/2097152 bytes at offset 1048576
2 MiB, 1 ops; 0.0009 sec (2.102 GiB/sec and 1076.4263 ops/sec)
$ # we've mixed internal and external; let's shorten the chain now
$ qemu-img info top.qcow2
image: top.qcow2
file format: qcow2
virtual size: 4.0M (4194304 bytes)
disk size: 2.3M
cluster_size: 65536
backing file: base.qcow2
backing file format: qcow2
Snapshot list:
ID TAG VM SIZE DATE VM CLOCK
1 snap1 0 2018-04-06 16:44:54 00:00:00.000
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false
$ qemu-img rebase -f qcow2 -b '' top.qcow2
$ # create second snapshot, then revert to the first
$ qemu-img snapshot -c snap2 top.qcow2
$ qemu-img snapshot -a snap1 top.qcow2
$ # contents at the time we took snap1 were 2211, right? OOPS!
$ qemu-io -c 'r -P 2 0 2m' -c 'r -P 1 2m 2m' -f qcow2 top.qcow2
read 2097152/2097152 bytes at offset 0
2 MiB, 1 ops; 0.0004 sec (3.914 GiB/sec and 2004.0080 ops/sec)
Pattern verification failed at offset 2097152, 2097152 bytes
read 2097152/2097152 bytes at offset 2097152
2 MiB, 1 ops; 0.0000 sec (24.723 GiB/sec and 12658.2278 ops/sec)
$ # the last two clusters were rewritten from 1 to 0. :(
$ qemu-io -c 'r -P 0 2m 2m' -f qcow2 top.qcow2
read 2097152/2097152 bytes at offset 2097152
2 MiB, 1 ops; 0.0001 sec (13.754 GiB/sec and 7042.2535 ops/sec)
$ # repair the damage, for now, and write 4 into last cluster...
$ qemu-img rebase -u -f qcow2 -b base.qcow2 -F qcow2 top.qcow2
$ qemu-io -c 'w -P 4 3m 1m' -f qcow2 top.qcow2
wrote 1048576/1048576 bytes at offset 3145728
1 MiB, 1 ops; 0.0005 sec (1.713 GiB/sec and 1754.3860 ops/sec)
$ # now let's try committing instead
$ qemu-img commit -f qcow2 -d top.qcow2
Image committed.
$ # revert back to snap2, which had contents 2331, right? OOPS!
$ qemu-img snapshot -a snap2 top.qcow2
$ qemu-io -c 'r -P 2 0 1m' -c 'r -P 3 1m 1m' -c 'r -P 3 2m 1m' -c 'r -P 1 3m 1m' -f qcow2 top.qcow2
read 1048576/1048576 bytes at offset 0
1 MiB, 1 ops; 0.0002 sec (3.860 GiB/sec and 3952.5692 ops/sec)
read 1048576/1048576 bytes at offset 1048576
1 MiB, 1 ops; 0.0002 sec (3.577 GiB/sec and 3663.0037 ops/sec)
read 1048576/1048576 bytes at offset 2097152
1 MiB, 1 ops; 0.0002 sec (4.628 GiB/sec and 4739.3365 ops/sec)
Pattern verification failed at offset 3145728, 1048576 bytes
read 1048576/1048576 bytes at offset 3145728
1 MiB, 1 ops; 0.0007 sec (1.345 GiB/sec and 1377.4105 ops/sec)
$ # the last cluster was rewritten from 1 to 4. :(
$ qemu-io -c 'r -P 4 3m 1m' -f qcow2 top.qcow2
read 1048576/1048576 bytes at offset 3145728
1 MiB, 1 ops; 0.0011 sec (878.735 MiB/sec and 878.7346 ops/sec)
The root cause of all of this is that right now, ALL internal snapshots
share the same backing file information in the file header; but
block-stream operations only modify the active snapshot. The actions of
changing the backing file or of rewriting the clusters in the backing
file don't break the active snapshot, but DO bleed through to the
internal snapshots, for any cluster where the internal snapshot was
relying on the backing file.
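The bleed-through can be reproduced in a toy model without any image
files. This is only a sketch and not qemu code: each 1 MiB cluster is one
character, "-" is an invented marker for "not allocated in this layer", and
the names resolve, backing, active and snap1 are made up for illustration.
It assumes an internal snapshot saves only the cluster map, while the
backing reference is a single value shared by everything in the file.

```shell
#!/bin/sh
# Toy model, not qemu code: one character per 1 MiB cluster; "-" means
# "not allocated in this layer, read from the backing file instead".

base="1111"       # contents of base.qcow2
backing=$base     # the one backing reference in the header, shared by all snapshots
active="22--"     # top.qcow2 after writing 2 to the first two clusters
snap1=$active     # an internal snapshot saves only the cluster map

# resolve MAP: print what a reader sees through MAP. Unallocated clusters
# come from the shared backing reference, or read as 0 when there is none.
resolve() {
    map=$1; back=$backing; out=""
    while [ -n "$map" ]; do
        c=${map%"${map#?}"}; map=${map#?}        # pop the first cluster of the map
        b=${back%"${back#?}"}; back=${back#?}    # and of the backing contents
        if [ "$c" = "-" ]; then c=${b:-0}; fi
        out="$out$c"
    done
    printf '%s\n' "$out"
}

active="233-"                        # write 3 to the middle two clusters
snap1_before=$(resolve "$snap1")     # what snap1 looked like when it was taken

# Dropping the backing file (rebase -b '' or block-stream): data is pulled
# into the active map only, then the shared backing reference is cleared.
active=$(resolve "$active")
backing=""
snap1_after=$(resolve "$snap1")

# Committing instead: the backing reference stays, but the backing file's
# clusters are rewritten underneath the snapshot.
backing="2214"                       # base.qcow2 after committing an active layer of "22-4"
snap1_commit=$(resolve "$snap1")

echo "active after stream: $active"
echo "snap1: $snap1_before -> $snap1_after (stream), $snap1_commit (commit)"
```

In this model snap1 goes from 2211 to 2200 once the backing reference is
dropped, which is the same shape as the failed pattern check in the
demonstration, and it picks up the committed 4 in its last cluster when
the backing file is rewritten instead.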
Does this mean we should make it harder to perform external block
operations on a qcow2 file that has internal snapshots (either refuse
outright, or at least require a 'force' flag to let the user acknowledge
the risk)? Similarly, should it be harder to create an internal
snapshot when an image already has an external backing file, and/or
should we improve the qcow2 specification of internal snapshot
descriptors to record a per-snapshot backing file rather than the
current approach that all snapshots share the same backing file?
Whether or not we track a per-snapshot backing file, should the presence
of internal snapshots be used to request op-blockers for read
consistency on backing files?
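Until something like that exists in qemu itself, the "refuse unless forced"
idea can be approximated outside qemu. The following is a minimal sketch,
not an existing qemu-img feature: count_internal_snapshots, guarded_rebase
and the FORCE variable are hypothetical names, and the parser assumes the
human-readable `qemu-img info` layout shown in the demonstration (a
"Snapshot list:" line, an "ID TAG ..." header, then one row per snapshot
starting with a numeric ID).

```shell
#!/bin/sh
# Sketch only: a wrapper-level safety valve, not part of qemu-img.

# Read human-readable `qemu-img info` output on stdin and print the number
# of internal snapshots it lists.
count_internal_snapshots() {
    awk '
        /^Snapshot list:/             { in_list = 1; next }
        in_list && $1 ~ /^[0-9]+$/    { n++; next }        # one row per snapshot
        in_list && NF && $1 != "ID"   { in_list = 0 }      # next section begins
        END                           { print n + 0 }
    '
}

# guarded_rebase IMAGE [qemu-img rebase options...]
# Refuse to touch the backing link of an image that has internal snapshots
# unless the caller sets FORCE=1 to acknowledge the risk.
guarded_rebase() {
    image=$1; shift
    info=$(qemu-img info "$image") || return 1
    n=$(printf '%s\n' "$info" | count_internal_snapshots)
    if [ "$n" -gt 0 ] && [ "${FORCE:-0}" != 1 ]; then
        echo "refusing: $image has $n internal snapshot(s); set FORCE=1 to override" >&2
        return 1
    fi
    qemu-img rebase "$@" "$image"
}
```

With this in place, `guarded_rebase top.qcow2 -f qcow2 -b ''` would stop
before the first failure in the demonstration, while
`FORCE=1 guarded_rebase ...` behaves like the plain command. It only covers
offline qemu-img use; a running guest doing block-stream or block-commit
would need the check inside qemu.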
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
* Re: [Qemu-devel] [Qemu-block] block-stream/commit and mixing internal and external snapshots
From: Kevin Wolf @ 2018-04-09 9:11 UTC (permalink / raw)
To: Eric Blake; +Cc: Qemu-devel@nongnu.org, qemu block
On 07.04.2018 at 00:16, Eric Blake wrote:
> Perhaps others already knew this, but I just realized that if you
> mix internal and external snapshots, you can set yourself up for massive
> failures when trying to use block-stream or block-commit to consolidate
> data across the external backing chain, without also thinking about the
> internal snapshots.
Yeah, internal and external snapshots don't mix well. Basically, the
only thing that will work reliably is having a qcow2 image with internal
snapshots at the top, and then an immutable backing chain without
internal snapshots below it.
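Half of that layout rule, "no internal snapshots below the top image", can
be checked from the command line. A rough sketch, assuming the
human-readable output of `qemu-img info --backing-chain`, where each image
in the chain starts with an "image: NAME" line and lists its snapshots
under "Snapshot list:"; the function name backing_images_with_snapshots is
made up. It says nothing about whether the backing chain is actually
treated as immutable.

```shell
#!/bin/sh
# Sketch only. Reads `qemu-img info --backing-chain IMAGE` output on stdin
# and prints every image below the top one that carries internal snapshots.
# Exit status is 0 when no such image is found, 1 otherwise.
backing_images_with_snapshots() {
    awk '
        /^image: /                   { idx++; name = substr($0, 8) }
        /^Snapshot list:/ && idx > 1 { print name; bad = 1 }
        END                          { exit bad + 0 }
    '
}

# Intended use (not run here, since it needs qemu-img and real images):
#   qemu-img info --backing-chain top.qcow2 | backing_images_with_snapshots
```

Snapshots in the first image are deliberately ignored, since that is the
one place the rule allows them.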
> Here's a quick demonstration:
> [...]
>
> The root cause of all of this is that right now, ALL internal snapshots
> share the same backing file information in the file header; but
> block-stream operations only modify the active snapshot. The actions of
> changing the backing file or of rewriting the clusters in the backing
> file don't break the active snapshot, but DO bleed through to the
> internal snapshots, for any cluster where the internal snapshot was
> relying on the backing file.
>
> Does this mean we should make it harder to perform external block
> operations on a qcow2 file that has internal snapshots (either refuse
> outright, or at least require a 'force' flag to let the user acknowledge
> the risk)? Similarly, should it be harder to create an internal
> snapshot when an image already has an external backing file, and/or
> should we improve the qcow2 specification of internal snapshot
> descriptors to record a per-snapshot backing file rather than the
> current approach that all snapshots share the same backing file?
> Whether or not we track a per-snapshot backing file, should the presence
> of internal snapshots be used to request op-blockers for read
> consistency on backing files?
Op blockers can't really protect a node against itself. As far as the
backing file node is concerned, nothing bad has happened. It is still
fully consistent and it hasn't been written to. It just isn't used any
more by its parent node.
Possibly we can use a blocker to enforce that the backing file child
isn't changed, but that would be something like a BLK_PERM_GRAPH_MOD
permission, which we have so far failed to define precisely.
Other than that, if you want to make the merge of the external snapshots
fail, maybe the only thing you could do is return an error when
trying to change the backing file link in qcow2_change_backing_file()
while there are internal snapshots. I'm not sure that this will result
in a good state, though, and it is only called at the very end of the
block job (i.e. all data is already copied), so it's not a nice failure
mode.
Kevin
* Re: [Qemu-devel] [Qemu-block] block-stream/commit and mixing internal and external snapshots
From: Eric Blake @ 2018-04-09 14:13 UTC (permalink / raw)
To: Kevin Wolf; +Cc: Qemu-devel@nongnu.org, qemu block
On 04/09/2018 04:11 AM, Kevin Wolf wrote:
> On 07.04.2018 at 00:16, Eric Blake wrote:
>> Perhaps others already knew this, but I just realized that if you
>> mix internal and external snapshots, you can set yourself up for massive
>> failures when trying to use block-stream or block-commit to consolidate
>> data across the external backing chain, without also thinking about the
>> internal snapshots.
>
> Yeah, internal and external snapshots don't mix well. Basically, the
> only thing that will work reliably is having a qcow2 image with internal
> snapshots at the top, and then an immutable backing chain without
> internal snapshots below it.
I may try to tackle some safety valve additions in 2.13; but as the
problem is pre-existing and not a late regression in 2.12, it is not a
candidate for rushing anything into -rc3.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org