* [Qemu-devel] Question about odd snapshot behaviour
From: Sam Bobroff @ 2015-10-19  5:27 UTC
  To: qemu-devel

Hi all,

While working through the QEMU code that saves and loads snapshots, I've
noticed some confusing behaviour when using named VM snapshots that may need to
be fixed somehow. This is what I've noticed:

* If I create two empty qcow2 filesystems: fs1.qcow2 and fs2.qcow2:
	* e.g. qemu-img create -f qcow2 fs1.qcow2 10G
	* e.g. qemu-img create -f qcow2 fs2.qcow2 10G
* Then boot the guest with the above filesystems as hda and hdb (my guest's rootfs
  is an initramfs, not either of these):
	* e.g. qemu-system-ppc64 ... -hda fs1.qcow2 -hdb fs2.qcow2
* In the QEMU monitor, create a blockdev snapshot called "foo" on hd2 (the
  second disk, device scsi0-hd1, i.e. fs2.qcow2):
	e.g. snapshot_blkdev_internal scsi0-hd1 foo
* Check fs2 on the host command line:
	e.g. qemu-img snapshot -l fs2.qcow2
	Snapshot list:
	ID        TAG                 VM SIZE                DATE       VM CLOCK
	1         foo                       0 2015-10-19 15:56:29   00:00:20.969
* In the QEMU monitor, save the vm state as "bar":
	e.g. savevm bar
* Check the snapshot on fs1:
	e.g. info snapshots
	ID        TAG                 VM SIZE                DATE       VM CLOCK
	1         bar                    215M 2015-10-19 04:57:13   00:01:05.589
* Check fs2 again on the host command line:
	e.g. qemu-img snapshot -l fs2.qcow2
	Snapshot list:
	ID        TAG                 VM SIZE                DATE       VM CLOCK
	1         foo                       0 2015-10-19 15:56:29   00:00:20.969
	2         bar                       0 2015-10-19 15:57:13   00:01:05.589

* Now the fun part: overwrite the bar snapshot on fs1 by ID:
	* savevm 1
* Examine the results from the monitor:
	* e.g. info snapshots
	There is no suitable snapshot available
* Examine fs1 and fs2 from the command line:
	* e.g. qemu-img snapshot -l fs1.qcow2 
	Snapshot list:
	ID        TAG                 VM SIZE                DATE       VM CLOCK
	1         bar                    215M 2015-10-19 16:00:45   00:04:36.065
	* e.g. qemu-img snapshot -l fs2.qcow2 
	Snapshot list:
	ID        TAG                 VM SIZE                DATE       VM CLOCK
	2         bar                       0 2015-10-19 15:57:13   00:01:05.589
	3         bar                       0 2015-10-19 16:00:45   00:04:36.065

So what seems odd is:

* QEMU has lost the bar snapshot on fs1, although qemu-img still shows
  it and it re-appears in QEMU if the guest is started with only
  fs1.qcow2 attached.

* QEMU has deleted the snapshot named "foo" on hd2, even though we issued
  "savevm 1" and the old snapshot (on hd1) with ID 1 was named "bar". This
  surprised me.

* QEMU has created a snapshot on hd2 with a duplicate name ("bar") and if we
  issue a "loadvm bar" QEMU loads the snapshot with ID 1 on hd1 and the first
  snapshot with the name "bar" on hd2 which is the one with ID 2. That is not
  the snapshot that matches the vm state (which is the one with ID 3) and
  presumably can lead to filesystem corruption on hd2.

Do these seem like bugs? Should I try to find a way to fix some or all of them?

(Note: I checked the launchpad bug list but didn't see these listed.)

Cheers,
Sam.


* Re: [Qemu-devel] Question about odd snapshot behaviour
From: Kevin Wolf @ 2015-10-19 12:50 UTC
  To: Sam Bobroff; +Cc: qemu-devel, qemu-block

[ CC: qemu-block ]

Am 19.10.2015 um 07:27 hat Sam Bobroff geschrieben:
> Hi all,
> 
> While working through the QEMU code that saves and loads snapshots, I've
> noticed some confusing behaviour when using named VM snapshots that may need to
> be fixed somehow. This is what I've noticed:
> 
> * If I create two empty qcow2 filesystems: fs1.qcow2 and fs2.qcow2:
> 	* e.g. qemu-img create -f qcow2 fs1.qcow2 10G
> 	* e.g. qemu-img create -f qcow2 fs2.qcow2 10G
> * Then boot the guest with the above filesystems as hda and hdb (my guest's rootfs
>   is an initramfs, not either of these):
> 	* e.g. qemu-system-ppc64 ... -hda fs1.qcow2 -hdb fs2.qcow2
> * In the QEMU monitor, create a blockdev snapshot called "foo" on hd2 (the
>   second disk, device scsi0-hd1, i.e. fs2.qcow2):
> 	e.g. snapshot_blkdev_internal scsi0-hd1 foo
> * Check fs2 on the host command line:
> 	e.g. qemu-img snapshot -l fs2.qcow2
> 	Snapshot list:
> 	ID        TAG                 VM SIZE                DATE       VM CLOCK
> 	1         foo                       0 2015-10-19 15:56:29   00:00:20.969
> * In the QEMU monitor, save the vm state as "bar":
> 	e.g. savevm bar
> * Check the snapshot on fs1:
> 	e.g. info snapshots
> 	ID        TAG                 VM SIZE                DATE       VM CLOCK
> 	1         bar                    215M 2015-10-19 04:57:13   00:01:05.589
> * Check fs2 again on the host command line:
> 	e.g. qemu-img snapshot -l fs2.qcow2
> 	Snapshot list:
> 	ID        TAG                 VM SIZE                DATE       VM CLOCK
> 	1         foo                       0 2015-10-19 15:56:29   00:00:20.969
> 	2         bar                       0 2015-10-19 15:57:13   00:01:05.589
> 
> * Now the fun part: overwrite the bar snapshot on fs1 by ID:
> 	* savevm 1
> * Examine the results from the monitor:
> 	* e.g. info snapshots
> 	There is no suitable snapshot available
> * Examine fs1 and fs2 from the command line:
> 	* e.g. qemu-img snapshot -l fs1.qcow2 
> 	Snapshot list:
> 	ID        TAG                 VM SIZE                DATE       VM CLOCK
> 	1         bar                    215M 2015-10-19 16:00:45   00:04:36.065
> 	* e.g. qemu-img snapshot -l fs2.qcow2 
> 	Snapshot list:
> 	ID        TAG                 VM SIZE                DATE       VM CLOCK
> 	2         bar                       0 2015-10-19 15:57:13   00:01:05.589
> 	3         bar                       0 2015-10-19 16:00:45   00:04:36.065

I must admit that I wouldn't have been able to predict this result. :-)

Generally, internal snapshots with multiple disks are somewhat tricky
because we have individual names and IDs in every image file, and each
file can individually be snapshotted, but we need to create a consistent
view from them.
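
As a rough illustration of what such a consistent view requires, here is a
sketch (not the actual savevm.c code; bs_list and n_bs are made-up
parameters, the real code walks the global list of block devices instead)
of checking that a snapshot ID is present on every attached image:

/* Sketch only: returns true if every image has a snapshot with this ID. */
static bool snapshot_on_all_images(BlockDriverState **bs_list, int n_bs,
                                   const char *id)
{
    int i, j;

    for (i = 0; i < n_bs; i++) {
        QEMUSnapshotInfo *sn_tab = NULL;
        int nb_sns = bdrv_snapshot_list(bs_list[i], &sn_tab);
        bool found = false;

        for (j = 0; j < nb_sns; j++) {
            if (!strcmp(sn_tab[j].id_str, id)) {
                found = true;
                break;
            }
        }
        g_free(sn_tab);
        if (nb_sns < 0 || !found) {
            return false;   /* missing (or not listable) on this image */
        }
    }
    return true;            /* the same ID exists everywhere */
}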

> So what seems odd is:
> 
> * QEMU has lost the bar snapshot on fs1, although qemu-img still shows
>   it and it re-appears in QEMU if the guest is started with only
>   fs1.qcow2 attached.

This one I think is a feature: A snapshot can only be loaded if it's
present on all image files. Loading the snapshot only on some disks and
leaving the other disks alone is likely to result in an inconsistent
state.

> * QEMU has deleted the snapshot named "foo" on hd2, even though we issued
>   "savevm 1" and the old snapshot (on hd1) with ID 1 was named "bar". This
>   surprised me.

You mean we should check whether a snapshot identified by an ID has the
same name on all image files before it's considered valid? That might be
a reasonable constraint.

What I would have expected is that the old snapshot with ID 1 indeed
gets deleted, but a new one with ID 1 is created again. I'm not sure if
I would have expected "foo" or "bar" as its name, I could argue either
way.

> * QEMU has created a snapshot on hd2 with a duplicate name ("bar") and if we
>   issue a "loadvm bar" QEMU loads the snapshot with ID 1 on hd1 and the first
>   snapshot with the name "bar" on hd2 which is the one with ID 2. That is not
>   the snapshot that matches the vm state (which is the one with ID 3) and
>   presumably can lead to filesystem corruption on hd2.

Yup, this doesn't seem very nice. I'm not sure whether we can forbid
duplicate names, though, without breaking compatibility in an
unacceptable way.

On the other hand, IDs and names don't really mix well and make it too
easy to shoot yourself in the foot. Maybe it's bad enough to justify
even some incompatibilities.

> Do these seem like bugs? Should I try to find a way to fix some or all of them?
> 
> (Note: I checked the launchpad bug list but didn't see these listed.)

I'm not sure whether to call them bugs, but there's definitely a safety
check or two missing. Let's discuss what the exact behaviour should be,
and then you can go ahead and write the fixes to get that behaviour.

Kevin


* Re: [Qemu-devel] Question about odd snapshot behaviour
From: Sam Bobroff @ 2015-10-22 23:54 UTC
  To: Kevin Wolf; +Cc: qemu-devel, qemu-block

On Mon, Oct 19, 2015 at 02:50:17PM +0200, Kevin Wolf wrote:
> [ CC: qemu-block ]
> 
> Am 19.10.2015 um 07:27 hat Sam Bobroff geschrieben:
> > Hi all,
> > 
> > While working through the QEMU code that saves and loads snapshots, I've
> > noticed some confusing behaviour when using named VM snapshots that may need to
> > be fixed somehow. This is what I've noticed:
> > 
> > * If I create two empty qcow2 filesystems: fs1.qcow2 and fs2.qcow2:
> > 	* e.g. qemu-img create -f qcow2 fs1.qcow2 10G
> > 	* e.g. qemu-img create -f qcow2 fs2.qcow2 10G
> > * Then boot the guest with the above filesystems as hda and hdb (my guest's rootfs
> >   is an initramfs, not either of these):
> > 	* e.g. qemu-system-ppc64 ... -hda fs1.qcow2 -hdb fs2.qcow2
> > * In the QEMU monitor, create a blockdev snapshot called "foo" on hd2 (the
> >   second disk, device scsi0-hd1, i.e. fs2.qcow2):
> > 	e.g. snapshot_blkdev_internal scsi0-hd1 foo
> > * Check fs2 on the host command line:
> > 	e.g. qemu-img snapshot -l fs2.qcow2
> > 	Snapshot list:
> > 	ID        TAG                 VM SIZE                DATE       VM CLOCK
> > 	1         foo                       0 2015-10-19 15:56:29   00:00:20.969
> > * In the QEMU monitor, save the vm state as "bar":
> > 	e.g. savevm bar
> > * Check the snapshot on fs1:
> > 	e.g. info snapshots
> > 	ID        TAG                 VM SIZE                DATE       VM CLOCK
> > 	1         bar                    215M 2015-10-19 04:57:13   00:01:05.589
> > * Check fs2 again on the host command line:
> > 	e.g. qemu-img snapshot -l fs2.qcow2
> > 	Snapshot list:
> > 	ID        TAG                 VM SIZE                DATE       VM CLOCK
> > 	1         foo                       0 2015-10-19 15:56:29   00:00:20.969
> > 	2         bar                       0 2015-10-19 15:57:13   00:01:05.589
> > 
> > * Now the fun part: overwrite the bar snapshot on fs1 by ID:
> > 	* savevm 1
> > * Examine the results from the monitor:
> > 	* e.g. info snapshots
> > 	There is no suitable snapshot available
> > * Examine fs1 and fs2 from the command line:
> > 	* e.g. qemu-img snapshot -l fs1.qcow2 
> > 	Snapshot list:
> > 	ID        TAG                 VM SIZE                DATE       VM CLOCK
> > 	1         bar                    215M 2015-10-19 16:00:45   00:04:36.065
> > 	* e.g. qemu-img snapshot -l fs2.qcow2 
> > 	Snapshot list:
> > 	ID        TAG                 VM SIZE                DATE       VM CLOCK
> > 	2         bar                       0 2015-10-19 15:57:13   00:01:05.589
> > 	3         bar                       0 2015-10-19 16:00:45   00:04:36.065
> 
> I must admit that I wouldn't have been able to predict this result. :-)
> 
> Generally, internal snapshots with multiple disks are somewhat tricky
> because we have individual names and IDs in every image file, and each
> file can individually be snapshotted, but we need to create a consistent
> view from them.
> 
> > So what seems odd is:
> > 
> > * QEMU has lost the bar snapshot on fs1, although qemu-img still shows
> >   it and it re-appears in QEMU if the guest is started with only
> >   fs1.qcow2 attached.
> 
> This one I think is a feature: A snapshot can only be loaded if it's
> present on all image files. Loading the snapshot only on some disks and
> leaving the other disks alone is likely to result in an inconsistent
> state.

How about expanding the output of "info snapshots" to include the invalid
snapshots as well, with enough information to see that they are invalid,
and why?
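
For example, something along these lines (just a mock-up of the idea, not
existing output; the STATUS column and its wording are invented here):

	(qemu) info snapshots
	ID        TAG                 VM SIZE                DATE       VM CLOCK   STATUS
	1         bar                    215M 2015-10-19 16:00:45   00:04:36.065   invalid (no matching snapshot on scsi0-hd1)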

> > * QEMU has deleted the snapshot named "foo" on hd2, even though we issued
> >   "savevm 1" and the old snapshot (on hd1) with ID 1 was named "bar". This
> >   surprised me.
> 
> You mean we should check whether a snapshot identified by an ID has the
> same name on all image files before it's considered valid? That might be
> a reasonable constraint.
> 
> What I would have expected is that the old snapshot with ID 1 indeed
> gets deleted, but a new one with ID 1 is created again. I'm not sure if
> I would have expected "foo" or "bar" as its name, I could argue either
> way.

A large part of the confusion comes from the way the search functions work:
they take a string but will match either the ID or name, and there's no way to
specify which. Could we change the HMP command so that by default it works only
by name, but can optionally take an ID (via an explicit flag or something)?
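
For reference, the lookup is currently roughly this shape (paraphrased, not
a verbatim copy of the code in block/snapshot.c): a single string is matched
against both the ID and the name, and the first hit wins, which is where the
ambiguity comes from.

/* Paraphrased sketch of the bdrv_snapshot_find()-style lookup, not verbatim */
static int find_snapshot(BlockDriverState *bs, QEMUSnapshotInfo *sn_info,
                         const char *str)
{
    QEMUSnapshotInfo *sn_tab;
    int i, nb_sns, ret = -ENOENT;

    nb_sns = bdrv_snapshot_list(bs, &sn_tab);
    if (nb_sns < 0) {
        return nb_sns;
    }
    for (i = 0; i < nb_sns; i++) {
        /* "1" matches a snapshot with ID 1 *or* a snapshot named "1" */
        if (!strcmp(sn_tab[i].id_str, str) || !strcmp(sn_tab[i].name, str)) {
            *sn_info = sn_tab[i];
            ret = 0;
            break;
        }
    }
    g_free(sn_tab);
    return ret;
}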

> > * QEMU has created a snapshot on hd2 with a duplicate name ("bar") and if we
> >   issue a "loadvm bar" QEMU loads the snapshot with ID 1 on hd1 and the first
> >   snapshot with the name "bar" on hd2 which is the one with ID 2. That is not
> >   the snapshot that matches the vm state (which is the one with ID 3) and
> >   presumably can lead to filesystem corruption on hd2.
> 
> Yup, this doesn't seem very nice. I'm not sure whether we can forbid
> duplicate names, though, without breaking compatibility in an
> unacceptable way.

It seems like the code is currently trying to prevent duplicate names: if you
create a snapshot with the name of an existing one, existing snapshots with
that name are (silently) deleted. It's just that the code gets confused when
an ID is passed instead, and the above happens. So I don't think really
forbidding duplicate names is going to break anything.

There seems to be an assumption (both in the loadvm code and in "info
snapshots", see hmp_info_snapshots()) that the name and the ID each uniquely
identify a snapshot "set", but this just isn't the case. I think we need to
change the ID so that it really is unique to a "snapshot set" (maybe make it
a UUID?), and change the existing code to prevent duplicate names and to
refuse to load "mismatched" snapshots (with the changed ID, this would just
mean that the snapshots on all devices must have matching IDs).

So I'm proposing:
* Change ID from an incrementing integer to something "fairly unique" like a
  UUID.
* Snapshots can only be created with a name if:
	* There are no snapshots on any device with that name, or
	* There are already snapshots on all devices and they currently all
	  have the same ID.
* A snapshot is only valid if a snapshot with the same ID (and the same
  name, if named) exists on every device.

Do you think it would be too incompatible to change the ID field? Would it be
better to leave the ID field (and deprecate it somehow) and add an additional
"set ID" (UUID)?

> On the other hand, IDs and names don't really mix well and make it too
> easy to shoot yourself in the foot. Maybe it's bad enough to justify
> even some incompatibilities.

Agreed.

> > Do these seem like bugs? Should I try to find a way to fix some or all of them?
> > 
> > (Note: I checked the launchpad bug list but didn't see these listed.)
> 
> I'm not sure whether to call them bugs, but there's definitely a safety
> check or two missing. Let's discuss what the exact behaviour should be,
> and then you can go ahead and write the fixes to get that behaviour.

Sounds good, thanks for your time :-)

> Kevin

Cheers,
Sam.
