From: Jamie Lokier
To: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] PATCH: v3 Allow control over drive file open mode
Date: Wed, 13 Aug 2008 00:56:56 +0100
Message-ID: <20080812235655.GA29029@shareable.org>
In-Reply-To: <48A190A9.1070902@codemonkey.ws>

Anthony Liguori wrote:
> Which is fine, but you're missing my fundamental argument.  Having a
> read-only flag exposed to the user should not translate to "open the
> underlying files O_RDONLY".  That's an implementation detail.  If
> that's what ends up happening, great.  However, I'll make the
> argument also that for certain circumstances that's not actually
> what we want.

And when the file is not owned by the user running QEMU?  I.e. it's
shared.

> >> qemu-system-x86_64 -drive file=foo.img -drive file=boo.img,read-only=on
> >>
> >> Down the road, you do a savevm.  savevm has to create a checkpoint
> >> in all disk images.  The checkpoint will be empty in boo.img but it
> >> still needs to create one.
> >
> > Perhaps I don't understand clearly enough how you imagine this
> > scenario.  Surely when the snapshot is resumed it is sufficient for
> > the file boo.img to be identical?
>
> Not really.  When the snapshot is restored, what do you do with
> boo.img?  Do you just use the main L1 table if no properly named
> snapshots are available?  That seems quite error prone to me.

That's a fair question.  But if boo.img is used with several
concurrent QEMUs - a legitimate use of a read-only disk image - how
can writing snapshot metadata to it be safe?

TBH, the snapshot behaviour is really confusing and not well
documented.  Let's see:

1. It will write a snapshot record to read-only qcow2 images, but not
   to raw images?  So they *behave* differently - it's not merely a
   different format; it has side effects.  What if I don't want side
   effects and just want a compact format?

2. You *need* the snapshot record stored in qcow2, yet it's OK that
   raw doesn't store it?  Seems to me that sometimes I don't need the
   snapshot record; it would be nice if I could request not to have
   it.  I always resume from the last saved snapshot anyway - which
   was always made with the CPU stopped (simulated suspend/resume).

3. The documentation (that I found) does not explain that a snapshot
   records the *disk* state as well as the machine state.  This was a
   big surprise to me.  It does say you need at least one qcow2 file
   before snapshots are possible.

4. Which file is the machine state stored in?  The first one on the
   command line, or the first disk index?

5. As the disk state is snapshotted, how do I extract a snapshotted
   disk, e.g. to "qemu-img convert" it or transport it into something
   else?  Can I delete a snapshot without starting qemu with the
   *exact same arguments* as before, except -S, and doing it from the
   monitor?  (A rough qemu-img sketch follows this list.)

6. What do "commit" or "qemu-img commit" do to snapshots?  Do they
   break all snapshots but the current one?

7. What happens if qemu dies / is killed / the host crashes / power
   fails during "commit" or "savevm"?  Does it leave the files
   inconsistent and the VM wrecked?  Both operations can take quite a
   long time.

8. Sometimes I want a (machine-state) snapshot and I *don't* want to
   use qcow2 for the disk image.  It seems non-orthogonal that I can
   use raw images (or other formats) for all but one disk - OK, I
   have to be careful to only resume from that particular snapshot,
   or to reboot afresh (simulated unclean boot) - but I can't use raw
   images for all disks.

9. Sometimes I want a disk-state snapshot (now that I know about them
   :-) and I *don't* want a machine-state snapshot.  In other words,
   I may want to boot using a disk snapshotted earlier without
   initialising device state from that snapshot - especially when
   using a much different version of QEMU, KVM or Xen.  There is no
   harm in using the disk - it just looks like a CPU reset to the
   guest, which is acceptable, and even clean if the save happened
   with the guest in a safe state.

   Currently I am using "qemu-img create -b" branches to get a
   similar effect - snapshotting disks seems much better, since you
   don't have long commit pauses to tidy up.
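For what it's worth, points 5 and 9 are the sort of thing I would hope
to handle offline with qemu-img, assuming its "snapshot" subcommand
and backing files behave as documented - the image and snapshot names
below are only placeholders, not anything from this thread:

    # List / revert to / delete internal snapshots without starting
    # qemu at all:
    qemu-img snapshot -l disk.qcow2          # list snapshots
    qemu-img snapshot -a mysnap disk.qcow2   # roll the image back to "mysnap"
    qemu-img snapshot -d mysnap disk.qcow2   # delete "mysnap"

    # To extract a snapshotted disk, revert a *copy* and then convert it:
    cp disk.qcow2 tmp.qcow2
    qemu-img snapshot -a mysnap tmp.qcow2
    qemu-img convert -O raw tmp.qcow2 extracted.img

    # The backing-file "branch" mentioned in point 9:
    qemu-img create -f qcow2 -b base.img branch.qcow2
    qemu-system-x86_64 -drive file=branch.qcow2

If that really works, it at least covers deleting a snapshot without
restarting qemu and driving the monitor - but it is a guess on my
part, not something I have verified against the current tree.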
> Another example is introducing a copy-on-read disk.  This would be
> useful for image streaming.  Think of a qcow2 file that backs an
> http block device.  The perfect use-case for something like this is
> an ISO image which a user would want to export to the guest
> read-only.  However, we need to modify the qcow2 image as the guest
> reads it (to do the copy-on-read).

That's a good example.  If copy-on-read is implemented, you won't see
me or anyone else objecting to it opening the file writable!

> N.B. I've said before that there's no reason that a read-only disk
> cannot result in the file being opened O_RDONLY (for raw in
> particular), but that is a detail of each block device and I don't
> think it should be the case for qcow2.

That's another reason why I've begun recommending that clients stop
using qcow2 for important VMs (the other is possible corruption on
qemu death / power failure), unless they have really tight space
constraints.

--
Jamie