From: Jamie Lokier
To: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] PATCH: v3 Allow control over drive file open mode
Date: Wed, 13 Aug 2008 00:56:56 +0100
Message-ID: <20080812235655.GA29029@shareable.org>
In-Reply-To: <48A190A9.1070902@codemonkey.ws>

Anthony Liguori wrote:
> Which is fine, but you're missing my fundamental argument.  Having a
> read-only flag exposed to the user should not translate to "open the
> underlying files O_RDONLY".  That's an implementation detail.  If
> that's what ends up happening, great.  However, I'll make the
> argument also that for certain circumstances that's not actually
> what we want.

And when the file is not owned by the user running QEMU?  I.e. it's
shared.

> >> qemu-system-x86_64 -drive file=foo.img -drive file=boo.img,read-only=on
> >>
> >> Down the road, you do a savevm.  savevm has to create a checkpoint
> >> in all disk images.  The checkpoint will be empty in boo.img but it
> >> still needs to create one.
> >
> > Perhaps I don't understand clearly enough how you imagine this
> > scenario.  Surely when the snapshot is resumed it is sufficient for
> > the file boo.img to be identical?
>
> Not really.  When the snapshot is restored, what do you do with
> boo.img?  Do you just use the main L1 table if no properly named
> snapshots are available?  That seems quite error prone to me.

That's a fair question.  But if boo.img is used with several
concurrent QEMUs - a legitimate use of a read-only disk image - how
can writing snapshot metadata to it be safe?

TBH, the snapshot behaviour is really confusing and not well
documented.  Let's see:

1. It will write a snapshot record to read-only qcow2 images, but not
   to raw images?  So they *behave* differently - it's not merely a
   different format; it has side effects.  What if I don't want side
   effects and just want a compact format?

2. You *need* the snapshot record stored in qcow2, yet it's OK that
   raw doesn't store it?  Seems to me that sometimes I don't need the
   snapshot record; it would be nice if I could request not to have
   it.  I always resume from the last saved snapshot anyway - which
   was always made with the CPU stopped (simulated suspend/resume).

3. The documentation (that I found) does not explain that a snapshot
   records the *disk* state as well as the machine state.  This was a
   big surprise to me.  It does say you need at least one qcow2 file
   before snapshots are possible.

4. Which file is the machine state stored in?  The first one on the
   command line, or the first disk index?

5. As the disk state is snapshotted, how do I extract a snapshotted
   disk, e.g. to "qemu-img convert" it or transport it into something
   else?  Can I delete a snapshot without starting qemu with the
   *exact same arguments* as before, except -S, and doing it from the
   monitor?  (A rough qemu-img sketch follows this list.)

6. What do "commit" or "qemu-img commit" do to snapshots?  Do they
   break all snapshots but the current one?

7. What happens if qemu dies / is killed / the host crashes / power
   fails during "commit" or "savevm"?  Does it leave the files
   inconsistent and the VM wrecked?  Both operations can take quite a
   long time.

8. Sometimes I want a (machine-state) snapshot and I *don't* want to
   use qcow2 for the disk image.  It seems non-orthogonal that I can
   use raw images (or other formats) for all but one disk - OK, I
   have to be careful to only resume from that particular snapshot,
   or to reboot afresh (simulated unclean boot) - but I can't use raw
   images for all disks.

9. Sometimes I want a disk-state snapshot (now that I know about them
   :-) and I *don't* want a machine-state snapshot.  In other words,
   I may want to boot using a disk snapshotted earlier without
   initialising device state from that snapshot - especially when
   using a much different version of QEMU, KVM or Xen.  There is no
   harm in using the disk - it just looks like a CPU reset to the
   guest, which is acceptable, and even clean if the save happened
   with the guest in a safe state.

   Currently I am using "qemu-img create -b" branches to get a
   similar effect - snapshotting disks seems much better, since you
   don't have long commit pauses to tidy up.
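For what it's worth, points 5 and 9 are the sort of thing I would hope
to handle offline with qemu-img, assuming its "snapshot" subcommand
and backing files behave as documented - the image and snapshot names
below are only placeholders, not anything from this thread:

    # List / revert to / delete internal snapshots without starting
    # qemu at all:
    qemu-img snapshot -l disk.qcow2          # list snapshots
    qemu-img snapshot -a mysnap disk.qcow2   # roll the image back to "mysnap"
    qemu-img snapshot -d mysnap disk.qcow2   # delete "mysnap"

    # To extract a snapshotted disk, revert a *copy* and then convert it:
    cp disk.qcow2 tmp.qcow2
    qemu-img snapshot -a mysnap tmp.qcow2
    qemu-img convert -O raw tmp.qcow2 extracted.img

    # The backing-file "branch" mentioned in point 9:
    qemu-img create -f qcow2 -b base.img branch.qcow2
    qemu-system-x86_64 -drive file=branch.qcow2

If that really works, it at least covers deleting a snapshot without
restarting qemu and driving the monitor - but it is a guess on my
part, not something I have verified against the current tree.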
> Another example is introducing a copy-on-read disk.  This would be
> useful for image streaming.  Think of a qcow2 file that backs an
> http block device.  The perfect use-case for something like this is
> an ISO image which a user would want to export to the guest
> read-only.  However, we need to modify the qcow2 image as the guest
> reads it (to do the copy-on-read).

That's a good example.  If copy-on-read is implemented, you won't see
me or anyone else objecting to it opening the file writable!

> N.B. I've said before that there's no reason that a read-only disk
> cannot result in the file being opened O_RDONLY (for raw in
> particular), but that is a detail of each block device and I don't
> think it should be the case for qcow2.

That's another reason why I've begun recommending that clients stop
using qcow2 for important VMs (the other is possible corruption on
qemu death / power failure), unless they have really tight space
constraints.

--
Jamie