From: Jamie Lokier <jamie@shareable.org>
To: Marc Bevand <m.bevand@gmail.com>
Cc: qemu-devel@nongnu.org, Gleb Natapov <gleb@redhat.com>,
    kvm@vger.kernel.org
Subject: Re: [Qemu-devel] Re: qcow2 corruption observed, fixed by reverting old change
Date: Sun, 15 Feb 2009 02:37:18 +0000
Message-ID: <20090215023718.GD9281@shareable.org>
In-Reply-To: <aaccfcb60902132231v53b54070sf7a0151ee565214@mail.gmail.com>

Marc Bevand wrote:
> On Fri, Feb 13, 2009 at 8:23 AM, Jamie Lokier <jamie@shareable.org> wrote:
> >
> > Marc.. this is quite a serious bug you've reported. Is there a
> > reason you didn't report it earlier?
>
> Because I only started hitting that bug a couple weeks ago after
> having upgraded to a buggy kvm version.
>
> > Is there a way to restructure the code and/or how it works so it's
> > more clearly correct?
>
> I am seriously concerned about the general design of qcow2. The code
> base is more complex than it needs to be, the format itself is
> susceptible to race conditions causing cluster leaks when updating
> some internal datastructures, it gets easily fragmented, etc.

When I read it, I thought the code was remarkably compact for what it
does, although I agree that the leaks, fragmentation and inconsistency
on crashes are serious. From elsewhere it sounds like the refcount
update cost is significant too.

> I am considering implementing a new disk image format that supports
> base images, snapshots (of the guest state), clones (of the disk
> content); that has a radically simpler design & code base; that is
> always consistent "on disk"; that is friendly to delta diffing (ie.
> space-efficient when used with ZFS snapshots or rsync); and that makes
> use of checksumming & replication to detect & fix corruption of
> critical data structures (ideally this should be implemented by the
> filesystem, unfortunately ZFS is not available everywhere :D).

You have just described a high quality modern filesystem or database
engine; both would certainly be far more complex than qcow2's code.
Especially with checksumming and replication :)

ZFS isn't everywhere, but it looks like everyone wants to clone ZFS's
best features everywhere (but not its worst feature: lots of memory
required).

I've had similar thoughts myself, by the way :-)

> I believe the key to achieve these (seemingly utopian) goals is to
> represent a disk "image" as a set of sparse files, 1 per
> snapshot/clone.

You can already do this, if your filesystem supports snapshotting. On
Linux hosts, any filesystem can snapshot by using LVM underneath it
(although it's not pretty to do). A few experimental Linux
filesystems let you snapshot at the filesystem level.
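
To make the "one layer per snapshot" idea concrete, here is a minimal
in-memory sketch of a copy-on-write layer chain. It only models the
lookup logic: each layer holds just the blocks written since the
previous snapshot, and a read falls through to older layers. The names
(Layer, BLOCK_SIZE, snapshot) are made up for illustration; a real
implementation would back each layer with a sparse file or a
filesystem/LVM snapshot and is not shown here.

```python
BLOCK_SIZE = 512  # hypothetical block size, chosen only for the example


class Layer:
    """One snapshot layer: stores only blocks written into this layer."""

    def __init__(self, parent=None):
        self.parent = parent  # older layer, or None for the base image
        self.blocks = {}      # block index -> bytes; absent means "hole"

    def write_block(self, index, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        self.blocks[index] = bytes(data)

    def read_block(self, index):
        # Walk from the newest layer towards the base until a layer
        # actually contains the block.
        layer = self
        while layer is not None:
            if index in layer.blocks:
                return layer.blocks[index]
            layer = layer.parent
        # Never written anywhere in the chain: reads back as zeros.
        return bytes(BLOCK_SIZE)

    def snapshot(self):
        # By convention the caller stops writing to this layer and
        # continues in the returned child.
        return Layer(parent=self)
```

Writes after a snapshot touch only the new layer, so the older layer
keeps the state as of the snapshot without any copying.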
A feature you missed in the utopian vision is sharing backing store
for equal parts of files between different snapshots _after_ they've
been written in separate branches (with the same data), and also among
different VMs. It's becoming stylish to put similarity detection in
the filesystem somewhere too :-)
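
As a rough illustration of that kind of sharing, the sketch below keeps
blocks in a content-addressed store keyed by SHA-256, so two images
(or two branches) that write the same data end up referencing one
stored copy. Everything here (DedupStore, Image, the 4096-byte block
size) is hypothetical and in-memory only; it says nothing about how any
particular filesystem implements similarity detection.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size for the example


class DedupStore:
    """Content-addressed block store shared by several images."""

    def __init__(self):
        self.blocks = {}    # digest -> block contents
        self.refcount = {}  # digest -> number of table entries using it

    def put(self, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blocks:
            self.blocks[digest] = data
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        return digest

    def release(self, digest):
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:
            # Last reference gone: the block can be freed.
            del self.refcount[digest]
            del self.blocks[digest]

    def get(self, digest):
        return self.blocks[digest]


class Image:
    """A disk image as a table of block index -> content digest."""

    def __init__(self, store):
        self.store = store
        self.table = {}

    def write_block(self, index, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        # Take the new reference before dropping the old one, so
        # rewriting identical data never frees the block in between.
        new_digest = self.store.put(bytes(data))
        old_digest = self.table.get(index)
        if old_digest is not None:
            self.store.release(old_digest)
        self.table[index] = new_digest

    def read_block(self, index):
        digest = self.table.get(index)
        if digest is None:
            return bytes(BLOCK_SIZE)  # unwritten blocks read as zeros
        return self.store.get(digest)
```

Note that this brings back reference counting, which is exactly the
kind of bookkeeping whose update cost was mentioned earlier in the
thread.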
-- Jamie