From: Anthony Liguori <anthony@codemonkey.ws>
To: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] qcow2 - safe on kill? safe on power fail?
Date: Mon, 21 Jul 2008 17:14:50 -0500
Message-ID: <48850A5A.3070106@codemonkey.ws>
In-Reply-To: <20080721212604.GA2823@shareable.org>

Jamie Lokier wrote:
>> If the sector hasn't been previously allocated, then a new sector in the
>> file needs to be allocated. This is going to change metadata within the
>> QCOW2 file and this is where it is possible to corrupt a disk image.
>> The operation of allocating a new disk sector is completely synchronous
>> so no other code runs until this completes. Once the disk sector is
>> allocated, you're safe again[1].
>>
>
> My main concern is corruption of the QCOW2 sector allocation map, and
> subsequently QEMU/KVM breaking or going wildly haywire with that file.
>
> With a normal filesystem, sure, there are lots of ways to get
> corruption when certain events happen. But you don't lose the _whole_
> filesystem.
>
Sure you can. If you don't have a battery-backed disk cache and are
using write-back caching (which is usually the default), you can
definitely get corruption of the journal. Likewise, under the right
scenarios, you will get journal corruption with the default mount
options of ext3 because it doesn't use barriers.

This is very hard to see happen in practice, though, because these
windows are very small, just like with QEMU.
> My concern is that if the QCOW2 sector allocation map is corrupted by
> these events, you may lose the _whole_ virtual machine, which can be a
> pretty big loss.
>
> Is the format robust enough to prevent that from being a problem?
>
It could be extended to contain a journal. But that doesn't guarantee
that you won't lose data because of your file system failing; that's the
point I'm making.
> (Backups help (but not good enough for things like a mail or database
> server). But how do you safely backup the image of a VM that is
> running 24x7? LVM snapshots are the only way I've thought of, and
> they have a barrier problem, see below.)
>
>
>> you have a file system that supports barriers and barriers
>> are enabled by default (they aren't enabled by default with ext2/3)
>>
>
> There was recent talk of enabling them by default for ext3.
>
It's not going to happen.
>> you are running QEMU with cache=off to disable host write caching.
>>
>
> Doesn't that use O_DIRECT? O_DIRECT writes don't use barriers, and
> fsync() does not deterministically issue a disk barrier if there's no
> metadata change, so O_DIRECT writes are _less_ safe with disks which
> have write-cache enabled than using normal writes.
>
It depends on the filesystem. ext3 never issues any barriers by default
:-)
I would think a good filesystem would issue a barrier after an O_DIRECT
write.
> What about using a partition, such as an LVM volume (so it can be
> snapshotted without having to take down the VM)? I'm under the
> impression there is no way to issue disk barrier flushes to a
> partition, so that's screwed too. (Besides, LVM doesn't propagate
> barrier requests from filesystems either...)
>
Unfortunately, there is no userspace API to inject barriers into a disk.
fdatasync() might come close, but I don't think that's the same behavior
as a barrier. I don't think IDE supports barriers at all, FWIW. It only
has a write-back and a write-through mode, so if you care about data,
you would have to enable write-through in your guest.
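
To make the fdatasync() point concrete, here is a minimal sketch in
Python, assuming a Linux host. The helper name durable_write and the
scratch file are hypothetical and not taken from QEMU. It asks the
kernel to push one write to the device before returning, which is a
durability request for a single file rather than an ordering barrier
between writes. Whether the disk's own write-back cache is flushed
still depends on the filesystem and its mount options.

```python
import os
import tempfile


def durable_write(path, offset, data):
    """Write data at offset, then ask the kernel to flush it to the device.

    Hypothetical helper for illustration only; not QEMU code.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        written = os.pwrite(fd, data, offset)
        # fdatasync() flushes the file data; fall back to fsync() on
        # platforms that do not provide fdatasync().
        sync = getattr(os, "fdatasync", os.fsync)
        sync(fd)
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        image = os.path.join(tmp, "scratch.img")
        print(durable_write(image, 0, b"\0" * 512))
```

Note that the call returns only after the kernel reports the flush as
done; it says nothing about the order in which earlier, unsynced writes
reach the platter.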
> The last two paragraphs apply when using _any_ file format and break
> the integrity of guest journalling filesystems, not just qcow2.
>
>
>> Since no other code runs during this period, bugs in the device
>> emulation, a user closing the SDL window, and issuing quit in the
>> monitor, will not corrupt the disk image. Your guest may require an
>> fsck but the QCOW2 image will be fine.
>>
>
> Does this apply to KVM as well? I thought KVM had separate threads
> for I/O, so problems in another subsystem might crash an I/O thread in
> mid action. Is that work in progress?
>
Not really. There is a big lock that prevents two threads from ever
running at the same time within QEMU.
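
As a rough illustration of the big-lock idea, here is a minimal Python
sketch. All names in it (big_lock, emulate_io, run) are hypothetical and
it is not how QEMU implements its lock; it only shows that when every
thread must take one global lock, no two of them are ever inside the
protected section at the same time.

```python
import threading

big_lock = threading.Lock()  # hypothetical stand-in for the global lock
inside = 0                   # threads currently inside the locked section
max_inside = 0               # highest value of `inside` ever observed


def emulate_io(iterations):
    """Repeatedly enter the locked section and record the concurrency seen."""
    global inside, max_inside
    for _ in range(iterations):
        with big_lock:
            inside += 1
            max_inside = max(max_inside, inside)
            # A metadata update would run here without interleaving.
            inside -= 1


def run(workers=4, iterations=10000):
    """Start several threads and return the peak concurrency observed."""
    threads = [threading.Thread(target=emulate_io, args=(iterations,))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max_inside


if __name__ == "__main__":
    print(run())
```

The counter is only ever read and written while the lock is held, so the
peak value reflects how many threads were inside the section at once.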
Regards,
Anthony Liguori
> Thanks again,
> -- Jamie
>
>
>