qemu-devel.nongnu.org archive mirror
From: Anthony Liguori <anthony@codemonkey.ws>
To: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] qcow2 - safe on kill?  safe on power fail?
Date: Mon, 21 Jul 2008 14:43:45 -0500
Message-ID: <4884E6F1.5020205@codemonkey.ws>
In-Reply-To: <20080721181031.GA31773@shareable.org>

Jamie Lokier wrote:
> Quite a while ago, Anthony Liguori wrote:
>   
>> David Barrett wrote:
>>     
>>> I'm tracking down a image corruption issue and I'm curious if you can 
>>> answer the following:
>>>
>>> 1) Is there any difference between sending a "TERM" signal to the QEMU 
>>> process and typing "quit" at the monitor?
>>>       
>> Yes.  Since QEMU is single threaded, when you issue a quit, you know you 
>> aren't in the middle of writing qcow2 meta data to disk.
>>
>>     
>>> 2) Will sending TERM corrupt the 'qcow2' image (in ways other than 
>>> normal guest OS dirty shutdown)?
>>>       
>> Possibly, yes.
>>
>>     
>>> 3) Assuming I always start QEMU using "-loadvm", is there any risk in 
>>> using 'kill' to send SIGTERM to the QEMU process when done?
>>>       
>> Yes.  If you want to SIGTERM QEMU, the safest thing to do is use -snapshot.
>>     
>
> Just today, I had a running KVM instance for an important server (our
> busy mail server) lock up.  It was accepting VNC
> connections, but sending keystrokes, mouse movements and so on didn't
> do anything.  It had been running for several weeks without any problem.
> I don't have a report on whether there was a picture from VNC.
>
> Our system manager decided there was nothing else to do, and killed
> that process (SIGTERM), then restarted it.
>   

SIGTERM is about the worst thing you could do, but you're probably okay.

QCOW2 files have no journal, so they are not safe against unexpected 
power outages or hard crashes.  If you need a great deal of reliability, 
you should use a raw image.

With that said, let me explain exactly when corruption can occur.  As 
it turns out, in practice the corruption window isn't that big.

Obviously there are no issues on the read path, so we'll stick strictly 
to the write path.

QEMU is single-threaded and QCOW2 supports asynchronous write 
operations.  A write has two parts.  The first works out what offset 
within the QCOW2 file to write to.  If the sector has been previously 
allocated, this consists only of read operations.  The second issues 
an asynchronous write operation to the allocated sector.
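
To make the lookup step concrete, here is a minimal sketch in Python.  It is a toy model, not the real QCOW2 on-disk layout: the flat dictionary `table`, the 4096-byte `CLUSTER_SIZE` and the name `lookup_offset` are all assumptions made for illustration.

```python
# Toy model of the lookup step. NOT the real QCOW2 format: the flat
# dict and the cluster size below are assumptions for illustration.
CLUSTER_SIZE = 4096

def lookup_offset(table, guest_offset):
    """Map a guest disk offset to an offset in the image file.

    Returns None when the cluster has not been allocated yet.
    Only reads `table`; it never modifies metadata.
    """
    index, within = divmod(guest_offset, CLUSTER_SIZE)
    base = table.get(index)
    return None if base is None else base + within

# Guest clusters 0 and 3 are already allocated in the image file.
table = {0: 65536, 3: 69632}
```

For an already-allocated cluster the lookup is read-only, which is why this path cannot damage the image's metadata.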

Since your guest probably is using a journalled file system, you will be 
okay if something happens before that data gets written to disk[1].

If the sector hasn't been previously allocated, then a new sector in the 
file needs to be allocated.  This is going to change metadata within the 
QCOW2 file and this is where it is possible to corrupt a disk image.  
The operation of allocating a new disk sector is completely synchronous 
so no other code runs until this completes.  Once the disk sector is 
allocated, you're safe again[1].
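
The window described above can be sketched with a toy allocator in Python.  Everything here is hypothetical: the class `ToyImage`, its two dictionaries, and the order of the two metadata updates are assumptions chosen to show the idea, not the actual QCOW2 code or its update order.

```python
class Interrupted(Exception):
    """Stands in for the process being killed mid-allocation."""

class ToyImage:
    def __init__(self):
        self.mapping = {}    # guest cluster index -> host cluster index
        self.refcount = {}   # host cluster index -> reference count
        self.next_free = 0

    def allocate(self, guest_cluster, die_between_steps=False):
        host = self.next_free
        self.next_free += 1
        self.mapping[guest_cluster] = host   # metadata update 1
        if die_between_steps:
            raise Interrupted()              # simulated kill in the window
        self.refcount[host] = 1              # metadata update 2
        return host

    def is_consistent(self):
        # Every mapped host cluster must be counted as in use.
        return all(self.refcount.get(h, 0) >= 1
                   for h in self.mapping.values())
```

If the allocation runs to completion the two structures agree.  If it is cut off between the updates, the mapping points at a cluster that the refcounts consider free, which is the kind of inconsistency that can later turn into corruption.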

Since no other code runs during this period, bugs in the device 
emulation, a user closing the SDL window, or issuing quit in the 
monitor will not corrupt the disk image.  Your guest may require an 
fsck but the QCOW2 image will be fine.

The only ways that you can cause corruption are if the QCOW2 sector 
allocation code is faulty (and you would be screwed no matter what here) 
or if you issue a SIGTERM/SIGKILL that interrupts the code while it's 
allocating a new sector.  If your guest is hung, chances are it's not 
actively writing to disk but this is why SIGTERM/SIGKILL is really a 
terrible thing to do.  It's really the only practical way to corrupt a 
disk image (short of a hard power outage).
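
One general way to keep SIGTERM from landing in the middle of a critical operation is to have the handler only set a flag and let the main loop act on it between operations.  The Python sketch below shows that pattern under stated assumptions: the names `request_shutdown`, `install_sigterm_handler` and `run` are hypothetical, and this is not a description of what QEMU does.  SIGKILL cannot be caught, so nothing like this helps against it.

```python
import signal

shutdown_requested = False

def request_shutdown(signum, frame):
    """Signal handler: record the request and return immediately."""
    global shutdown_requested
    shutdown_requested = True

def install_sigterm_handler():
    # Call once at startup, from the main thread.
    signal.signal(signal.SIGTERM, request_shutdown)

def run(operations):
    """Run each operation to completion; stop only between operations."""
    done = []
    for op in operations:
        if shutdown_requested:
            break
        done.append(op())
    return done
```

An operation that is already running when the signal arrives finishes normally; the loop exits before starting the next one.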

If someone were sufficiently concerned, it's probably relatively 
straightforward to implement an fsck or journal for QCOW2.  This would 
allow the image to be recovered if the metadata somehow got corrupted.
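
As a rough idea of what such an fsck would do, the Python sketch below recomputes the expected reference counts from a mapping table and compares them with the stored ones.  The function `check` and the two plain dictionaries are hypothetical stand-ins for the real metadata structures, so treat this as an outline of the approach only.

```python
from collections import Counter

def check(mapping, refcount):
    """Compare stored refcounts against what the mapping implies.

    mapping:  guest cluster index -> host cluster index
    refcount: host cluster index -> stored reference count
    Returns (corrupt, leaked) as sorted lists of host cluster indexes.
    """
    expected = Counter(mapping.values())
    # Referenced more often than the stored count admits: dangerous,
    # because the cluster could be handed out again.
    corrupt = sorted(h for h, n in expected.items()
                     if refcount.get(h, 0) < n)
    # Counted as in use more often than it is referenced: wasted space.
    leaked = sorted(h for h, n in refcount.items()
                    if n > expected.get(h, 0))
    return corrupt, leaked
```

For example, `check({0: 5, 1: 6}, {5: 1, 7: 1})` reports host cluster 6 as corrupt and host cluster 7 as leaked.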

With all this said, I've definitely seen corruption in QCOW2 images that 
were caused by crashing my host kernel.  I beat up on QEMU pretty badly 
though.  I think under normal circumstances, it's unlikely a user would 
see this in practice.

[1] It's not quite that simple.  Your host doesn't necessarily guarantee 
integrity unless:

  1) you've got battery-backed cache on your disks (commodity disks 
     typically aren't battery backed) or you've disabled write-back;
  2) you have a file system that supports barriers and barriers are 
     enabled (they aren't enabled by default with ext2/3);
  3) you are running QEMU with cache=off to disable host write caching.

Basically, chances are your data is not as safe as you assume it is, and 
QEMU adds very little additional uncertainty to that unless you do 
something nasty like SIGKILL/SIGTERM while doing heavy disk IO.
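
For readers unfamiliar with the host-side caching mentioned in this footnote, the Python sketch below shows the usual application-level step: write, then call `fsync` before treating the data as stored.  The function name `write_durably` is hypothetical.  As the footnote says, this only asks the kernel to flush; whether the data survives a power cut still depends on the file system and on the disk's own write cache.

```python
import os

def write_durably(path, offset, data):
    """Write data at offset, then ask the kernel to flush it to the device."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        written = 0
        while written < len(data):
            # pwrite may write fewer bytes than requested, so loop.
            written += os.pwrite(fd, data[written:], offset + written)
        os.fsync(fd)  # returns only after the kernel reports the flush done
    finally:
        os.close(fd)
```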

Regards,

Anthony Liguori

> (Unfortunately, he didn't know about the monitor and "quit".)
>
> So far, it's looking ok, but I'm concerned about the possibility of
> qcow2 corruption which the above mail says is possible.
>
> Even if we could have used the monitor *this* time, QEMU is quite a
> complex piece of software which we can't assume to be bug free.  What
> happens if KVM/QEMU locks up or crashes, in the following ways:
>
>     - Some emulated driver crashes.  I *have* seen this happen.
>       (Try '-net user -net user' on the command line.  Ok, now we know not
>       to do it...).  The process dies.
>
>     - Some emulated driver gets stuck in a loop.  You know, a bug.
>       No choice but to kill the process.
>
>     - The host machine loses power.  Host's journalled filesystem is
>       fine, but what about the qcow2 images of guests?
>
> I'm imagining that qcow2 is like a very simple filesystem format.
> Real filesystems have "fsck" and/or use journalling or similar to be
> robust.  Is there a "fsck" equivalent for qcow2?  (Maybe running
> qemu-img convert is that?)  Does it use journalling or other
> techniques internally to make sure it is difficult to corrupt, even if
> the host dies unexpectedly?
>
> If qcow2 is not resistant to sudden failures, would it be difficult to
> make it more robust?
>
> (One method which comes to mind is to use a daemon process just to
> handle the disk image, communicating with QEMU.  QEMU is complex and
> may occasionally have problems, but the daemon would do just one
> thing, so quite likely to survive.  It won't be robust against power
> failure, though, and it sounds like performance might suck.)
>
> Or should we avoid using qcow2, for important guest servers that would
> be expensive or impossible to reconstruct?
>
> If not qcow2, are any of the other supported incremental formats
> robust in these ways, e.g. the VMware one?
>
> Thanks,
> -- Jamie
>
>
>   

