public inbox for kvm@vger.kernel.org
* Win2003 disk corruption with kvm-1.0. and virtio
@ 2013-02-12 14:30 Sylvain Bauza
  2013-02-13  7:21 ` Philipp Hahn
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Sylvain Bauza @ 2013-02-12 14:30 UTC
  To: kvm

Hi,

We currently run OpenStack Essex hosts with KVM 1.0 (Ubuntu 12.04);
instances use qcow2, virtio, cache=none.

Linux VMs have no trouble at all, but we do observe filesystem corruption
and inconsistencies (missing DLLs, CHKDSK requested by Event Viewer,
failures at reboot) with some of our Windows 2003 SP2 64-bit images.

At first boot, stress tests (CrystalDiskMark 3.0.2 and intensive CHKDSK)
don't reveal any problems. The corruption only appears 6 to 12 hours later.

Do you have any idea how to prevent it? Is cache=writethrough an
acceptable solution? We don't want to move away from the qcow2 image
format, as it allows live snapshots and so on.

Thanks for your input,
-Sylvain Bauza
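For reference on the cache=writethrough question: later QEMU documentation describes each cache= mode as a combination of three flags (writeback, direct, no-flush). The Python table below is a sketch transcribing that table; the names CacheFlags, completed_writes_are_durable and uses_host_page_cache are illustrative, and the exact behaviour of qemu-kvm 1.0 may differ from the documentation of later releases.

```python
from typing import NamedTuple


class CacheFlags(NamedTuple):
    writeback: bool  # guest is told the disk has a volatile write cache
    direct: bool     # host page cache is bypassed (O_DIRECT)
    no_flush: bool   # guest flush requests are ignored


# Transcribed from the cache-mode table in later QEMU documentation;
# qemu-kvm 1.0 may not match it in every detail.
CACHE_MODES = {
    "writeback": CacheFlags(writeback=True, direct=False, no_flush=False),
    "none": CacheFlags(writeback=True, direct=True, no_flush=False),
    "writethrough": CacheFlags(writeback=False, direct=False, no_flush=False),
    "directsync": CacheFlags(writeback=False, direct=True, no_flush=False),
    "unsafe": CacheFlags(writeback=True, direct=False, no_flush=True),
}


def completed_writes_are_durable(mode: str) -> bool:
    """True if QEMU reports a write complete only once it is on stable storage."""
    return not CACHE_MODES[mode].writeback


def uses_host_page_cache(mode: str) -> bool:
    """True if guest data passes through the host page cache."""
    return not CACHE_MODES[mode].direct
```

Under this reading, cache=none and cache=writethrough differ in two ways: writethrough goes through the host page cache but completes each write only once it is stable, while none bypasses the host cache but relies on the guest OS to send flushes.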


* Re: Win2003 disk corruption with kvm-1.0. and virtio
  2013-02-12 14:30 Win2003 disk corruption with kvm-1.0. and virtio Sylvain Bauza
@ 2013-02-13  7:21 ` Philipp Hahn
  2013-02-13  9:56   ` Sylvain Bauza
  2013-02-14  8:23   ` Sylvain Bauza
  2013-02-13  9:03 ` Stefan Hajnoczi
  2013-02-14  8:17 ` Stefan Hajnoczi
  2 siblings, 2 replies; 13+ messages in thread
From: Philipp Hahn @ 2013-02-13  7:21 UTC
  To: kvm

Hello,

On Tuesday 12 February 2013 15:30:37 Sylvain Bauza wrote:
> We currently run OpenStack Essex hosts with KVM 1.0 (Ubuntu 12.04);
> instances use qcow2, virtio, cache=none.

The default answer is to update your qemu-kvm version: 1.0 is very old.
qemu-kvm has been fully merged into upstream QEMU, which is currently
preparing its 1.4 release.
There have been many fixes to QEMU and its qcow2 handling: I know of at
least one serious problem that remained unfixed up to qemu-1.1.

Sincerely
Philipp
-- 
Philipp Hahn           Open Source Software Engineer      hahn@univention.de
Univention GmbH        be open.                       fon: +49 421 22 232- 0
Mary-Somerville-Str.1  D-28359 Bremen                 fax: +49 421 22 232-99
                                                   http://www.univention.de/


* Re: Win2003 disk corruption with kvm-1.0. and virtio
  2013-02-12 14:30 Win2003 disk corruption with kvm-1.0. and virtio Sylvain Bauza
  2013-02-13  7:21 ` Philipp Hahn
@ 2013-02-13  9:03 ` Stefan Hajnoczi
  2013-02-13  9:53   ` Sylvain Bauza
  2013-02-14  8:17 ` Stefan Hajnoczi
  2 siblings, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2013-02-13  9:03 UTC
  To: Sylvain Bauza; +Cc: kvm

On Tue, Feb 12, 2013 at 03:30:37PM +0100, Sylvain Bauza wrote:
> We currently run OpenStack Essex hosts with KVM 1.0 (Ubuntu 12.04);
> instances use qcow2, virtio, cache=none.
>
> Linux VMs have no trouble at all, but we do observe filesystem
> corruption and inconsistencies (missing DLLs, CHKDSK requested by
> Event Viewer, failures at reboot) with some of our Windows 2003 SP2
> 64-bit images.
>
> At first boot, stress tests (CrystalDiskMark 3.0.2 and intensive
> CHKDSK) don't reveal any problems. The corruption only appears 6 to
> 12 hours later.
>
> Do you have any idea how to prevent it? Is cache=writethrough an
> acceptable solution? We don't want to move away from the qcow2 image
> format, as it allows live snapshots and so on.

How are you taking live snapshots?  qemu-img should not be used on a
disk image that is currently open by a running guest, it may lead to
corruption.

Stefan
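Assuming the QEMU version in use does not itself stop qemu-img from opening an image a guest is using, one defensive step is to check for open handles first. A minimal Linux-only sketch that scans /proc/<pid>/fd; the function name is hypothetical, and processes whose fd directory the caller cannot read are skipped, so run it as root to see the qemu process.

```python
import os


def open_by_any_process(image_path: str) -> bool:
    """Return True if some process visible in /proc holds image_path open.

    Linux-only sketch. Processes whose fd directory we cannot read
    (other users, unless run as root) are silently skipped.
    """
    target = os.path.realpath(image_path)
    try:
        pids = [p for p in os.listdir("/proc") if p.isdigit()]
    except FileNotFoundError:  # no /proc: not a Linux host
        return False
    for pid in pids:
        fd_dir = os.path.join("/proc", pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process exited, or permission denied
            continue
        for fd in fds:
            try:
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:  # fd closed while we were scanning
                continue
            if os.path.realpath(link) == target:
                return True
    return False
```

A caller would refuse to run any qemu-img command on the image while this returns True.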


* Re: Win2003 disk corruption with kvm-1.0. and virtio
  2013-02-13  9:03 ` Stefan Hajnoczi
@ 2013-02-13  9:53   ` Sylvain Bauza
  2013-02-14  8:15     ` Stefan Hajnoczi
  0 siblings, 1 reply; 13+ messages in thread
From: Sylvain Bauza @ 2013-02-13  9:53 UTC
  To: Stefan Hajnoczi; +Cc: kvm

Hi Stefan,
As per the documentation, Nova (the OpenStack Compute layer) runs
'qemu-img convert -s' against a running instance.
http://docs.openstack.org/trunk/openstack-compute/admin/content/creating-images-from-running-instances.html

Do you think this could be our root cause?
By the way, I tested cache=writethrough and observed image corruption
after some time ('qemu-img check' returns errors).

Thanks for your input,
-Sylvain
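For reading the 'qemu-img check' results: the qemu-img manual of later QEMU releases documents distinct exit statuses for check. A sketch mapping them, assuming those documented values (the 1.0-era binary may behave differently); the helper names are hypothetical. Ideally run the check on an image that no running guest has open, in line with Stefan's warning about using qemu-img on open images.

```python
# Exit statuses of `qemu-img check` as documented in the qemu-img manual of
# later QEMU releases; the qemu-kvm 1.0 binary may not use all of them.
CHECK_STATUS = {
    0: "consistent",
    1: "check not completed (internal error)",
    2: "corrupted",
    3: "leaked clusters, but not corrupted",
    63: "format does not support checks",
}


def describe_check_status(returncode: int) -> str:
    """Turn a `qemu-img check` exit status into a short verdict."""
    return CHECK_STATUS.get(returncode, f"unexpected exit status {returncode}")


def image_is_corrupted(returncode: int) -> bool:
    """Only status 2 signals real corruption; leaked clusters (3) only waste space."""
    return returncode == 2


# Typical use (needs qemu-img installed; the path is hypothetical):
#   import subprocess
#   rc = subprocess.run(["qemu-img", "check", "/path/to/disk"]).returncode
#   print(describe_check_status(rc))
```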


* Re: Win2003 disk corruption with kvm-1.0. and virtio
  2013-02-13  7:21 ` Philipp Hahn
@ 2013-02-13  9:56   ` Sylvain Bauza
  2013-02-13 16:03     ` weber
  2013-02-14  8:23   ` Sylvain Bauza
  1 sibling, 1 reply; 13+ messages in thread
From: Sylvain Bauza @ 2013-02-13  9:56 UTC
  To: Philipp Hahn; +Cc: kvm

Hi Philipp,

Indeed, qemu-kvm 1.0 is pretty old, but it is the stable version for
Ubuntu Precise (12.04 LTS).
No backport of later versions is available, so I would need to install
one by hand.

Do you know whether qemu-1.3 (with KVM support) is fully compatible with
qemu-kvm 1.0?
Since I rely on OpenStack Nova for the upper hypervisor layer, it needs
to be a 100% match.

Thanks,
-Sylvain



* Re: Win2003 disk corruption with kvm-1.0. and virtio
  2013-02-13  9:56   ` Sylvain Bauza
@ 2013-02-13 16:03     ` weber
  2013-02-14  5:27       ` Michael Tokarev
  0 siblings, 1 reply; 13+ messages in thread
From: weber @ 2013-02-13 16:03 UTC
  To: Kvm


There are known problems when I/O "native" is combined with cache=writethrough.

With I/O "native", set cache to "none", otherwise your data can get corrupted.
Check the Red Hat pages for that.

marko


* Re: Win2003 disk corruption with kvm-1.0. and virtio
  2013-02-13 16:03     ` weber
@ 2013-02-14  5:27       ` Michael Tokarev
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Tokarev @ 2013-02-14  5:27 UTC
  To: weber; +Cc: Kvm, Sylvain Bauza

[Please stop top-posting.  Thank you]

13.02.2013 20:03, weber@zackbummfertig.de wrote:
>
> there are known problems, WHEN  I/O "native" and cache=writethrough.

> On I/O "native" put cache to "none" otherwise your data can get broken.
> Check Redhat Pages for that.

Which problem is that?

And what is "I/O native"?  Maybe you mean "aio", not "I/O"?

If you are talking about aio=native: that mode does not work for regular
files; it gets "downgraded" to aio=threads automatically.  So there
should already be nothing to change.

Please elaborate.

/mjt
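On the aio question, a related detail: as I understand the QEMU 1.x raw-posix driver, it only enables Linux native AIO when the image is opened with O_DIRECT (cache=none or directsync) and otherwise falls back to the thread pool. Under that assumption, the combination weber mentions (aio=native with cache=writethrough) would run with aio=threads anyway. The sketch below encodes that rule; effective_aio is a hypothetical helper, not a QEMU API.

```python
# Sketch of the rule in the QEMU 1.x raw-posix driver as I understand it:
# Linux native AIO is only used when the image is opened with O_DIRECT,
# otherwise requests go through the thread pool.
O_DIRECT_CACHE_MODES = {"none", "directsync"}


def effective_aio(aio: str, cache: str) -> str:
    """Return the AIO backend that would actually serve a host file."""
    if aio == "native" and cache in O_DIRECT_CACHE_MODES:
        return "native"
    return "threads"
```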



* Re: Win2003 disk corruption with kvm-1.0. and virtio
  2013-02-13  9:53   ` Sylvain Bauza
@ 2013-02-14  8:15     ` Stefan Hajnoczi
  2013-02-14 10:11       ` Sylvain Bauza
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2013-02-14  8:15 UTC
  To: Sylvain Bauza; +Cc: kvm

On Wed, Feb 13, 2013 at 10:53:14AM +0100, Sylvain Bauza wrote:
> As per the documentation, Nova (the OpenStack Compute layer) runs
> 'qemu-img convert -s' against a running instance.
> http://docs.openstack.org/trunk/openstack-compute/admin/content/creating-images-from-running-instances.html

That command will not corrupt the running instance because it opens the
image read-only.

It is possible that the new image is corrupted since qemu-img is reading
from a qcow2 file that is changing underneath it.  However, the chance
is small as long as the snapshot isn't deleted while qemu-img convert is
running.

So this doesn't sound like the cause of the problems you are seeing.

Stefan


* Re: Win2003 disk corruption with kvm-1.0. and virtio
  2013-02-12 14:30 Win2003 disk corruption with kvm-1.0. and virtio Sylvain Bauza
  2013-02-13  7:21 ` Philipp Hahn
  2013-02-13  9:03 ` Stefan Hajnoczi
@ 2013-02-14  8:17 ` Stefan Hajnoczi
  2 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2013-02-14  8:17 UTC
  To: Sylvain Bauza; +Cc: kvm

On Tue, Feb 12, 2013 at 03:30:37PM +0100, Sylvain Bauza wrote:
> We currently run OpenStack Essex hosts with KVM 1.0 (Ubuntu 12.04);
> instances use qcow2, virtio, cache=none.
>
> Linux VMs have no trouble at all, but we do observe filesystem
> corruption and inconsistencies (missing DLLs, CHKDSK requested by
> Event Viewer, failures at reboot) with some of our Windows 2003 SP2
> 64-bit images.
>
> At first boot, stress tests (CrystalDiskMark 3.0.2 and intensive
> CHKDSK) don't reveal any problems. The corruption only appears 6 to
> 12 hours later.

Are you running the latest virtio-win drivers?  See
http://www.linux-kvm.org/page/WindowsGuestDrivers/Download_Drivers.

Have you tested with IDE instead of virtio on the Windows guests?

Stefan
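To try the IDE suggestion, only the disk interface needs to change. A sketch using QEMU's legacy -drive if= shorthand; drive_args and the example path are hypothetical, and in a libvirt/Nova setup the equivalent change would be the disk bus in the domain XML rather than a hand-built command line.

```python
from typing import List

SUPPORTED_INTERFACES = {"ide", "virtio"}


def drive_args(path: str, interface: str, fmt: str = "qcow2",
               cache: str = "none") -> List[str]:
    """Build a legacy '-drive' argument pair for the given disk interface."""
    if interface not in SUPPORTED_INTERFACES:
        raise ValueError(f"unsupported interface: {interface}")
    # QEMU option strings use ',' as a separator; a literal comma is written ',,'.
    escaped = path.replace(",", ",,")
    return ["-drive", f"file={escaped},if={interface},format={fmt},cache={cache}"]
```

Keeping format and cache identical between the two runs means the interface is the only variable being tested.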


* Re: Win2003 disk corruption with kvm-1.0. and virtio
  2013-02-13  7:21 ` Philipp Hahn
  2013-02-13  9:56   ` Sylvain Bauza
@ 2013-02-14  8:23   ` Sylvain Bauza
  1 sibling, 0 replies; 13+ messages in thread
From: Sylvain Bauza @ 2013-02-14  8:23 UTC
  To: Philipp Hahn; +Cc: kvm

Hi,
Latest update: I tried the following:
  - cache=writethrough / kvm-1.0: errors in the qcow2 image
  - cache=none / kvm-1.3: no errors from 'qemu-img check', but Event Viewer
    is complaining

I have to admit I'm lost. I cannot understand what is causing this
corruption, which only appears on some Windows instances...
Please find the full command line below:
117      13781     1  4 Feb13 ?        00:41:43 /usr/bin/kvm -S -M pc-1.3 -enable-kvm
    -m 2048 -smp 1,sockets=1,cores=1,threads=1
    -name instance-0000004f -uuid 26801166-aa03-4bbc-b062-da47168a664c
    -nodefconfig -nodefaults
    -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-0000004f.monitor,server,nowait
    -mon chardev=charmonitor,id=monitor,mode=control
    -rtc base=utc -no-shutdown -boot c
    -drive file=/var/lib/nova/instances/instance-0000004f/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none
    -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
    -netdev tap,fd=21,id=hostnet0,vhost=on,vhostfd=22
    -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:7a:a1:61,bus=pci.0,addr=0x3
    -chardev file,id=charserial0,path=/var/lib/nova/instances/instance-0000004f/console.log
    -device isa-serial,chardev=charserial0,id=serial0
    -chardev pty,id=charserial1
    -device isa-serial,chardev=charserial1,id=serial1
    -usb -device usb-tablet,id=input0
    -vnc 192.168.1.155:2 -k fr -vga cirrus
    -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

As a last try: I googled and found that virtio networking can be buggy,
so I will switch to another network driver and see.
By the way, all these Windows instances have up-to-date virtio SCSI
drivers.
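To confirm what a running instance actually uses (rather than what Nova was asked for), one option is to pull the -drive options out of a QEMU command line like the one quoted above, for example from the output of `ps -o args= -p <pid>`. A sketch; drive_options is a hypothetical helper, and it ignores QEMU's ',,' comma escaping and treats a bare word as a flag set to "on", both simplifications.

```python
import shlex
from typing import Dict, List


def drive_options(cmdline: str) -> List[Dict[str, str]]:
    """Collect the key=value pairs of every -drive argument in a QEMU command line.

    Sketch only: assumes no ',,' escapes in values and treats a bare
    word (no '=') as a boolean flag with value "on".
    """
    argv = shlex.split(cmdline)
    drives = []
    for i, arg in enumerate(argv[:-1]):
        if arg == "-drive":
            opts = {}
            for item in argv[i + 1].split(","):
                key, sep, value = item.partition("=")
                opts[key] = value if sep else "on"
            drives.append(opts)
    return drives
```

Applied to the command line above, it would report format=qcow2 and cache=none for drive-virtio-disk0.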




* Re: Win2003 disk corruption with kvm-1.0. and virtio
  2013-02-14  8:15     ` Stefan Hajnoczi
@ 2013-02-14 10:11       ` Sylvain Bauza
  2013-03-12 15:48         ` Sylvain Bauza
  0 siblings, 1 reply; 13+ messages in thread
From: Sylvain Bauza @ 2013-02-14 10:11 UTC
  To: Stefan Hajnoczi; +Cc: kvm

Interesting point. Even though qemu-img opens the qcow2 image read-only,
the image keeps changing while the snapshot is taken (in particular, I'm
running IIS with ASP support and VB DLLs).

As you asked in your other post: I'm running the latest Windows virtio
drivers, but I only apply the virtio driver update *after* booting an
instance, not before taking the snapshot.

What I'll try: run an instance, update the driver, stop the instance,
and run qemu-img convert only once the instance is stopped.



* Re: Win2003 disk corruption with kvm-1.0. and virtio
  2013-02-14 10:11       ` Sylvain Bauza
@ 2013-03-12 15:48         ` Sylvain Bauza
  2013-03-12 21:10           ` Jorge Armando Medina
  0 siblings, 1 reply; 13+ messages in thread
From: Sylvain Bauza @ 2013-03-12 15:48 UTC
  To: Stefan Hajnoczi; +Cc: kvm

A long-standing bug and a big update, but I think I found the root cause.
FYI, Windows 2003 enables write caching on disk drives by default, even
with virtio (see the driver details, Policies tab).

As a consequence, any open DLL could be corrupted if we run a plain
'qemu-img convert' against the running VM.
The proper way to take a live snapshot is to disable the write cache
(goodbye, good performance!) and then run the convert.
The other way is to stop the VM, perform a 'qemu-img snapshot', then
convert the snapshot.

Hope it can help other people.
-Sylvain
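The second option (stop the VM, take a qemu-img snapshot, convert it) can be sketched as below. The command runner and the vm_is_running check are injected so the sequence can be tested without qemu-img or a hypervisor; export_stopped_vm is a hypothetical name, and the `-s` option of qemu-img convert is the one mentioned earlier in this thread, so check `qemu-img --help` on your version for the preferred spelling.

```python
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], object]


def export_stopped_vm(disk: str, snapshot: str, out: str, out_fmt: str,
                      vm_is_running: Callable[[], bool],
                      run: Runner) -> List[List[str]]:
    """Snapshot and convert a qcow2 disk whose VM has already been shut down."""
    if vm_is_running():
        # qemu-img should never touch an image that a live guest has open.
        raise RuntimeError("shut the VM down before touching its disk")
    commands = [
        # 1. create an internal qcow2 snapshot
        ["qemu-img", "snapshot", "-c", snapshot, disk],
        # 2. export that snapshot to a new image
        ["qemu-img", "convert", "-f", "qcow2", "-O", out_fmt,
         "-s", snapshot, disk, out],
        # 3. delete the snapshot only after convert has returned
        ["qemu-img", "snapshot", "-d", snapshot, disk],
    ]
    for cmd in commands:
        run(cmd)  # should raise on failure, e.g. subprocess.run(cmd, check=True)
    return commands
```

Because the runner is called sequentially and is expected to block, the snapshot cannot be deleted while the convert is still reading it.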



* Re: Win2003 disk corruption with kvm-1.0. and virtio
  2013-03-12 15:48         ` Sylvain Bauza
@ 2013-03-12 21:10           ` Jorge Armando Medina
  0 siblings, 0 replies; 13+ messages in thread
From: Jorge Armando Medina @ 2013-03-12 21:10 UTC
  To: Sylvain Bauza; +Cc: Stefan Hajnoczi, kvm

On 12/03/13 09:48, Sylvain Bauza wrote:
> A long-standing bug and a big update, but I think I found the root cause.
> FYI, Windows 2003 enables write caching on disk drives by default, even
> with virtio (see the driver details, Policies tab).

Hi there,

Which option did you use in the driver policy?

Thanks



Thread overview: 13+ messages
2013-02-12 14:30 Win2003 disk corruption with kvm-1.0. and virtio Sylvain Bauza
2013-02-13  7:21 ` Philipp Hahn
2013-02-13  9:56   ` Sylvain Bauza
2013-02-13 16:03     ` weber
2013-02-14  5:27       ` Michael Tokarev
2013-02-14  8:23   ` Sylvain Bauza
2013-02-13  9:03 ` Stefan Hajnoczi
2013-02-13  9:53   ` Sylvain Bauza
2013-02-14  8:15     ` Stefan Hajnoczi
2013-02-14 10:11       ` Sylvain Bauza
2013-03-12 15:48         ` Sylvain Bauza
2013-03-12 21:10           ` Jorge Armando Medina
2013-02-14  8:17 ` Stefan Hajnoczi
