* Win2003 disk corruption with kvm-1.0. and virtio
@ 2013-02-12 14:30 Sylvain Bauza
2013-02-13 7:21 ` Philipp Hahn
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Sylvain Bauza @ 2013-02-12 14:30 UTC (permalink / raw)
To: kvm
Hi,
We currently run Openstack Essex hosts with KVM-1.0 (Ubuntu 12.04)
instances with qcow2,virtio,cache=none
For Linux VMs, no trouble at all but we do observe filesystem corruption
and inconsistency (missing DLLs, CHKDSK asked by EventViewer, failure at
reboot) with some of our Windows 2003 SP2 64b images.
At first boot, stress tests (CrystalDiskMark 3.0.2 and intensive CHKDSK)
don't show any problems. The corruption only appears 6 or 12 hours later.
Do you have any idea how to prevent it? Is cache=writethrough an
acceptable solution? We don't want to abandon the qcow2 image format, as it
allows live snapshots, among other things.
Thanks for your inputs,
-Sylvain Bauza
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Win2003 disk corruption with kvm-1.0. and virtio
2013-02-12 14:30 Win2003 disk corruption with kvm-1.0. and virtio Sylvain Bauza
@ 2013-02-13 7:21 ` Philipp Hahn
2013-02-13 9:56 ` Sylvain Bauza
2013-02-14 8:23 ` Sylvain Bauza
2013-02-13 9:03 ` Stefan Hajnoczi
2013-02-14 8:17 ` Stefan Hajnoczi
2 siblings, 2 replies; 13+ messages in thread
From: Philipp Hahn @ 2013-02-13 7:21 UTC (permalink / raw)
To: kvm
Hello,
On Tuesday 12 February 2013 15:30:37 Sylvain Bauza wrote:
> We currently run Openstack Essex hosts with KVM-1.0 (Ubuntu 12.04)
> instances with qcow2,virtio,cache=none
The default answer is to update your qemu-kvm version: 1.0 is very old, and
qemu-kvm is fully merged into upstream qemu, which is currently preparing its
1.4 release.
There have been many fixes to qemu and the qcow2 handling: I know of at least
one serious problem not fixed up to qemu-1.1.
Sincerely
Philipp
--
Philipp Hahn Open Source Software Engineer hahn@univention.de
Univention GmbH be open. fon: +49 421 22 232- 0
Mary-Somerville-Str.1 D-28359 Bremen fax: +49 421 22 232-99
http://www.univention.de/
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Win2003 disk corruption with kvm-1.0. and virtio
2013-02-13 7:21 ` Philipp Hahn
@ 2013-02-13 9:56 ` Sylvain Bauza
2013-02-13 16:03 ` weber
2013-02-14 8:23 ` Sylvain Bauza
1 sibling, 1 reply; 13+ messages in thread
From: Sylvain Bauza @ 2013-02-13 9:56 UTC (permalink / raw)
To: Philipp Hahn; +Cc: kvm
Hi Philipp,
Indeed, qemu-kvm 1.0 is pretty old, but it is the stable version for
Ubuntu Precise (12.04 LTS). No backport of a later version is available, so I
would need to install one by hand.
Do you know whether qemu-1.3 (with KVM support) is fully compatible with
qemu-kvm 1.0? As I'm relying on Openstack Nova as the layer above the
hypervisor, it needs to be a 100% match.
Thanks,
-Sylvain
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Win2003 disk corruption with kvm-1.0. and virtio
2013-02-13 9:56 ` Sylvain Bauza
@ 2013-02-13 16:03 ` weber
2013-02-14 5:27 ` Michael Tokarev
0 siblings, 1 reply; 13+ messages in thread
From: weber @ 2013-02-13 16:03 UTC (permalink / raw)
To: Kvm
There are known problems when I/O is "native" and cache=writethrough.
With I/O "native", set cache to "none", otherwise your data can get corrupted.
Check the Red Hat pages for that.
marko
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Win2003 disk corruption with kvm-1.0. and virtio
2013-02-13 16:03 ` weber
@ 2013-02-14 5:27 ` Michael Tokarev
0 siblings, 0 replies; 13+ messages in thread
From: Michael Tokarev @ 2013-02-14 5:27 UTC (permalink / raw)
To: weber; +Cc: Kvm, Sylvain Bauza
[Please stop top-posting. Thank you]
13.02.2013 20:03, weber@zackbummfertig.de wrote:
>
> there are known problems, WHEN I/O "native" and cache=writethrough.
> On I/O "native" put cache to "none" otherwise your data can get broken.
> Check Redhat Pages for that.
Which problem is that? And what is "I/O native"? Maybe you mean "aio",
not "I/O"?
If the talk is about aio=native, that mode does not work for regular files,
it gets "downgraded" to aio=threads automatically. So there should be
nothing to change already.
Please elaborate.
/mjt
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Win2003 disk corruption with kvm-1.0. and virtio
2013-02-13 7:21 ` Philipp Hahn
2013-02-13 9:56 ` Sylvain Bauza
@ 2013-02-14 8:23 ` Sylvain Bauza
1 sibling, 0 replies; 13+ messages in thread
From: Sylvain Bauza @ 2013-02-14 8:23 UTC (permalink / raw)
To: Philipp Hahn; +Cc: kvm
Hi,
Latest updates. I tried:
- cache=writethrough / kvm-1.0: errors in qcow2
- cache=none / kvm-1.3: no errors using 'qemu-img check', but EventViewer is complaining
I have to admit I'm lost. I cannot understand what is causing this
corruption, which only appears on some Windows instances...
Please find below the command line of the running process:
117 13781 1 4 Feb13 ? 00:41:43 /usr/bin/kvm -S -M pc-1.3 -enable-kvm -m 2048 -smp 1,sockets=1,cores=1,threads=1 -name instance-0000004f -uuid 26801166-aa03-4bbc-b062-da47168a664c -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-0000004f.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot c -drive file=/var/lib/nova/instances/instance-0000004f/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,fd=21,id=hostnet0,vhost=on,vhostfd=22 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:7a:a1:61,bus=pci.0,addr=0x3 -chardev file,id=charserial0,path=/var/lib/nova/instances/instance-0000004f/console.log -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -usb -device usb-tablet,id=input0 -vnc 192.168.1.155:2 -k fr -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
As a last try, I googled and found that the virtio network driver can be
buggy. I will try switching to another driver and see.
By the way, all these Windows instances do have up-to-date virtio SCSI drivers.
^ permalink raw reply [flat|nested] 13+ messages in thread
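When comparing hosts, it can help to pull the disk settings out of a process command line like the one in the message above instead of reading it by eye. The following is only a minimal sketch: the helper names `parse_drive_option` and `drive_options` are hypothetical, and the parser assumes that no option value contains an escaped comma (QEMU writes those as `,,`).

```python
import shlex


def parse_drive_option(value):
    """Split a '-drive' value such as 'file=...,format=qcow2,cache=none' into a dict.

    Assumption: values contain no escaped commas (',,'), which this sketch
    does not handle.
    """
    options = {}
    for item in value.split(","):
        key, sep, val = item.partition("=")
        # Bare flags without '=' are stored with an empty value.
        options[key] = val if sep else ""
    return options


def drive_options(argv):
    """Return one parsed dict per '-drive' argument found in argv."""
    return [
        parse_drive_option(argv[i + 1])
        for i, arg in enumerate(argv[:-1])
        if arg == "-drive"
    ]


if __name__ == "__main__":
    # Shortened version of the command line quoted in the thread.
    cmdline = (
        "/usr/bin/kvm -S -M pc-1.3 -enable-kvm -m 2048 "
        "-drive file=/var/lib/nova/instances/instance-0000004f/disk,"
        "if=none,id=drive-virtio-disk0,format=qcow2,cache=none "
        "-device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0"
    )
    for drive in drive_options(shlex.split(cmdline)):
        print(drive["file"], drive.get("format"), drive.get("cache"))
```

For the instance above this reports the qcow2 format and cache=none, which matches what was described at the start of the thread.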
* Re: Win2003 disk corruption with kvm-1.0. and virtio
2013-02-12 14:30 Win2003 disk corruption with kvm-1.0. and virtio Sylvain Bauza
2013-02-13 7:21 ` Philipp Hahn
@ 2013-02-13 9:03 ` Stefan Hajnoczi
2013-02-13 9:53 ` Sylvain Bauza
2013-02-14 8:17 ` Stefan Hajnoczi
2 siblings, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2013-02-13 9:03 UTC (permalink / raw)
To: Sylvain Bauza; +Cc: kvm
On Tue, Feb 12, 2013 at 03:30:37PM +0100, Sylvain Bauza wrote:
> We currently run Openstack Essex hosts with KVM-1.0 (Ubuntu 12.04)
> instances with qcow2,virtio,cache=none
>
> For Linux VMs, no trouble at all but we do observe filesystem
> corruption and inconsistency (missing DLLs, CHKDSK asked by
> EventViewer, failure at reboot) with some of our Windows 2003 SP2
> 64b images.
>
> At first boot, stress tests (CrystalDiskMark 3.0.2 and intensive
> CHKDSK) don't show up problems. It is only appearing 6 or 12h later.
>
> Do you have any idea on how to prevent it ? Is cache=writethrough an
> acceptable solution ? We don't want to leave qcow2 image format as
> it does allow to do live snapshots et al.
How are you taking live snapshots? qemu-img should not be used on a
disk image that is currently open by a running guest, it may lead to
corruption.
Stefan
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Win2003 disk corruption with kvm-1.0. and virtio
2013-02-13 9:03 ` Stefan Hajnoczi
@ 2013-02-13 9:53 ` Sylvain Bauza
2013-02-14 8:15 ` Stefan Hajnoczi
0 siblings, 1 reply; 13+ messages in thread
From: Sylvain Bauza @ 2013-02-13 9:53 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kvm
Hi Stefan,
As per the documentation, Nova (the Openstack Compute layer) runs a
'qemu-img convert -s' against a running instance.
http://docs.openstack.org/trunk/openstack-compute/admin/content/creating-images-from-running-instances.html
Do you think this could be our root cause?
Btw, I tested cache=writethrough and I observed image corruption after
some time ('qemu-img check' returns errors).
Thanks for your input,
-Sylvain
^ permalink raw reply [flat|nested] 13+ messages in thread
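Since 'qemu-img check' is what reports the qcow2 errors here, a small wrapper that turns its exit status into a readable verdict may be useful for periodic checks. This is a sketch under assumptions: the exit codes are the ones listed in current qemu-img documentation and may not match the 1.0-era binary discussed in this thread, the function names are hypothetical, and, following the caution given earlier in the thread, it should only be pointed at an image that no running guest has open.

```python
import subprocess

# Exit codes of "qemu-img check" as listed in current qemu-img documentation.
# Assumption: older releases may behave differently.
CHECK_EXIT_CODES = {
    0: "check completed, image is consistent",
    1: "check not completed because of internal errors",
    2: "check completed, image is corrupted",
    3: "check completed, image has leaked clusters but is not corrupted",
    63: "checks are not supported by the image format",
}


def describe_check_result(returncode):
    """Map a 'qemu-img check' exit status to a short description."""
    return CHECK_EXIT_CODES.get(
        returncode, "unexpected exit code {}".format(returncode)
    )


def check_image(path):
    """Run 'qemu-img check' on an image that is not in use and describe the result."""
    result = subprocess.run(
        ["qemu-img", "check", path],
        capture_output=True,
        text=True,
    )
    return describe_check_result(result.returncode)
```

Note that a clean result only says the qcow2 metadata is consistent. It says nothing about the guest filesystem inside the image, which is a separate layer.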
* Re: Win2003 disk corruption with kvm-1.0. and virtio
2013-02-13 9:53 ` Sylvain Bauza
@ 2013-02-14 8:15 ` Stefan Hajnoczi
2013-02-14 10:11 ` Sylvain Bauza
0 siblings, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2013-02-14 8:15 UTC (permalink / raw)
To: Sylvain Bauza; +Cc: kvm
On Wed, Feb 13, 2013 at 10:53:14AM +0100, Sylvain Bauza wrote:
> As per documentation, Nova (Openstack Compute layer) is doing a
> 'qemu-img convert -s' against a running instance.
> http://docs.openstack.org/trunk/openstack-compute/admin/content/creating-images-from-running-instances.html
That command will not corrupt the running instance because it opens the
image read-only.
It is possible that the new image is corrupted since qemu-img is reading
from a qcow2 file that is changing underneath it. However, the chance
is small as long as the snapshot isn't deleted while qemu-img convert is
running.
So this doesn't sound like the cause of the problems you are seeing.
Stefan
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Win2003 disk corruption with kvm-1.0. and virtio
2013-02-14 8:15 ` Stefan Hajnoczi
@ 2013-02-14 10:11 ` Sylvain Bauza
2013-03-12 15:48 ` Sylvain Bauza
0 siblings, 1 reply; 13+ messages in thread
From: Sylvain Bauza @ 2013-02-14 10:11 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kvm
Interesting point. Even if the qcow2 file is opened read-only, the image is
changing while the snapshot is taken (in particular, I'm running IIS with
ASP support and VB DLLs).
As asked in a second post, I'm running the latest Windows virtio
drivers, but I only apply a virtio driver update *after* running an
instance, not before taking the snapshot.
What I'll try: run an instance, update the driver, stop the instance,
and do a qemu-img convert once the instance is stopped.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Win2003 disk corruption with kvm-1.0. and virtio
2013-02-14 10:11 ` Sylvain Bauza
@ 2013-03-12 15:48 ` Sylvain Bauza
2013-03-12 21:10 ` Jorge Armando Medina
0 siblings, 1 reply; 13+ messages in thread
From: Sylvain Bauza @ 2013-03-12 15:48 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kvm
This has been a long-lasting bug and this is a big update, but I think I
found the root cause.
FYI, Windows 2003 enables a write cache by default in the disk driver,
even with virtio (see the driver details, under Policies).
As a consequence, any open DLL could be corrupted if we run a simple
'qemu-img convert' against the VM.
The proper way to do a live snapshot is to disable the write cache (and
goodbye good performance!) and then do the convert.
The other way is to stop the VM, perform a 'qemu-img snapshot', then
convert the snapshot.
Hope this helps other people.
-Sylvain
^ permalink raw reply [flat|nested] 13+ messages in thread
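The second approach from the message above (stop the VM, take an internal snapshot, convert that snapshot) can be sketched as a short sequence of qemu-img calls. This is an illustration under assumptions, not what Nova does: the function names and the file and snapshot names are hypothetical, stopping the guest is left to whatever manages it, the final snapshot deletion is an optional cleanup step that the thread does not mention, and `-s` is the convert option named in the thread, so check the man page of the installed qemu-img version before relying on it.

```python
import subprocess


def offline_snapshot_commands(disk, snapshot, output):
    """Build the qemu-img commands for exporting a stopped guest's disk.

    Assumption: the guest using `disk` has already been shut down.
    """
    return [
        # 1. Create an internal qcow2 snapshot of the stopped disk.
        ["qemu-img", "snapshot", "-c", snapshot, disk],
        # 2. Convert that snapshot into a standalone qcow2 image.
        ["qemu-img", "convert", "-f", "qcow2", "-O", "qcow2",
         "-s", snapshot, disk, output],
        # 3. Optional cleanup: delete the internal snapshot once the
        #    conversion has finished.
        ["qemu-img", "snapshot", "-d", snapshot, disk],
    ]


def run_all(commands):
    """Run the commands in order and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical paths, printed rather than executed.
    for command in offline_snapshot_commands("disk", "export-snap", "export.qcow2"):
        print(" ".join(command))
```

Running the steps strictly in sequence also respects the earlier remark in the thread that the snapshot should not be deleted while the convert is still running.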
* Re: Win2003 disk corruption with kvm-1.0. and virtio
2013-03-12 15:48 ` Sylvain Bauza
@ 2013-03-12 21:10 ` Jorge Armando Medina
0 siblings, 0 replies; 13+ messages in thread
From: Jorge Armando Medina @ 2013-03-12 21:10 UTC (permalink / raw)
To: Sylvain Bauza; +Cc: Stefan Hajnoczi, kvm
On 12/03/13 09:48, Sylvain Bauza wrote:
> Long lasting bug and huge update, but I think I got the root cause.
> FYI, Windows 2003 is having a write cache enabled by default on disk
> drivers. Even with virtio (see driver details, policies).
Hi there,
Which option did you use in the driver policy?
Thanks
>
> As a consequence, any DLL which is open could be corrupted if we try a
> simple 'qemu-img convert' against the VM.
> The proper way to do a live snapshot is to disable the writecache (and
> goodbye good perfs!) and do the convert.
> The other way is to stop the VM, perform a 'qemu-img snapshot', then
> convert the snapshot.
>
> Hope it can help other people.
> -Sylvain
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Win2003 disk corruption with kvm-1.0. and virtio
2013-02-12 14:30 Win2003 disk corruption with kvm-1.0. and virtio Sylvain Bauza
2013-02-13 7:21 ` Philipp Hahn
2013-02-13 9:03 ` Stefan Hajnoczi
@ 2013-02-14 8:17 ` Stefan Hajnoczi
2 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2013-02-14 8:17 UTC (permalink / raw)
To: Sylvain Bauza; +Cc: kvm
On Tue, Feb 12, 2013 at 03:30:37PM +0100, Sylvain Bauza wrote:
> We currently run Openstack Essex hosts with KVM-1.0 (Ubuntu 12.04)
> instances with qcow2,virtio,cache=none
>
> For Linux VMs, no trouble at all but we do observe filesystem
> corruption and inconsistency (missing DLLs, CHKDSK asked by
> EventViewer, failure at reboot) with some of our Windows 2003 SP2
> 64b images.
>
> At first boot, stress tests (CrystalDiskMark 3.0.2 and intensive
> CHKDSK) don't show up problems. It is only appearing 6 or 12h later.
Are you running the latest virtio-win drivers? See
http://www.linux-kvm.org/page/WindowsGuestDrivers/Download_Drivers.
Have you tested with IDE instead of virtio on the Windows guests?
Stefan
^ permalink raw reply [flat|nested] 13+ messages in thread