From: "Brian J. Murrell"
Subject: disk corruption after virsh destroy
Date: Tue, 02 Jul 2013 10:40:11 -0400
To: kvm@vger.kernel.org

I have a cluster of VMs set up with shared virtio-scsi disks. The purpose of sharing a disk is that if a VM goes down, another can pick up and mount the (ext4) filesystem on the shared disk and provide service from it.

But just to be super clear, only one VM ever has a filesystem mounted at a time, even though multiple VMs technically can access the device at the same time. Before mounting a filesystem, a VM ensures absolutely that no other node has it mounted.

That said, what I am finding is that when a node dies and another node tries to mount the (ext4) filesystem, it is found dirty and needs an fsck.

My understanding is that with ext{3,4} this should not be the case, and indeed that matches my experience on real hardware with coherent disk caching (i.e. no non-battery-backed caching disk controllers lying to the O/S about what has been written to physical disk).
That is, a node failing does not leave an ext{3,4} filesystem dirty such that it needs an fsck.

So, clearly, somewhere between the KVM VM and the physical disk, there is a cache that leads the guest O/S to believe data has been written to physical disk when it has not actually been written there.

To that end, I have set "cache=none" on these shared disks, but this does not seem to have fixed the problem.

Here is my KVM command line. Please bear with the unfortunate line wrapping since my MUA (Thunderbird) doesn't allow for one to specify lines which shouldn't be wrapped. I have tried to ameliorate that by indenting all of the lines that start command line options with two spaces.

/usr/bin/qemu-kvm -name wtm-60vm5 -S -M pc-0.14 -enable-kvm -m 8192 \
  -smp 1,sockets=1,cores=1,threads=1 \
  -uuid 5cbc2568-e32d-11e2-9c1f-001e67293bea -no-user-config \
  -nodefaults \
  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/wtm-60vm5.monitor,server,nowait \
  -mon chardev=charmonitor,id=monitor,mode=control \
  -rtc base=utc -no-shutdown \
  -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
  -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 \
  -drive file=/var/lib/libvirt/images/wtm-60vm5.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=node1-root \
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
  -drive file=/dev/vg_00/disk1,if=none,id=drive-scsi0-0-0-0,format=raw,serial=disk1,cache=none \
  -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 \
  -drive file=/dev/vg_00/disk2,if=none,id=drive-scsi0-0-0-1,format=raw,serial=disk2,cache=none \
  -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1 \
  -drive file=/dev/vg_00/disk3,if=none,id=drive-scsi0-0-0-2,format=raw,serial=disk3,cache=none \
  -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=2,drive=drive-scsi0-0-0-2,id=scsi0-0-0-2 \
  -drive file=/dev/vg_00/disk4,if=none,id=drive-scsi0-0-0-3,format=raw,serial=disk4,cache=none \
  -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=3,drive=drive-scsi0-0-0-3,id=scsi0-0-0-3 \
  -drive file=/dev/vg_00/disk5,if=none,id=drive-scsi0-0-0-4,format=raw,serial=disk5,cache=none \
  -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=4,drive=drive-scsi0-0-0-4,id=scsi0-0-0-4 \
  -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=27 \
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:60:d9:05,bus=pci.0,addr=0x3 \
  -netdev tap,fd=31,id=hostnet1,vhost=on,vhostfd=32 \
  -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:60:a7:05,bus=pci.0,addr=0x8 \
  -chardev pty,id=charserial0 \
  -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:3 \
  -vga cirrus -device AC97,id=sound0,bus=pci.0,addr=0x4 \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7

Clearly it's the 5 SCSI disks which are showing corruption when a VM is destroyed with "virsh destroy".

Any ideas on what I need to do to ensure that writes at the guest O/S layer which are to be sent to physical disk actually make it to physical disk on the host?

Of course, I am happy to provide any additional information, debugging, etc. that may be needed.

Cheers,
b.
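
As a cross-check on the command line above, here is a minimal sketch of how one might list the cache= setting of each -drive option. The helper name drive_cache_modes is made up for illustration, the parsing is deliberately naive (it does not handle QEMU's ",," comma escaping), and the sample string only repeats two of the -drive options quoted above.

```python
import shlex


def drive_cache_modes(cmdline):
    """Map each -drive id to its cache= value (None when cache= is absent).

    Hypothetical helper for illustration only; it splits options on plain
    commas and so does not understand ",," escapes inside values.
    """
    args = shlex.split(cmdline)
    modes = {}
    # Pair every argument with the one that follows it, so that a
    # "-drive" flag is matched with its option string.
    for flag, value in zip(args, args[1:]):
        if flag != "-drive":
            continue
        opts = dict(kv.split("=", 1) for kv in value.split(",") if "=" in kv)
        modes[opts.get("id")] = opts.get("cache")
    return modes


# Two of the -drive options from the command line above.
cmdline = (
    "qemu-kvm "
    "-drive file=/var/lib/libvirt/images/wtm-60vm5.qcow2,if=none,"
    "id=drive-virtio-disk0,format=qcow2,serial=node1-root "
    "-drive file=/dev/vg_00/disk1,if=none,id=drive-scsi0-0-0-0,"
    "format=raw,serial=disk1,cache=none"
)

for drive_id, cache in drive_cache_modes(cmdline).items():
    print(drive_id, cache if cache is not None else "(cache= not set)")
```

On the options quoted above, this would report cache=none for the shared raw disks and no explicit cache= for the qcow2 root disk, which is consistent with the setting having been applied only to the shared disks.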