From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bernd Schubert
Subject: Re: disk corruption after virsh destroy
Date: Sat, 06 Jul 2013 15:03:07 +0200
Message-ID: <51D8158B.6020504@fastmail.fm>
References: <20130703084726.GB17434@stefanha-thinkpad.muc.redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: "Brian J. Murrell" , kvm@vger.kernel.org
To: Stefan Hajnoczi
Return-path:
Received: from out1-smtp.messagingengine.com ([66.111.4.25]:55715 "EHLO out1-smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750857Ab3GFNDK (ORCPT ); Sat, 6 Jul 2013 09:03:10 -0400
In-Reply-To: <20130703084726.GB17434@stefanha-thinkpad.muc.redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 07/03/2013 10:47 AM, Stefan Hajnoczi wrote:
> On Tue, Jul 02, 2013 at 10:40:11AM -0400, Brian J. Murrell wrote:
>> I have a cluster of VMs setup with shared virtio-scsi disks. The
>> purpose of sharing a disk is that if a VM goes down, another can
>> pick up and mount the (ext4) filesystem on shared disk a provide
>> service to it.
>>
>> But just to be super clear, only one VM ever has a filesystem
>> mounted at a time even though multiple VMs technically can access
>> the device at the same time. A VM mounting a filesystem ensures
>> absolutely that no other node has it mounted before mounting it.
>>
>> That said, what I am finding is that when one a node dies and
>> another node tries to mount the (ext4) filesystem, it is found dirty
>> and needs an fsck.
>>
>> My understanding is that with ext{3,4}, this should not be the case
>> and indeed it is my experience, on real hardware with coherent disk
>> caching (i.e. no non-battery-backed caching disk controllers lying
>> to the O/S about what has been written to physical disk) that this
>> is the case. That is, a node failing does not leave an ext{3,4}
>> filesystem dirty such that it needs an fsck.
>>
>> So, clearly, somewhere between the KVM VM and the physical disk,
>> there is a cache that is resulting in the guest O/S believing data
>> is being written to physical disk that is not actually being written
>> there. To that end, I have ensured that on these shared disks that
>> I set "cache=none", but this does not seem to have fixed the
>> problem.
>
> I expect journal replay and possibly fsck when an ext4 file system was
> left in a mounted state and with I/O pending (e.g. due to power
> failure).
>
> A few questions:
>
> 1. Is the guest mounting the file system with barrier=0? barrier=1 is
>    the default.
>
> 2. Do the physical disks have a volatile write cache enabled (if yes,
>    the guest should use barrier=1)? If the physical disks have a
>    non-volatile write cache or the write cache is disabled (then
>    barrier=0 is okay).

Er, why? As far as I understood Brian, the physical disks have not been
reset, so their cache should be irrelevant. If the VM needs barrier=1,
then there must be some VM caching involved, but Brian tried to disable
that.

At least in the past that worked fine with the emulated LSI SCSI
controller. That is how I simulated shared storage, and as long as I
used the raw disk format and cache=none there was never any corruption
(although qcow2 didn't work and introduced issues).

Brian, maybe you could figure out the pattern of the corruption? I need
to add a verify mode to ql-fstest, but with some network file systems
on top of ext4 it should also work.

https://bitbucket.org/aakef/ql-fstest

Cheers,
Bernd