From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: writeback cache + h700 controller w/1gb nvcache, corruption on power loss
Date: Tue, 24 Apr 2012 16:43:27 +0300
Message-ID: <4F96ADFF.6000502@redhat.com>
References: <28769.660.1334566274126.JavaMail.root@sys1.internetdefensetechnologies.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: Ron Edison , kvm@vger.kernel.org, Kevin Wolf , Christoph Hellwig
To: Stefan Hajnoczi
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:14607 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752861Ab2DXNnr (ORCPT ); Tue, 24 Apr 2012 09:43:47 -0400
In-Reply-To:
Sender: kvm-owner@vger.kernel.org
List-ID:

On 04/17/2012 11:41 AM, Stefan Hajnoczi wrote:
> > The disk corruption experienced was indeed lost data -- an fsck was necessary for 4 of the guests to boot at all in RW mode, they first came up read only. In the case of one of the guests there was actually files data / data lost after fsck was manually run upon reboot/single user mode. In some cases these were config files, in other database indexes, etc. This one of the 4 guests with the most severe corruption was not usable and we had to revert to a backup and pull current data out of it as much as possible.
>
> Since you used QEMU -drive cache=writeback data loss is expected on
> host power failure. cache=writeback uses the (volatile) host page
> cache and therefore data may not have made it to the RAID controller
> before power was lost.
>
> Guest file system recovery - either a quick journal replay or a
> painful fsck - is also expected on host power failure. The file
> systems are dirty since the guest stopped executing without cleanly
> unmounting its file systems. If you use cache=none or
> cache=directsync then you should get a quick journal replay and the
> risk of a painful fsck should be reduced (most/all of the data will
> have been preserved).
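
For reference, here is a rough sketch of how the three cache modes named in this thread differ. It is an illustrative summary, not QEMU code: the class, field and function names are hypothetical, and it only covers the host-side behaviour, not what the RAID controller does with its own cache.

```python
# Illustrative summary of the QEMU -drive cache= modes discussed in this
# thread. Not QEMU code; all names below are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheMode:
    uses_host_page_cache: bool  # writes may sit in volatile host RAM
    sync_every_write: bool      # each write is forced out before it completes


CACHE_MODES = {
    # Host page cache is used; data reaches the disk when the guest flushes.
    "writeback": CacheMode(uses_host_page_cache=True, sync_every_write=False),
    # Host page cache is bypassed; guest flushes are still required.
    "none": CacheMode(uses_host_page_cache=False, sync_every_write=False),
    # Host page cache is bypassed and every write is synchronous.
    "directsync": CacheMode(uses_host_page_cache=False, sync_every_write=True),
}


def needs_guest_flush(mode: str) -> bool:
    """True if completed writes are only durable once the guest has
    issued a flush (barrier) for them."""
    return not CACHE_MODES[mode].sync_every_write


if __name__ == "__main__":
    for name in CACHE_MODES:
        print(name, "needs guest flush:", needs_guest_flush(name))
```

In this sketch both writeback and none rely on the guest issuing flushes, which is why the two are treated alike below when the guest uses barriers.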

Both cache=writeback and cache=none should result in journal replay if
the guest is using barriers.

If you're using ext4 and data=journal, you should expect no data loss.
With the defaults you can expect to lose recently written data (quite a
lot with data=writeback) but the filesystem structure itself should be
fine. Even with cache=directsync you should expect some data loss as
the guest may hold data in its own page cache.

So something unexpected is happening. Which guests (by OS type) are
failing?

-- 
error compiling committee.c: too many arguments to function