From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 19 Aug 2014 18:20:38 -0500 (CDT)
From: Andrew Martin <amartin@xes-inc.com>
Message-ID: <838926932.102908.1408490438455.JavaMail.zimbra@xes-inc.com>
In-Reply-To: <20140819145925.GB13680@stefanha-thinkpad.redhat.com>
References: <1009168463.49610.1408133034828.JavaMail.zimbra@xes-inc.com>
	<985931631.51123.1408133895894.JavaMail.zimbra@xes-inc.com>
	<20140819145925.GB13680@stefanha-thinkpad.redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] Using cache=writeback safely on qemu 1.4.0 and
 later
List-Id: <qemu-devel.nongnu.org>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: qemu-devel@nongnu.org

----- Original Message -----
> From: "Stefan Hajnoczi" <stefanha@gmail.com>
> To: "Andrew Martin" <amartin@xes-inc.com>
> Cc: qemu-devel@nongnu.org
> Sent: Tuesday, August 19, 2014 9:59:25 AM
> Subject: Re: [Qemu-devel] Using cache=writeback safely on qemu 1.4.0 and later
> 
> If you strace -f the QEMU process on the host, you will see fdatasync(2)
> system calls when the guest flushes the disk.
> 
> You can find the file descriptor number by checking ls -l
> /proc/$PID_OF_QEMU/fd and looking for the disk image file.
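For reference, here is a minimal Python sketch of that lookup. It assumes a Linux host where /proc/<pid>/fd is readable by the caller (typically root). The helper name find_image_fds is made up for illustration and is not part of QEMU or libvirt.

```python
import os


def find_image_fds(pid, image_path):
    """Return the fd numbers in /proc/<pid>/fd that point at image_path.

    Hypothetical helper: compares each fd's symlink target with the
    canonical (symlink-resolved) path of the image file.
    """
    target = os.path.realpath(image_path)
    fd_dir = f"/proc/{pid}/fd"
    matches = []
    for name in os.listdir(fd_dir):
        try:
            link = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The fd may have been closed between listdir() and readlink().
            continue
        if link == target:
            matches.append(int(name))
    return sorted(matches)
```

For example, find_image_fds(4113, "/var/lib/libvirt/images/vm.img") would list the descriptor number(s) to look for in the strace output.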

When I set the disk to cache=writethrough on one of these VMs, I see frequent
fdatasync(2) calls (every few seconds). However, after switching the disk to
cache=writeback, I have not seen a single fdatasync(2) call since boot, even
after writing twice the amount of RAM in data:
# time strace -ft -p4113 2>&1 | grep fdatasync
^C

real    15m39.245s
user    0m7.940s
sys     0m18.280s
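Rather than grepping, the trace can be narrowed to the sync calls and tallied per file descriptor, so that flushes to the disk image can be told apart from flushes to other files. Below is a rough sketch, assuming strace's usual text output (with or without a pid prefix and -t timestamps). count_sync_calls is a hypothetical helper, and it counts fsync(2) as well as fdatasync(2) in case the image is flushed with either call.

```python
import re
from collections import Counter

# Matches the start of an fsync/fdatasync call in strace text output, e.g.
#   "[pid  4120] 19:22:01 fdatasync(12) = 0"
#   "19:22:05 fdatasync(12 <unfinished ...>"
# "<... fdatasync resumed>" lines have no "(" right after the name,
# so they are not matched and each call is counted once.
SYNC_CALL = re.compile(r"\b(fdatasync|fsync)\((\d+)")


def count_sync_calls(lines):
    """Tally (syscall name, fd) pairs for fsync/fdatasync in strace output."""
    counts = Counter()
    for line in lines:
        m = SYNC_CALL.search(line)
        if m:
            counts[(m.group(1), int(m.group(2)))] += 1
    return counts
```

For example, capture with strace -f -t -e trace=fsync,fdatasync -o trace.log -p 4113, then inspect count_sync_calls(open("trace.log")).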

Note that the disk is defined as follows:
<disk type='file' device='disk'>
        <driver name='qemu' type='qcow2' cache='writeback'/>
        <source file='/var/lib/libvirt/images/vm.img'/>
        <target dev='vda' bus='virtio'/>
        <alias name='virtio-disk0'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>
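To confirm which cache mode libvirt actually configured for each disk, the domain XML (for instance the output of virsh dumpxml) can be checked programmatically. A minimal sketch using Python's standard xml.etree.ElementTree; disk_cache_modes is a hypothetical name, and it reports None for a disk whose <driver> element is missing or omits the cache attribute.

```python
import xml.etree.ElementTree as ET


def disk_cache_modes(domain_xml):
    """Map each disk's target dev (e.g. 'vda') to its <driver cache=...> value."""
    root = ET.fromstring(domain_xml)
    modes = {}
    # iter() includes the root element itself, so this accepts either a
    # full <domain> document or a bare <disk> element.
    for disk in root.iter("disk"):
        target = disk.find("target")
        if target is None:
            continue
        driver = disk.find("driver")
        modes[target.get("dev")] = None if driver is None else driver.get("cache")
    return modes
```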


> > I recently experienced UPS failure on several hosts which caused a hard
> > shutdown. After restarting, 3 of the guests had corruption on their disks
> > and required a fairly long fsck to fix. Afterwards, data that had been
> > written to the disks several hours before the crash was corrupted, which
> > makes me think that it was never fsync()-ed to the non-volatile storage.
> 
> What exactly was the "corruption" you encountered?  Which application,
> error message, etc.

Two of the servers are apache2 web servers. On one, a Python daemon copies JPGs
onto the server; the last 100 files it copied were corrupted. On the other,
some files had been uploaded to the www-root several days before the crash, but
after the hard reset they were no longer present in the filesystem.


> > Is it safe in this setup to use cache=writeback? Or, should I use
> > cache=writethrough instead?
> 
> Ubuntu 12.04 is recent and sends write cache flushes.
> 
> Are you sure the file system and/or application workload are flushing
> the disk cache?  Please check the mount options and application-specific
> configuration.

In both VMs, the guest's ext4 filesystem is mounted with:
rw,relatime,errors=remount-ro,data=ordered

Similarly, the host's ext4 filesystem holding the images is mounted with:
rw,relatime,data=ordered
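Neither option list contains nobarrier or barrier=0, the ext4 options that stop the filesystem from issuing cache flushes, so barriers should be at their default (enabled). A small sketch for checking this on any guest or host, assuming input in the /proc/mounts format; ext4_without_barriers is a hypothetical helper name.

```python
def ext4_without_barriers(mounts_text):
    """Return mount points of ext4 filesystems whose options disable barriers.

    mounts_text uses the /proc/mounts line format:
        device mountpoint fstype options dump pass
    """
    flagged = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "ext4":
            continue
        opts = fields[3].split(",")
        # nobarrier and barrier=0 both turn off ext4 write barriers.
        if "nobarrier" in opts or "barrier=0" in opts:
            flagged.append(fields[1])
    return flagged
```

For example, ext4_without_barriers(open("/proc/mounts").read()) returns an empty list when no ext4 mount disables barriers.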

I did not see any errors in the guest's kernel log, probably because the root
filesystem was read-only until the fsck had completed.

Thanks,

Andrew