From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: JFYI: ext4 bug triggerable by kvm
Date: Tue, 17 Aug 2010 17:59:07 +0300
Message-ID: <4C6AA3BB.5020103@redhat.com>
References: <4C694E7D.3060600@codemonkey.ws> <20100816184237.GA16579@infradead.org> <4C69A0C4.2080102@codemonkey.ws> <20100817090755.GA11110@infradead.org> <4C6A86E4.9080600@codemonkey.ws> <20100817130702.GA16635@infradead.org> <4C6A9AB5.6050404@codemonkey.ws> <20100817142808.GA22412@infradead.org> <4C6A9F4F.8040209@msgid.tls.msk.ru> <4C6AA061.80704@codemonkey.ws> <20100817144651.GB10280@infradead.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Anthony Liguori, Michael Tokarev, KVM list, Kevin Wolf
To: Christoph Hellwig
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:8059 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750837Ab0HQO7P (ORCPT); Tue, 17 Aug 2010 10:59:15 -0400
In-Reply-To: <20100817144651.GB10280@infradead.org>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 08/17/2010 05:46 PM, Christoph Hellwig wrote:
> On Tue, Aug 17, 2010 at 09:44:49AM -0500, Anthony Liguori wrote:
>> I think the real issue is we're mixing host configuration with guest
>> visible state.
>
> The last time I proposed to decouple the two you and Avi were heavily
> opposed to it..

I wasn't, as far as I can recall.

>> With O_SYNC, we're causing cache=writethrough to do writethrough
>> through two layers of the storage hierarchy.  I don't think that's
>> necessary or desirable though.
>
> It's absolutely necessary if we tell the guest that we do not have
> a volatile write cache.  Which is the only good reason to use
> cache=writethrough anyway - except for dealing with old guests that
> can't handle a volatile write cache it's an absolutely stupid mode of
> operation.

I agree, but there's another case: tell the guest that we have a write
cache, use O_DSYNC, but only flush the disk cache on guest flushes.
The reason for this is that if we don't use O_DSYNC, the page cache can
grow to huge proportions.  While this is allowed by the contract between
the virtual drive and the guest, guest software and users won't expect a
huge data loss on power failure, only a minor data loss from the last
fraction of a second before the failure.

I believe this can be approximated by mounting the host filesystem with
barrier=0?

--
error compiling committee.c: too many arguments to function