From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <4C99235D.9050506@codemonkey.ws>
Date: Tue, 21 Sep 2010 16:27:57 -0500
From: Anthony Liguori <anthony@codemonkey.ws>
MIME-Version: 1.0
References: <4C97916E.2080801@codemonkey.ws> <20100920193451.GA11516@lst.de>
	<4C97BFF3.90103@codemonkey.ws> <20100920231742.GB18512@lst.de>
	<4C97F9C6.60501@codemonkey.ws> <20100921142608.GA18290@lst.de>
	<4C98CB7D.30703@codemonkey.ws> <20100921205740.GA1467@lst.de>
In-Reply-To: <20100921205740.GA1467@lst.de>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: [Qemu-devel] Re: Caching modes
List-Id: qemu-devel.nongnu.org
To: Christoph Hellwig <hch@lst.de>
Cc: Kevin Wolf <kwolf@redhat.com>, qemu-devel <qemu-devel@nongnu.org>

On 09/21/2010 03:57 PM, Christoph Hellwig wrote:
> On Tue, Sep 21, 2010 at 10:13:01AM -0500, Anthony Liguori wrote:
>    
>> 1) make virtual WC guest controllable.  If a guest enables WC, &=
>> ~O_DSYNC.  If it disables WC, |= O_DSYNC.  Obviously, we can let a user
>> specify the virtual WC mode but it has to be changeable during live
>> migration.
>>      
> I have patches for that which are almost ready to submit.
>
>    
>> 2) only let the user choose between using and not using the host page
>> cache.  IOW, direct=on|off.  cache=XXX is deprecated.
>>      
> Also done by that patch series.  That's exactly what I described two mail
> roundtrips ago.
>    

Yes.
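
To make points 1 and 2 concrete, here is a minimal sketch of how the two proposed knobs could map onto open(2) flags for the host image file. The function names `open_flags` and `set_wce` are hypothetical and are not taken from QEMU or from the patch series mentioned above. Python is used only for brevity, and Linux is assumed because `os.O_DIRECT` is not available everywhere.

```python
import os


def open_flags(direct: bool, wce: bool) -> int:
    """Return open(2) flags for a host image file (hypothetical helper)."""
    flags = os.O_RDWR
    if direct:
        # direct=on: bypass the host page cache.
        flags |= os.O_DIRECT
    if not wce:
        # Virtual write cache disabled: every write must be stable
        # before it is completed to the guest.
        flags |= os.O_DSYNC
    return flags


def set_wce(flags: int, wce: bool) -> int:
    """Apply a run-time write cache toggle requested by the guest."""
    if wce:
        return flags & ~os.O_DSYNC  # guest enables WC: &= ~O_DSYNC
    return flags | os.O_DSYNC       # guest disables WC: |= O_DSYNC
```

Changing the flags of a descriptor that is already open is a separate problem. This sketch only computes the flag value and says nothing about how the image would be reopened.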

>> My concern is ext4.  With a preallocated file and cache=none as
>> implemented today, performance is good even when barrier=1.  If we
>> enable O_DSYNC, performance will plummet.  Ultimately, this is an ext4
>> problem, not a QEMU problem.
>>      
> For Linux or Windows guests WCE=0 is not a particularly good default
> given that they can deal with the write caches, and mirrors the
> situation with consumer SATA disk.  For older Unix guests you'll
> need to be able to persistently disable the write cache.
>
> To make things more confusing the default ATA/SATA way to tune the
> volatile write cache setting is not persistent - e.g. if you disable it
> using hdparm it will come up enabled again.
>    

Yes, potentially, we could save this in a config file (and really, I 
mean libvirt could save it).

>> 2) User does not have enterprise storage, but has an image on ext4 with
>> barrier=1.  User explicitly disables WC in guest because they don't know
>> what they're doing.
>>
>> For (2), again it's probably the user doing the wrong thing because if
>> they don't have enterprise storage, then they shouldn't care about a
>> virtual WC.  Practically though, I've seen a lot of this with users.
>>      
> This setting is just fine, especially if using O_DIRECT.  The guest
> sends cache flush requests often enough to not make it a problem.  If
> you do not use O_DIRECT in that scenario, the host will in theory cache
> a lot more data, but any filesystem aware of cache flushes will flush
> it frequently enough to not make it a problem.  It is a real problem,
> however, when using ext3 in its default setting in the guest, which
> doesn't use barriers.  But that's a bug in ext3 and nothing but
> petitioning its maintainer to fix it will help you there.
>    

It's not just ext3; it's also ext4 with barrier=0, which is what users of 
certain applications are being told to set in the face of poor performance.

So direct=on,wc=on + ext4 barrier=0 in the guest is less safe than ext4 
barrier=0 on bare metal.

Very specifically, if we do cache=none as we do today, and within the 
guest we have ext4 barrier=0 and run DB2, DB2's guarantees are weaker 
than they are on bare metal because metadata is not getting flushed.

To resolve this, we need to do direct=on,wc=off + ext4 barrier=0 on the 
host.  This is safe and should perform reasonably well, but there's far 
too much complexity for a user to get to this point.
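
The combinations discussed above can be summarised with a deliberately simplified model. It assumes stability is enforced either by O_DSYNC on the host (wc=off) or by cache flushes the guest filesystem issues (barrier=1), and it ignores everything else, including whatever the storage under the host does. The function name is hypothetical and this is not QEMU logic.

```python
def durability_is_enforced(wce: bool, guest_barriers: bool) -> bool:
    """Simplified model: does anything force guest writes to stable storage?"""
    if not wce:
        # wc=off: host opens the image with O_DSYNC, so each write is
        # stable before it completes, whatever the guest does.
        return True
    # wc=on: stability depends on the guest sending cache flushes.
    return guest_barriers


if __name__ == "__main__":
    for wce in (True, False):
        for barriers in (True, False):
            print(
                f"wc={'on' if wce else 'off'} "
                f"barrier={1 if barriers else 0} "
                f"enforced={durability_is_enforced(wce, barriers)}"
            )
```

Under these assumptions the only combination where nothing enforces durability is wc=on with barrier=0 in the guest, which is the case described above.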

Regards,

Anthony Liguori