Message-ID: <48EF2120.7070606@redhat.com>
Date: Fri, 10 Oct 2008 11:32:16 +0200
From: Avi Kivity
Subject: Re: [Qemu-devel] [RFC] Disk integrity in QEMU
References: <48EE38B9.2050106@codemonkey.ws> <48EF0A26.90209@redhat.com>
In-Reply-To: <48EF0A26.90209@redhat.com>
To: qemu-devel@nongnu.org
Cc: Chris Wright, Mark McLoughlin, Ryan Harper, kvm-devel, Laurent Vivier

Gerd Hoffmann wrote:
> Hi,
>
>> Read performance should be unaffected by using O_DSYNC. O_DIRECT will
>> significantly reduce read performance. I think we should use O_DSYNC by
>> default and I have sent out a patch that contains that. We will follow
>> up with benchmarks to demonstrate this.
>
> So O_SYNC on/off is pretty much equivalent to disk write caching being
> on/off, right? So we could make that guest-controlled, i.e. toggling
> write caching in the guest (using hdparm) toggles O_SYNC in qemu?
> This together with disk-flush command support (mapping to fsync on the
> host) should allow guests to go into barrier mode for better write
> performance without losing data integrity.

IDE write caching is very different from host write caching.

The IDE write cache is not susceptible to software failures (it is
susceptible to firmware failures, but let's ignore that). It is likely to
survive a reset and perhaps even a powerdown. The risk window is a few
megabytes and tens of milliseconds long.

The host pagecache will not survive software failures, resets, or
powerdown. The risk window is hundreds of megabytes and thousands of
milliseconds long.

It's perfectly normal to run a production system with the IDE write cache
enabled (though perhaps not a mission-critical database), but totally mad
to do so with host caching. I don't think we should tie data integrity to
an IDE misfeature that doesn't even exist anymore (with the advent of
SATA NCQ).

-- 
I have a truly marvellous patch that fixes the bug which this signature
is too narrow to contain.