Message-ID: <48EF2120.7070606@redhat.com>
Date: Fri, 10 Oct 2008 11:32:16 +0200
From: Avi Kivity
Subject: Re: [Qemu-devel] [RFC] Disk integrity in QEMU
References: <48EE38B9.2050106@codemonkey.ws> <48EF0A26.90209@redhat.com>
In-Reply-To: <48EF0A26.90209@redhat.com>
To: qemu-devel@nongnu.org
Cc: Chris Wright, Mark McLoughlin, Ryan Harper, kvm-devel, Laurent Vivier

Gerd Hoffmann wrote:
> Hi,
>
>> Read performance should be unaffected by using O_DSYNC. O_DIRECT will
>> significantly reduce read performance. I think we should use O_DSYNC by
>> default and I have sent out a patch that contains that. We will follow
>> up with benchmarks to demonstrate this.
>
> So O_SYNC on/off is pretty much equivalent to disk write caching being
> on/off, right? So we could make that guest-controlled, i.e. toggling
> write caching in the guest (using hdparm) toggles O_SYNC in qemu?
> This together with disk-flush command support (mapping to fsync on the
> host) should allow guests to go into barrier mode for better write
> performance without losing data integrity.

IDE write caching is very different from host write caching.

The IDE write cache is not susceptible to software failures (it is
susceptible to firmware failures, but let's ignore that). It is likely to
survive a reset and perhaps even a powerdown. The risk window is a few
megabytes and tens of milliseconds long.

The host pagecache will not survive software failures, resets, or
powerdown. The risk window is hundreds of megabytes and thousands of
milliseconds long.

It's perfectly normal to run a production system with the IDE write cache
enabled (though perhaps not a mission-critical database), but totally mad
to do so with host caching. I don't think we should tie data integrity to
an IDE misfeature that doesn't even exist anymore (with the advent of
SATA NCQ).

-- 
I have a truly marvellous patch that fixes the bug which this signature
is too narrow to contain.