Date: Tue, 1 Sep 2009 01:06:12 +0200
From: Christoph Hellwig
Subject: Re: [Qemu-devel] [PATCH 1/4] block: add enable_write_cache flag
Message-ID: <20090831230612.GC10220@lst.de>
References: <20090831201627.GA4811@lst.de> <20090831201651.GA4874@lst.de> <20090831220950.GB24318@shareable.org> <20090831221622.GA8834@lst.de> <20090831224645.GD24318@shareable.org>
In-Reply-To: <20090831224645.GD24318@shareable.org>
List-Id: qemu-devel.nongnu.org
To: Jamie Lokier
Cc: Christoph Hellwig, qemu-devel@nongnu.org

On Mon, Aug 31, 2009 at 11:46:45PM +0100, Jamie Lokier wrote:
> > On Mon, Aug 31, 2009 at 11:09:50PM +0100, Jamie Lokier wrote:
> > > Right now, on a Linux host O_SYNC is unsafe with hardware that has a
> > > volatile write cache. That might not be changed, but if it is, then
> > > performance with cache=writethrough will plummet (due to issuing a
> > > CACHE FLUSH to the hardware after every write), while performance with
> > > cache=writeback will be reasonable.
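The cost difference described in the quote above can be sketched in a few lines. This is a hypothetical model, not QEMU's block layer: `struct image`, `image_write()`, `image_flush()` and the `host_flushes` counter are invented for illustration, and writethrough is approximated by calling fdatasync() after every write rather than by opening the image with O_SYNC. Writethrough pays one host sync per guest write; writeback lets data sit in the host page cache and only syncs when the guest sends an explicit cache flush.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical image backend, for illustration only (not QEMU code). */
enum cache_mode { CACHE_WRITETHROUGH, CACHE_WRITEBACK };

struct image {
    int fd;
    enum cache_mode mode;
    unsigned long host_flushes;   /* number of fdatasync() calls issued */
};

/* Guest write request: returns 0 on success, -1 on error. */
static int image_write(struct image *img, const void *buf, size_t len, off_t off)
{
    assert(img != NULL && buf != NULL);
    ssize_t n = pwrite(img->fd, buf, len, off);
    if (n < 0 || (size_t)n != len)
        return -1;
    if (img->mode == CACHE_WRITETHROUGH) {
        /* Writethrough: make every write stable before completing it. */
        img->host_flushes++;
        if (fdatasync(img->fd) != 0)
            return -1;
    }
    /* Writeback: data may stay in the host page cache until a flush. */
    return 0;
}

/* Guest cache flush request (e.g. SYNCHRONIZE CACHE / FLUSH CACHE). */
static int image_flush(struct image *img)
{
    img->host_flushes++;
    return fdatasync(img->fd);
}
```

After ten 512-byte guest writes, the writethrough image has issued ten fdatasync() calls and the writeback image none; a single guest flush then costs the writeback image one. Whether fdatasync() actually empties a volatile drive cache depends on the host kernel and filesystem, which is exactly what the thread is arguing about.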
> >
> > Currently all modes are more or less unsafe with volatile write caches,
> > at least when using ext3 or raw block device access. XFS is safe,
> > two thirds due to doing the right thing and one third due to sheer
> > luck.
>
> Right, but now you've made it worse. By not calling fdatasync at all,
> you've reduced the integrity. Previously it would reach the drive's
> cache, and take whatever (short) time it took to reach the platter.
> Now you're leaving data in the host cache which can stay for much
> longer, and is vulnerable to host kernel crashes.

Your last comment is about data=writeback, which in Avi's proposal that
I implemented would indeed lose any guarantees and be, for all practical
purposes, unsafe. It's not true for any of the other options.

> Oh, and QEMU could call whatever "hdparm -F" does when using raw block
> devices ;-)

Actually, implementing cache control for ide/scsi is on my todo list.
Not sure about virtio yet.

> Well I'd like to start by pointing out your patch introduces a
> regression in the combination cache=writeback with emulated SCSI,
> because it effectively removes the fdatasync calls in that case :-)

Yes, you already pointed this out above.

> It goes to show no matter how hard we try, data integrity is a
> slippery thing where getting it wrong does not show up under normal
> circumstances, only during catastrophic system failures.

Honestly, it should not. Digging through all this was a bit of work,
but I was extremely surprised at how careless most of the people who
touched it before were. It's not rocket science, and it can be tested
quite easily using various tools: qemu is the easiest nowadays, but
scsi_debug or an instrumented iSCSI target would do the same job.

> It failed with fsync, which
> is also important to applications, but filesystem integrity is the
> most important thing and it's been good at that for many years.

Users might disagree with that.
With my user hat on, I couldn't care less what state the internal
metadata is in, as long as I get back my data, which the OS has
guaranteed me reached the disk after a successful fsync/fdatasync/O_SYNC
write.

> > E.g. if you want to move your old SCO Unix box into a VM, it's the
> > only safe option.
>
> I agree, and for that reason, cache=writethrough or cache=none are the
> only reasonable defaults.

Despite the extremely misleading name, cache=none is _NOT_ an
alternative, unless we make it open the image using O_DIRECT|O_SYNC.
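For illustration, opening an image with O_DIRECT|O_SYNC on a Linux host might look like the sketch below. It is not QEMU code: `open_image_direct_sync()`, `write_block()` and the 4096-byte `IMG_ALIGN` are assumptions made for the example. O_DIRECT alone only bypasses the host page cache and promises nothing about durability; adding O_SYNC asks for each write to complete only once the data is on stable storage. Whether the drive's volatile cache is really flushed then still depends on the host kernel and filesystem, as discussed earlier in the thread.

```c
#define _GNU_SOURCE          /* exposes O_DIRECT on Linux/glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Assumption: a conservative alignment for O_DIRECT buffers, offsets and lengths. */
#define IMG_ALIGN 4096

/* Hypothetical helper: bypass the host page cache (O_DIRECT) and make each
 * write synchronous (O_SYNC). Returns an fd, or -1 with errno set, e.g.
 * EINVAL on a filesystem that does not support O_DIRECT. */
static int open_image_direct_sync(const char *path)
{
    return open(path, O_RDWR | O_CREAT | O_DIRECT | O_SYNC, 0600);
}

/* Hypothetical helper: write one aligned block filled with `fill`.
 * O_DIRECT requires an aligned buffer, offset and length. */
static int write_block(int fd, off_t off, char fill)
{
    void *buf = NULL;
    assert(off % IMG_ALIGN == 0);
    if (posix_memalign(&buf, IMG_ALIGN, IMG_ALIGN) != 0)
        return -1;
    memset(buf, fill, IMG_ALIGN);
    ssize_t n = pwrite(fd, buf, IMG_ALIGN, off);
    free(buf);
    return n == IMG_ALIGN ? 0 : -1;
}
```

The alignment requirement is why the buffer comes from posix_memalign() rather than malloc(); an unaligned buffer or offset typically makes an O_DIRECT write fail with EINVAL.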