From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michael Tokarev
Subject: Re: JFYI: ext4 bug triggerable by kvm
Date: Tue, 17 Aug 2010 18:40:15 +0400
Message-ID: <4C6A9F4F.8040209@msgid.tls.msk.ru>
References: <4C694483.5010903@msgid.tls.msk.ru>
 <4C694E7D.3060600@codemonkey.ws>
 <20100816184237.GA16579@infradead.org>
 <4C69A0C4.2080102@codemonkey.ws>
 <20100817090755.GA11110@infradead.org>
 <4C6A86E4.9080600@codemonkey.ws>
 <20100817130702.GA16635@infradead.org>
 <4C6A9AB5.6050404@codemonkey.ws>
 <20100817142808.GA22412@infradead.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: Anthony Liguori, KVM list, Kevin Wolf
To: Christoph Hellwig
Return-path:
Received: from isrv.corpit.ru ([86.62.121.231]:40089 "EHLO isrv.corpit.ru"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1752760Ab0HQOkR (ORCPT ); Tue, 17 Aug 2010 10:40:17 -0400
In-Reply-To: <20100817142808.GA22412@infradead.org>
Sender: kvm-owner@vger.kernel.org
List-ID:

17.08.2010 18:28, Christoph Hellwig wrote:
> On Tue, Aug 17, 2010 at 09:20:37AM -0500, Anthony Liguori wrote:
[]
>> For normal writes from a guest, we don't need to follow the write
>> with an fsync(). We should only need to issue an fsync() given an
>> explicit flush from the guest.
>
> Define normal writes. For cache=none and cache=writeback we don't
> have to, and instead do explicit calls to fsync()/fdatasync() calls
> when a we a cache flush from the guest. For data=writethrough we
> guarantee data has made it to disk, and we implement this using
> O_DSYNC/O_SYNC when opening the file. That tells the operating system
> to not return until data has hit the disk. For Linux this is
> internally implement using a range-fsync/fdatasync after the actual
> write.

And this is actually what I mentioned at the very beginning, in a
(hopefully single-thread) email I sent: that ext4 is very slow
when used with O_SYNC (without O_DIRECT).
I still have had no opportunity to collect more info on this, and yes,
I've seen your (Christoph's) speed tests of a few filesystems in the
famous "BTRFS: Unbelievably slow with kvm/qemu" thread. A few users
reported _insanely_ slow write speeds for qcow2 files with the default
cache mode on ext4. And this is what prompted all this discussion
(which actually has nothing to do with the $subject line ;): an
attempt to think about replacing O_SYNC/fsync() with something
"lighter"...

>> fsync() being slow is orthogonal to my point. I don't see why we
>> need to do an fsync() on *every* write. It should only be necessary
>> when a guest injects an actual barrier.

We don't issue an explicit sync on every write, but O_SYNC implies
exactly that. And apparently that is what is happening behind the
scenes in the ext4 O_SYNC case.

But ok....

/mjt
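
For illustration only, the two write strategies discussed in this thread
can be sketched roughly as below. This is a minimal Python sketch, not
QEMU code: the function names (write_all, write_writethrough,
write_then_flush, demo) and the file names are hypothetical, and it
assumes a Linux system where os.O_DSYNC and os.fdatasync() are available.
The first variant opens the file with O_DSYNC, so every write() waits for
the data to reach stable storage; the second issues plain writes and a
single fdatasync(), standing in for an explicit flush from the guest.

```python
import os
import tempfile


def write_all(fd, data):
    """Write all of data to fd, retrying on short writes."""
    view = memoryview(data)
    written = 0
    while written < len(view):
        written += os.write(fd, view[written:])
    return written


def write_writethrough(path, chunks):
    """O_DSYNC variant: each write returns only once the data is on disk."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_DSYNC
    fd = os.open(path, flags, 0o600)
    try:
        return sum(write_all(fd, chunk) for chunk in chunks)
    finally:
        os.close(fd)


def write_then_flush(path, chunks):
    """Plain writes, then one fdatasync() (assumed stand-in for a guest flush)."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    fd = os.open(path, flags, 0o600)
    try:
        total = sum(write_all(fd, chunk) for chunk in chunks)
        os.fdatasync(fd)  # one sync for the whole batch of writes
        return total
    finally:
        os.close(fd)


def demo():
    """Write the same four 4 KiB chunks both ways and compare the results."""
    chunks = [b"\0" * 4096] * 4
    with tempfile.TemporaryDirectory() as tmp:
        p1 = os.path.join(tmp, "writethrough.img")  # hypothetical file name
        p2 = os.path.join(tmp, "flushed.img")       # hypothetical file name
        n1 = write_writethrough(p1, chunks)
        n2 = write_then_flush(p2, chunks)
        with open(p1, "rb") as f1, open(p2, "rb") as f2:
            same = f1.read() == f2.read()
    return n1, n2, same
```

Both variants leave the same bytes in the file; what differs is how many
times the filesystem is asked to sync, which is where the cost being
discussed here would show up. Timing the two functions on ext4 versus
another filesystem would be one way to gather the numbers that are still
missing.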