From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: [PATCH, RFC] virtio_blk: add cache flush command
Date: Mon, 11 May 2009 21:00:24 +0300
Message-ID: <4A0867B8.2090601@redhat.com>
References: <20090511083908.GB20082@lst.de> <4A083B7C.1000703@codemonkey.ws> <20090511154046.GA4226@lst.de> <4A08482E.30100@redhat.com> <20090511162810.GA6027@lst.de> <4A085721.2050005@redhat.com> <4A0864CE.10505@codemonkey.ws>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Christoph Hellwig, Rusty Russell, kvm@vger.kernel.org
To: Anthony Liguori
Received: from mx2.redhat.com ([66.187.237.31]:38168 "EHLO mx2.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758808AbZEKSBE (ORCPT); Mon, 11 May 2009 14:01:04 -0400
In-Reply-To: <4A0864CE.10505@codemonkey.ws>
Sender: kvm-owner@vger.kernel.org
List-ID:

Anthony Liguori wrote:
> Avi Kivity wrote:
>> Christoph Hellwig wrote:
>>> On Mon, May 11, 2009 at 06:45:50PM +0300, Avi Kivity wrote:
>>>
>>>>> Right now it's fsync.  By the time I'll submit the backend change it
>>>>> will still be fsync, but at least called from the posix-aio-compat
>>>>> thread pool.
>>>>>
>>>> I think if we have cache=writeback we should ignore this.
>>>>
>>>
>>> It's only needed for cache=writeback, because without that there is no
>>> reason to flush a write cache.
>>>
>>
>> Maybe we should add a fourth cache= mode then.  But
>> cache=writeback+fsync doesn't correspond to any real world drive; in
>> the real world you're limited to power failures and a few megabytes
>> of cache (typically less), cache=writeback+fsync can lose hundreds of
>> megabytes due to power loss or software failure.
>>
>> Oh, and cache=writeback+fsync doesn't work on qcow2, unless we add
>> fsync after metadata updates.
>
> But how do we define the data integrity guarantees to the user of
> cache=writeback+fsync?  It seems to require a rather detailed
> knowledge of Linux's use of T_FLUSH operations.

True.  I don't think cache=writeback+fsync is useful.  As I mentioned,
it doesn't act like a real drive, and it doesn't work well with qcow2.

>
> Right now, it's fairly easy to understand.  cache=none and
> cache=writethrough guarantee that all write operations that the guest
> thinks have completed are completed.  cache=writeback provides no such
> guarantee.

cache=none is partially broken as well, since O_DIRECT writes might land
in a disk write cache that is not battery-backed.  I think
cache=writeback will send the necessary flushes, if the disk and the
underlying filesystem support them.

> cache=writeback+fsync would guarantee that only operations that
> include a T_FLUSH are present on disk which currently includes fsyncs
> but does not include O_DIRECT writes.  I guess whether O_SYNC does a
> T_FLUSH also has to be determined.
>
> It seems too complicated to me.  If we could provide a mode where
> cache=writeback provided as strong a guarantee as cache=writethrough,
> then that would be quite interesting.

I don't think we realistically can.

>>> (Or maybe ext3 actually is stupid enough to flush the whole fs even for
>>> that case
>>
>> Sigh.
>
> I'm also worried about ext3 here.

I'm just waiting for btrfs.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
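
For readers following the thread, the host-side difference between the modes discussed above can be sketched at the POSIX level. This is only an illustration, assuming a Linux host; the function names are made up for this example and it is not QEMU's implementation. O_SYNC stands in for a writethrough-style open, and a plain buffered write followed by an explicit fsync() stands in for the "writeback plus flush" idea. Whether either call also empties the disk's own volatile write cache depends on the filesystem and the drive, which is the concern raised in the message.

```python
import os
import tempfile


def _write_all(fd, data):
    # os.write() may write fewer bytes than requested, so loop until done.
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def write_writethrough(path, data):
    # Hypothetical analogue of a writethrough-style open: with O_SYNC,
    # write() returns only after the kernel considers the data stable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o600)
    try:
        _write_all(fd, data)
    finally:
        os.close(fd)


def write_writeback(path, data, flush):
    # Hypothetical analogue of a writeback-style open: write() may return
    # while the data is still only in the host page cache.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        _write_all(fd, data)
        if flush:
            # The explicit flush request; without it, nothing forces the
            # data out before a host crash or power loss.
            os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_writethrough(os.path.join(tmp, "wt.img"), b"guest data")
        write_writeback(os.path.join(tmp, "wb.img"), b"guest data", flush=True)
```

In the writeback case, everything written since the last flush is at risk, which on a host with a large page cache can be far more than a real drive's cache would hold.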