From: Anthony Liguori
Subject: Re: [PATCH, RFC] virtio_blk: add cache flush command
Date: Mon, 11 May 2009 12:47:58 -0500
Message-ID: <4A0864CE.10505@codemonkey.ws>
References: <20090511083908.GB20082@lst.de> <4A083B7C.1000703@codemonkey.ws> <20090511154046.GA4226@lst.de> <4A08482E.30100@redhat.com> <20090511162810.GA6027@lst.de> <4A085721.2050005@redhat.com>
Cc: Christoph Hellwig, Rusty Russell, kvm@vger.kernel.org
To: Avi Kivity
In-Reply-To: <4A085721.2050005@redhat.com>

Avi Kivity wrote:
> Christoph Hellwig wrote:
>> On Mon, May 11, 2009 at 06:45:50PM +0300, Avi Kivity wrote:
>>
>>>> Right now it's fsync. By the time I'll submit the backend change it
>>>> will still be fsync, but at least called from the posix-aio-compat
>>>> thread pool.
>>>>
>>> I think if we have cache=writeback we should ignore this.
>>>
>>
>> It's only needed for cache=writeback, because without that there is no
>> reason to flush a write cache.
>>
>
> Maybe we should add a fourth cache= mode then. But
> cache=writeback+fsync doesn't correspond to any real world drive; in
> the real world you're limited to power failures and a few megabytes of
> cache (typically less), cache=writeback+fsync can lose hundreds of
> megabytes due to power loss or software failure.
>
> Oh, and cache=writeback+fsync doesn't work on qcow2, unless we add
> fsync after metadata updates.

But how do we define the data integrity guarantees to the user of
cache=writeback+fsync?
It seems to require rather detailed knowledge of how Linux uses T_FLUSH
operations.

Right now, it's fairly easy to understand: cache=none and
cache=writethrough guarantee that every write operation the guest
thinks has completed really has completed, while cache=writeback
provides no such guarantee. cache=writeback+fsync would guarantee only
that operations which include a T_FLUSH are on disk. That currently
covers fsyncs but not O_DIRECT writes, and I guess we would also have
to determine whether O_SYNC issues a T_FLUSH. It seems too complicated
to me.

If we could provide a mode where cache=writeback gave as strong a
guarantee as cache=writethrough, then that would be quite interesting.

>> (Or maybe ext3 actually is stupid enough to flush the whole fs even for
>> that case
>
> Sigh.

I'm also worried about ext3 here.

Regards,

Anthony Liguori
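The difference between the guarantees discussed above can be illustrated with a toy model. This is a minimal sketch, not QEMU code: the `CacheModel` class and its `write`, `flush` and `crash` methods are hypothetical names, `flush()` stands in for the host calling fsync() when the guest sends a flush request, and `crash()` assumes the worst case in which every completed but unflushed write is lost on host power failure.

```python
from dataclasses import dataclass, field


@dataclass
class CacheModel:
    """Hypothetical toy model of which completed guest writes survive a host crash."""

    # True models cache=none / cache=writethrough; False models the
    # proposed cache=writeback+fsync mode.
    writethrough: bool
    durable: dict = field(default_factory=dict)   # sector -> data on stable storage
    volatile: dict = field(default_factory=dict)  # sector -> data only in host cache

    def write(self, sector: int, data: bytes) -> None:
        """Complete a guest write."""
        if self.writethrough:
            # Completion implies the data is already on stable storage.
            self.durable[sector] = data
        else:
            # Completion only means the host page cache holds the data.
            self.volatile[sector] = data

    def flush(self) -> None:
        """Guest flush request, assumed to be serviced by fsync() on the host."""
        self.durable.update(self.volatile)
        self.volatile.clear()

    def crash(self) -> dict:
        """Worst case after host power loss: all unflushed writes are gone."""
        self.volatile.clear()
        return dict(self.durable)


if __name__ == "__main__":
    wb = CacheModel(writethrough=False)
    wb.write(0, b"journal")
    wb.flush()
    wb.write(1, b"data")  # completed in the guest, but never flushed
    print(wb.crash())  # {0: b'journal'}

    wt = CacheModel(writethrough=True)
    wt.write(1, b"data")
    print(wt.crash())  # {1: b'data'}
```

The model only shows the worst case. On real hardware some, all or none of the unflushed writes may reach the disk, which is part of why the guarantee is hard to explain to users.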