From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: [PATCH, RFC] virtio_blk: add cache flush command
Date: Mon, 11 May 2009 21:40:41 +0300
Message-ID: <4A087129.40808@redhat.com>
References: <20090511083908.GB20082@lst.de>
 <4A083B7C.1000703@codemonkey.ws>
 <20090511154046.GA4226@lst.de>
 <4A08482E.30100@redhat.com>
 <20090511162810.GA6027@lst.de>
 <4A085721.2050005@redhat.com>
 <4A0864CE.10505@codemonkey.ws>
 <4A0867B8.2090601@redhat.com>
 <4A086E72.5060302@codemonkey.ws>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Christoph Hellwig , Rusty Russell , kvm@vger.kernel.org
To: Anthony Liguori
Return-path:
Received: from mx2.redhat.com ([66.187.237.31]:57116 "EHLO mx2.redhat.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1759251AbZEKSlW (ORCPT ); Mon, 11 May 2009 14:41:22 -0400
In-Reply-To: <4A086E72.5060302@codemonkey.ws>
Sender: kvm-owner@vger.kernel.org
List-ID:

Anthony Liguori wrote:
> Avi Kivity wrote:
>> Anthony Liguori wrote:
>>
>>>
>>> Right now, it's fairly easy to understand.  cache=none and
>>> cache=writethrough guarantee that all write operations that the
>>> guest thinks have completed are completed.  cache=writeback provides
>>> no such guarantee.
>>
>> cache=none is partially broken as well, since O_DIRECT writes might
>> hit a non-battery-backed write cache.  I think cache=writeback will
>> send the necessary flushes, if the disk and the underlying filesystem
>> support them.
>
> Sure, but this likely doesn't upset people that much since O_DIRECT
> has always had this behavior.

But people are not using O_DIRECT.  They're using their guests, which
may or may not issue the appropriate barriers.  They don't know that
we're using O_DIRECT underneath with different guarantees.

> Using non-battery backed disks with writeback enabled introduces a
> larger set of possible data integrity issues.  I think this case is
> acceptable to ignore because it's a straightforward policy.

It isn't straightforward to me.  A guest should be able to get the same
guarantees running on a hypervisor backed by such a disk as it would get
if it were running on bare metal with the same disk.  Right now, that's
not the case; we're reducing the guarantees the guest gets.

>>> cache=writeback+fsync would guarantee that only operations that
>>> include a T_FLUSH are present on disk which currently includes
>>> fsyncs but does not include O_DIRECT writes.  I guess whether O_SYNC
>>> does a T_FLUSH also has to be determined.
>>>
>>> It seems too complicated to me.  If we could provide a mode where
>>> cache=writeback provided as strong a guarantee as
>>> cache=writethrough, then that would be quite interesting.
>>
>> I don't think we realistically can.
>
> Maybe two fds?  One open in O_SYNC and one not.  Is such a thing sane?

For all I care, yes.  Filesystem developers would probably have you
locked up.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
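
A minimal C sketch of the "two fds" idea quoted above, assuming a POSIX host. The same image file is opened twice, once with O_SYNC and once without. Writes that must be stable on completion go through the O_SYNC descriptor; the rest go through the cached descriptor and are made stable by an explicit fsync(). The names struct dual_fd, dual_open, dual_pwrite, dual_flush and dual_close are hypothetical and do not come from QEMU or from the patch under discussion. Whether O_SYNC or fsync() also flushes a volatile disk write cache depends on the kernel, filesystem and device, which is exactly the concern raised in the thread.

```c
/* Sketch only: illustrates the "two fds" suggestion, not QEMU code.
 * All identifiers below are hypothetical. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

struct dual_fd {
    int fd_sync;   /* O_SYNC: write returns after synchronized I/O completion */
    int fd_cached; /* no O_SYNC: write may return with data in the page cache */
};

/* Open the same existing file twice with different write semantics. */
static int dual_open(struct dual_fd *d, const char *path)
{
    d->fd_sync = open(path, O_RDWR | O_SYNC);
    if (d->fd_sync < 0)
        return -1;
    d->fd_cached = open(path, O_RDWR);
    if (d->fd_cached < 0) {
        close(d->fd_sync);
        return -1;
    }
    return 0;
}

/* Route a write to the descriptor matching the requested guarantee. */
static ssize_t dual_pwrite(struct dual_fd *d, const void *buf, size_t len,
                           off_t off, bool durable)
{
    assert(buf != NULL);
    return pwrite(durable ? d->fd_sync : d->fd_cached, buf, len, off);
}

/* Guest-requested flush: push out everything written via the cached fd. */
static int dual_flush(struct dual_fd *d)
{
    return fsync(d->fd_cached);
}

static void dual_close(struct dual_fd *d)
{
    close(d->fd_sync);
    close(d->fd_cached);
}
```

A caller would pass durable=true for requests it wants completed writethrough-style and call dual_flush() when the guest issues a flush. Both descriptors share the host page cache, so reads through either one see the same data.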