From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <48F2ADD9.9000804@redhat.com>
Date: Sun, 12 Oct 2008 22:09:29 -0400
From: Mark Wagner
Subject: Re: [Qemu-devel] [RFC] Disk integrity in QEMU
References: <48EE38B9.2050106@codemonkey.ws> <48EF1D55.7060307@redhat.com> <48F0E83E.2000907@redhat.com> <48F10DFD.40505@codemonkey.ws> <48F14814.7000805@redhat.com> <48F239DB.6050302@codemonkey.ws> <48F295F0.3020105@redhat.com> <48F2A294.4020303@codemonkey.ws>
In-Reply-To: <48F2A294.4020303@codemonkey.ws>
To: qemu-devel@nongnu.org
Cc: Chris Wright, Mark McLoughlin, Ryan Harper, kvm-devel, Laurent Vivier
List-Id: qemu-devel.nongnu.org

Anthony Liguori wrote:
> Mark Wagner wrote:
>> If you stopped and listened to yourself, you'd see that you are making
>> my point...
>>
>> AFAIK, QEMU is neither designed nor intended to be an Enterprise
>> Storage Array; I thought this group was designing a virtualization
>> layer.
>> However, the persistent argument is that since Enterprise Storage
>> products will often acknowledge a write before the data is actually
>> on the disk, it's OK for QEMU to do the same.
>
> I think you're a little lost in this thread.  We're going to have QEMU
> only acknowledge writes when they complete.  I've already sent out a
> patch.  Just waiting a couple of days to let everyone give their input.
>

Actually, I'm just not being clear enough in trying to point out that I
don't think just setting a default value for "cache" goes far enough.
My argument has nothing to do with the default value.  It has to do
with what the right thing to do is in specific situations, regardless
of the value of the cache setting.

My point is that if a file is opened in the guest with O_DIRECT (or
O_DSYNC), then QEMU *must* honor that regardless of the current value
of "cache".  So, if the system admin for the host decides to set
cache=on and something in the guest opens a file with O_DIRECT, I feel
that it is a violation of the system call for the host to cache the
write in its local cache without sending it immediately to the storage
subsystem.  It must get an ACK from the storage subsystem before it can
return to the guest in order to preserve the guarantee.

So, if your proposed default value for the cache is in effect, then
O_DSYNC should provide the write-through required by the guest's use of
O_DIRECT on writes.  However, if the default cache value is not used
and it's set to cache=on, and the guest is using O_DIRECT or O_DSYNC, I
feel there are issues that need to be addressed.

-mark

>> If QEMU had a similar design to Enterprise Storage with redundancy,
>> battery backup, etc., I'd be fine with it, but you don't.  QEMU is a
>> layer that I'd also thought was supposed to be small, lightweight and
>> unobtrusive, yet it is silently putting everyone's data at risk.
>>
>> The low-end iSCSI server from EqualLogic claims:
>>   "it combines intelligence and automation with fault tolerance"
>>   "Dual, redundant controllers with a total of 4 GB battery-backed
>>    memory"
>>
>> AFAIK QEMU provides neither of these characteristics.
>
> So if this is your only concern, we're in violent agreement.  You were
> previously arguing that we should use O_DIRECT in the host if we're
> not "lying" about write completions anymore.  That's what I'm opposing,
> because the details of whether we use O_DIRECT or not have absolutely
> nothing to do with data integrity as long as we're using O_DSYNC.
>
> Regards,
>
> Anthony Liguori
>
>>> The fact that the virtualization layer has a cache is really not
>>> that unusual.
>>
>> Do other virtualization layers lie to the guest and indicate that the
>> data has successfully been ACK'd by the storage subsystem when the
>> data is actually still in the host cache?
>>
>> -mark