From: Avi Kivity
Date: Wed, 21 May 2008 17:57:08 +0300
Subject: Re: [Qemu-devel] Re: [PATCH][v2] Align file accesses with cache=off (O_DIRECT)
To: Anthony Liguori
Cc: Blue Swirl, Laurent Vivier, qemu-devel@nongnu.org, Paul Brook
Message-ID: <48343844.1050107@qumranet.com>
In-Reply-To: <48343106.4070801@codemonkey.ws>

Anthony Liguori wrote:
> Avi Kivity wrote:
>> Anthony Liguori wrote:
>>>
>>> "cached" is not a terribly accurate term. O_DIRECT avoids the host
>>> page cache but it doesn't guarantee that the disk is using
>>> write-through.
>>> For that, you need to use hdparm.
>>>
>>> O_SYNC basically turns the host page cache into a write-through
>>> cache. In terms of data integrity, the only question that matters
>>> is whether you're misleading the guest into thinking data is on the
>>> disk when it isn't. Both O_DIRECT and O_SYNC accomplish this.
>>>
>>> If you just are concerned with data integrity, O_SYNC is probably
>>> better because you get the benefits of host caching. O_DIRECT is
>>> really for circumstances where you know that using the host page
>>> cache is going to reduce performance.
>>
>> In one specific circumstance O_SYNC has data integrity problems:
>> shared disks with guests running on different hosts (or even a guest
>> on one host, sharing a disk with another host). In these cases, two
>> reads can return different values without an intervening write.
>
> Are you assuming the underlying disk sharing protocol does not keep
> the page cache coherent?
>

I was assuming access to a raw partition. But yes, with a cluster file
system this objection goes away. At a significant cost, though. You're
now running two cache coherency protocols on top of each other.

>> In the general case, O_DIRECT gives better performance. It avoids
>> copying from the host pagecache to guest memory, and if you have
>> spare memory to benefit from caching, give it to the guest; the
>> nearer to the data consumer the cache is, the faster it performs.
>
> This assumes the only thing you're running on the machine is VMs. If
> you're just running one VM on your desktop, it is probably preferable
> to go through the host page cache. In particular, the host page cache
> can be automatically aged and adjusted whereas, in general, you cannot
> reduce the guests page cache size.

Agreed. For casual uses, O_DIRECT is overkill. It does get rid of the
data copies, though.
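The subject of this thread is aligning file accesses when the image is opened with O_DIRECT (cache=off). Below is a minimal C sketch, not the patch under discussion, of why that alignment matters: with O_DIRECT the buffer address, file offset and length generally have to be multiples of the device's logical block size, so an unaligned guest request is served by reading the enclosing aligned window into an aligned bounce buffer. The 4096-byte alignment, the helper names (`align_down`, `align_up`, `open_direct`, `direct_pread`) and the EINVAL fallback are all illustrative assumptions.

```c
#define _GNU_SOURCE /* exposes O_DIRECT in <fcntl.h> on Linux */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: 4096-byte alignment satisfies devices with either 512-byte
 * or 4 KiB logical sectors; real code should query the device instead. */
#define DIO_ALIGN 4096u

static uint64_t align_down(uint64_t x, uint64_t a) { return x - (x % a); }
static uint64_t align_up(uint64_t x, uint64_t a) { return align_down(x + a - 1, a); }

/* Hypothetical helper: open an image bypassing the host page cache. If the
 * filesystem rejects O_DIRECT with EINVAL, fall back to a cached open
 * (demo convenience only: the fallback silently loses O_DIRECT semantics). */
static int open_direct(const char *path, int flags)
{
    int fd = open(path, flags | O_DIRECT);
    if (fd < 0 && errno == EINVAL)
        fd = open(path, flags);
    return fd;
}

/* Hypothetical helper: read len bytes at an arbitrary offset by reading the
 * enclosing aligned window into an aligned bounce buffer, then copying out
 * the requested bytes. Returns bytes copied (short near EOF) or -1. */
static ssize_t direct_pread(int fd, void *dst, size_t len, off_t off)
{
    uint64_t start = align_down((uint64_t)off, DIO_ALIGN);
    uint64_t end = align_up((uint64_t)off + len, DIO_ALIGN);
    size_t span = (size_t)(end - start);
    size_t skip = (size_t)((uint64_t)off - start);
    void *bounce = NULL;
    int err;

    assert(skip < DIO_ALIGN); /* offset lies inside the first aligned block */
    if ((err = posix_memalign(&bounce, DIO_ALIGN, span)) != 0) {
        errno = err;
        return -1;
    }
    ssize_t n = pread(fd, bounce, span, (off_t)start);
    if (n < 0) {
        free(bounce);
        return -1;
    }
    size_t avail = (size_t)n > skip ? (size_t)n - skip : 0;
    size_t copy = avail < len ? avail : len;
    memcpy(dst, (char *)bounce + skip, copy);
    free(bounce);
    return (ssize_t)copy;
}
```

When a request and its destination buffer are already aligned, the bounce buffer and `memcpy` can be skipped and `pread` can target the caller's buffer directly; that is where O_DIRECT's saving on data copies comes from. Writes need the same alignment, and a partial-block write additionally needs a read-modify-write of the edge blocks.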
>
>> In one specific case O_SYNC (or regular read/write cached operation)
>> is better, rebooting the same guest over and over, as the guest cache
>> is flushed on reboot. Not a very interesting case, but unfortunately
>> one that is very visible.
>>
>> (if you have a backing file for a COW disk, then opening the backing
>> file without O_DIRECT may be a good idea too, as the file can be
>> shared among many guests).
>
> FWIW, we really only need to use O_SYNC when the guest has disabled
> write-back. I think we should do that unconditionally too as it's an
> issue of correctness.

If the guest is used for non-critical applications (like testing distro
installers), then it's just a slowdown.

Even if the guest did not disable disk write-back, the page cache and disk
write caches have vastly different characteristics, so I think we should
set O_SYNC there as well.

Here's a summary of the use cases I saw so far:

- casual use, no critical data: write-back cache
- backing file shared among many guests: read-only, cached
- desktop system, but don't lose my data: O_SYNC (significant resources
  on the host)
- dedicated virtualization engine: O_DIRECT (most host resources
  assigned to guests)

-- 
error compiling committee.c: too many arguments to function
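As a hedged illustration of the use-case summary above, the four cases can be mapped onto open(2) flags for the disk image. The enum labels and `image_open_flags` are hypothetical names; a real implementation would expose these as user-selectable options rather than derive them from a hard-coded table.

```c
#define _GNU_SOURCE /* exposes O_DIRECT in <fcntl.h> on Linux */
#include <assert.h>
#include <fcntl.h>

/* Hypothetical labels for the four use cases in the summary. */
enum use_case {
    USE_CASUAL,         /* casual use, no critical data */
    USE_SHARED_BACKING, /* backing file shared among many guests */
    USE_DESKTOP_SAFE,   /* desktop system, but don't lose my data */
    USE_DEDICATED_HOST, /* dedicated virtualization engine */
};

/* Return open(2) flags for a disk image under each use case. */
static int image_open_flags(enum use_case uc)
{
    switch (uc) {
    case USE_CASUAL:
        return O_RDWR;            /* write-back through the host page cache */
    case USE_SHARED_BACKING:
        return O_RDONLY;          /* read-only and cached, shareable by guests */
    case USE_DESKTOP_SAFE:
        return O_RDWR | O_SYNC;   /* host page cache acts as write-through */
    case USE_DEDICATED_HOST:
        return O_RDWR | O_DIRECT; /* bypass the host page cache */
    }
    assert(!"unknown use case");
    return O_RDWR;
}
```

The O_DIRECT case still has to honour the alignment rules this thread's patch deals with, and the sketch does not encode the separate policy question of also forcing O_SYNC when the guest keeps its disk write cache enabled.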