From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from [140.186.70.92] (port=44799 helo=eggs.gnu.org)
    by lists.gnu.org with esmtp (Exim 4.43) id 1OxiU4-0004Vo-Hs
    for qemu-devel@nongnu.org; Mon, 20 Sep 2010 11:40:49 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
    (envelope-from ) id 1OxiU3-000604-9d
    for qemu-devel@nongnu.org; Mon, 20 Sep 2010 11:40:48 -0400
Received: from mail-px0-f173.google.com ([209.85.212.173]:63324)
    by eggs.gnu.org with esmtp (Exim 4.69)
    (envelope-from ) id 1OxiU3-0005zx-2f
    for qemu-devel@nongnu.org; Mon, 20 Sep 2010 11:40:47 -0400
Received: by pxi12 with SMTP id 12so1320555pxi.4
    for ; Mon, 20 Sep 2010 08:40:45 -0700 (PDT)
Message-ID: <4C978071.2010209@codemonkey.ws>
Date: Mon, 20 Sep 2010 10:40:33 -0500
From: Anthony Liguori
MIME-Version: 1.0
Subject: Re: [Qemu-devel] [RFC] block-queue: Delay and batch metadata writes
References: <1284991010-10951-1-git-send-email-kwolf@redhat.com>
    <4C977028.3050602@codemonkey.ws> <4C9778EC.9060704@redhat.com>
In-Reply-To: <4C9778EC.9060704@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Kevin Wolf
Cc: qemu-devel@nongnu.org

On 09/20/2010 10:08 AM, Kevin Wolf wrote:
>> If you're comfortable with a writeback cache for metadata, then you
>> should also be comfortable with a writeback cache for data in which
>> case, cache=writeback is the answer.
>>
> Well, there is a difference: We don't pollute the host page cache with
> guest data and we don't get a virtual "disk cache" as big as the host
> RAM, but only a very limited queue of metadata.
>
> Basically, in qemu we have three different types of caching:
>
> 1. O_DSYNC, everything is always synced without any explicit request.
> This is cache=writethrough.
>

I actually think O_DSYNC is the wrong implementation of
cache=writethrough. cache=writethrough should behave just like
cache=none except that data goes through the page cache.

> 2. Nothing is ever synced. This is cache=unsafe.
>
> 3. We present a writeback disk cache to the guest and the guest needs
> to explicitly flush to get its data safe on disk. This is
> cache=writeback and cache=none.
>

We shouldn't tie the virtual disk cache to which cache= option is used
in the host.

cache=none means that all requests go directly to the disk.
cache=writeback means the host acts as a writeback cache.

If your disk is in writethrough mode, exposing cache=none as a writeback
disk cache is not correct.

> We're still lacking modes for O_DSYNC | O_DIRECT and unsafe | O_DIRECT,
> but they are entirely possible, because it's two different dimensions.
> (And I think Christoph was planning to actually make it two independent
> options)
>

I don't really think O_DSYNC | O_DIRECT makes much sense.

>> If it's a matter of batching, batching can't occur if you have a barrier
>> between steps 3 and 5. The only way you can get batching is by doing a
>> writeback cache for the metadata such that you can complete your request
>> before the metadata is written.
>>
>> Am I misunderstanding the idea?
>>
> No, I think you understand it right, but maybe you were not completely
> aware that cache=none doesn't mean writethrough.
>

No, cache=none means don't cache on the host.

In my mind, cache=none|cache=writethrough is specifically about
eliminating the host from the cache hierarchy. This is not a
correctness issue with respect to integrity but rather about data loss.

If you have strong storage with battery-backed caches, then you can
relax flushes. However, if you've got a cache in the host and the host
isn't battery-backed, that's no longer safe to do.

So even with cache=none, if we added a writeback cache for metadata, it
would really need to be an optional feature. Something like
cache=none|writethrough|metadata|writeback.

Regards,

Anthony Liguori

> Kevin
>
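
For reference, here is a minimal sketch of the "two different dimensions" Kevin describes in the quoted text: whether the host page cache is used (O_DIRECT or not) and when data is synced. It encodes Kevin's classification only, which this reply partly disputes, and it is not QEMU code. The names SyncPolicy, CacheMode, CACHE_MODES, open_flags and missing_combinations are all hypothetical, invented for illustration.

```python
import os
from dataclasses import dataclass
from enum import Enum
from itertools import product

# O_DSYNC and O_DIRECT are platform-dependent; fall back to 0 where absent.
O_DSYNC = getattr(os, "O_DSYNC", 0)
O_DIRECT = getattr(os, "O_DIRECT", 0)


class SyncPolicy(Enum):
    ALWAYS = "always"      # every write is synced (O_DSYNC)
    ON_FLUSH = "on_flush"  # guest sees a writeback cache and must flush
    NEVER = "never"        # nothing is ever synced


@dataclass(frozen=True)
class CacheMode:
    host_page_cache: bool  # False means the image is opened with O_DIRECT
    sync: SyncPolicy


# Kevin's three categories from the quoted text (hypothetical table,
# not taken from QEMU's source).
CACHE_MODES = {
    "writethrough": CacheMode(True, SyncPolicy.ALWAYS),
    "unsafe": CacheMode(True, SyncPolicy.NEVER),
    "writeback": CacheMode(True, SyncPolicy.ON_FLUSH),
    "none": CacheMode(False, SyncPolicy.ON_FLUSH),
}


def open_flags(name):
    """Return the open(2) flags implied by a mode in this sketch."""
    mode = CACHE_MODES[name]
    flags = os.O_RDWR
    if not mode.host_page_cache:
        flags |= O_DIRECT
    if mode.sync is SyncPolicy.ALWAYS:
        flags |= O_DSYNC
    return flags


def missing_combinations():
    """List the (host_page_cache, sync) pairs no existing mode covers."""
    used = {(m.host_page_cache, m.sync) for m in CACHE_MODES.values()}
    return set(product((True, False), SyncPolicy)) - used
```

Under these assumptions, missing_combinations() yields exactly the two pairs Kevin says are still lacking: O_DIRECT with sync-always (O_DSYNC | O_DIRECT) and O_DIRECT with never-sync (unsafe | O_DIRECT).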