From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 13 Oct 2008 16:05:09 -0500
From: Ryan Harper
Subject: Re: [Qemu-devel] Re: [RFC] Disk integrity in QEMU
Message-ID: <20081013210509.GL21410@us.ibm.com>
References: <48EE38B9.2050106@codemonkey.ws>
 <20081013170610.GF21410@us.ibm.com>
 <6A99DBA5-D422-447D-BF9D-019FB394E6C6@lvivier.info>
 <20081013194328.GJ21410@us.ibm.com>
 <148FE536-F397-4F51-AE3F-C94E4F1F5D4E@lvivier.info>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <148FE536-F397-4F51-AE3F-C94E4F1F5D4E@lvivier.info>
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
To: Laurent Vivier
Cc: Chris Wright, Mark McLoughlin, Laurent Vivier, qemu-devel@nongnu.org,
 Ryan Harper

* Laurent Vivier [2008-10-13 15:39]:
> >>
> >> as "cache=on" implies a factor (memory) shared by the whole system,
> >> you must take into account the size of the host memory and run some
> >> applications (several guests?) to pollute the host cache; for
> >> instance, you can run 4 guests and run the bench in each of them
> >> concurrently, and you could reasonably limit the size of the host
> >> memory to 5 x the size of the guest memory
> >> (for instance, 4 guests with 128 MB each on a host with 768 MB).
> >
> > I'm not following you here; the only assumption I see is that we
> > have 1 GB of host memory free for caching the write.
>
> Is this a realistic use case?

Optimistic?  I don't think it is unrealistic.  It is hard to know what
hardware and use case any end user may have at their disposal.

> >>
> >> as O_DSYNC implies a journal commit, you should run a bench on the
> >> ext3 host file system concurrently with the bench in a guest to see
> >> the impact of the commit on each bench.
> >
> > I understand the goal here, but what sort of host ext3 journaling
> > load is appropriate?  Additionally, when we're exporting block
> > devices, I don't believe the ext3 journal is an issue.
>
> Yes, it's a comment for the last test case.
> I think you can run the same benchmark as you do in the guest.

I'm not sure where to go with this.
If it turns out that scaling out on top of ext3 stinks, then the
deployment needs to change to deal with that limitation in ext3.  Use a
proper block device, something like LVM.

> >> According to the semantics, I don't understand how O_DSYNC can be
> >> better than cache=off in this case...
> >
> > I don't have a good answer either, but O_DIRECT and O_DSYNC are
> > different paths through the kernel.  This deserves a better reply,
> > but I don't have one off the top of my head.
>
> The O_DIRECT kernel path should be more "direct" than the O_DSYNC one.
> Perhaps an oprofile could help us understand?
> What is also strange is the CPU usage with cache=off.  It should be
> lower than the others; perhaps an alignment issue, due to the LVM?

All possible; I don't have an oprofile of it.

> >>
> >> OK, but in this case the size of the cache for "cache=off" is the
> >> size of the guest cache, whereas in the other cases the size of the
> >> cache is the size of the guest cache + the size of the host cache;
> >> this is not fair...
> >
> > It isn't supposed to be fair.  cache=off is O_DIRECT, so we're
> > reading from the device; we *want* to be able to lean on the host
> > cache to read the data: pay once and benefit in other guests if
> > possible.
>
> OK, but if you want to follow this way I think you must run several
> guests concurrently to see how the host cache helps each of them.
> If you want, I can try this tomorrow.  Is the O_DSYNC patch the one
> posted to the mailing list?

The patch used is the same as what is on the list; feel free to try it.

>
> And moreover, you should run an endurance test to see how the cache
> evolves.

I'm not sure how interesting this is: either it was in the cache or it
was not.  Depending on what work you do, you can either devolve to a
case where nothing is in cache or one where everything is in cache.
The point is that by using the cache where we can, we get the benefit.
If you use cache=off you'll never be able to get that boost when it
would otherwise have been available.

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com
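
A minimal sketch of the two open(2) flags being compared in the thread
above.  This only illustrates the flag semantics and is not QEMU's block
driver code; the file name, buffer size, and 4 KB alignment are
assumptions made for the example.

#define _GNU_SOURCE             /* O_DIRECT needs this on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BUF_SIZE 4096           /* multiple of the device block size */

int main(void)
{
    void *buf;
    int fd;

    /* O_DIRECT bypasses the host page cache (this is what cache=off
     * uses).  The buffer, offset, and length must be block-aligned,
     * hence posix_memalign() rather than malloc(). */
    if (posix_memalign(&buf, 4096, BUF_SIZE) != 0) {
        perror("posix_memalign");
        return 1;
    }
    memset(buf, 0, BUF_SIZE);

    fd = open("/tmp/scratch.img", O_WRONLY | O_CREAT | O_DIRECT, 0600);
    if (fd < 0) {
        perror("open O_DIRECT");
        return 1;
    }
    if (write(fd, buf, BUF_SIZE) != BUF_SIZE)
        perror("write O_DIRECT");
    close(fd);

    /* O_DSYNC still goes through the host page cache, but each write()
     * returns only after the data has reached stable storage; on ext3
     * that can mean waiting for a journal commit. */
    fd = open("/tmp/scratch.img", O_WRONLY | O_DSYNC);
    if (fd < 0) {
        perror("open O_DSYNC");
        return 1;
    }
    if (write(fd, buf, BUF_SIZE) != BUF_SIZE)
        perror("write O_DSYNC");
    close(fd);

    free(buf);
    return 0;
}

The alignment requirement on the O_DIRECT path is also one plausible
reading of Laurent's "alignment issue" remark: requests that are not
already block-aligned cannot be submitted down that path as-is.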