From: Dor Laor
Date: Sun, 12 Oct 2008 16:37:48 +0200
Subject: Re: [Qemu-devel] [RFC] Disk integrity in QEMU
To: qemu-devel@nongnu.org
Cc: Chris Wright, Mark McLoughlin, Ryan Harper, kvm-devel, Laurent Vivier

Avi Kivity wrote:
> Chris Wright wrote:
>> I think it's safe to say the perf folks are concerned w/ data integrity
>> first, stable/reproducible results second, and raw performance third.
>>
>> So seeing data cached in the host was simply not what they expected. I
>> think writethrough is sufficient. However, I think that uncached vs.
>> writethrough will show up on the radar under reproducible results (need
>> to tune based on cache size). And in most overcommit scenarios memory
>> is typically more precious than CPU; it's unclear to me whether the
>> extra buffering is anything other than memory overhead. As long as it's
>> configurable, it's comparable, and benchmarking and best practices can
>> dictate the best choice.
>>
> Getting good performance because we have a huge amount of free memory
> in the host is not a good benchmark. Under most circumstances, the
> free memory will be used either for more guests, or will be given to
> the existing guests, which can utilize it more efficiently than the
> host.
>
> I can see two cases where this is not true:
>
> - Using older, 32-bit guests which cannot utilize all of the cache. I
> think Windows XP is limited to 512MB of cache, and usually doesn't
> utilize even that. So if you have an application running on 32-bit
> Windows (or on 32-bit Linux with PAE disabled) and a huge host, you
> will see a significant boost from cache=writethrough. This is a case
> where performance can exceed native, simply because native cannot
> exploit all the resources of the host.
>
> - If cache requirements vary over time across the different guests,
> and if some smart ballooning is not in place, having free memory on
> the host means we utilize it for whichever guest has the greatest
> need, so overall performance improves.

Another justification for O_DIRECT is that many production systems will
use base images for their VMs. This is mainly true for desktop
virtualization, but probably also for some server virtualization
deployments.
In this type of scenario, we can have the whole base image chain opened
read-only with the default caching, while the leaf images are opened
with cache=off. Since there is an ongoing effort (by both IT and
developers) to keep the base images as large as possible, this
guarantees that the data best suited for caching in the host stays
cached, while the private leaf images remain uncached. This way we
provide good performance and caching for the shared parent images while
also promising correctness. Actually, this is what happens in mainline
qemu with cache=off.

Cheers,
Dor
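P.S. To make the distinction concrete, below is a rough sketch of what the
two policies mean at the open(2) level on Linux. This is not QEMU's block
driver code; the 512-byte alignment and the O_SYNC-for-writethrough mapping
are only assumptions for illustration. The cached ("writethrough") case goes
through the host page cache with synchronous writes, while cache=off
bypasses the host page cache with O_DIRECT and therefore needs block-aligned
buffers:

#define _GNU_SOURCE             /* O_DIRECT is a GNU extension on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define ALIGN 512               /* assumed logical block size */

/* Open an image file either through the host page cache with synchronous
 * writes (roughly the "writethrough" policy above) or bypassing the host
 * page cache entirely (roughly cache=off, i.e. O_DIRECT). */
static int open_image(const char *path, int bypass_host_cache)
{
    int flags = O_RDWR | (bypass_host_cache ? O_DIRECT : O_SYNC);
    return open(path, flags);
}

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <image>\n", argv[0]);
        return 1;
    }

    /* Open the way a private leaf image would be: uncached in the host. */
    int fd = open_image(argv[1], 1);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* O_DIRECT requires the buffer, length and file offset to be aligned
     * to the device block size, hence posix_memalign(). */
    void *buf;
    if (posix_memalign(&buf, ALIGN, ALIGN) != 0) {
        close(fd);
        return 1;
    }
    memset(buf, 0, ALIGN);

    if (pread(fd, buf, ALIGN, 0) < 0)
        perror("pread");

    free(buf);
    close(fd);
    return 0;
}

With this kind of flag mapping, the base images in the chain would simply be
opened read-only without O_DIRECT, so their blocks stay in the host page
cache and can be shared by every guest using the same parent image.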