From: Dor Laor
Date: Sun, 12 Oct 2008 16:37:48 +0200
Subject: Re: [Qemu-devel] [RFC] Disk integrity in QEMU
To: qemu-devel@nongnu.org
Cc: Chris Wright, Mark McLoughlin, Ryan Harper, kvm-devel, Laurent Vivier

Avi Kivity wrote:
> Chris Wright wrote:
>> I think it's safe to say the perf folks are concerned w/ data integrity
>> first, stable/reproducible results second, and raw performance third.
>>
>> So seeing data cached in the host was simply not what they expected. I
>> think writethrough is sufficient. However, I think that uncached vs.
>> writethrough will show up on the radar under reproducible results (need
>> to tune based on cache size). And in most overcommit scenarios memory
>> is typically more precious than CPU; it's unclear to me whether the
>> extra buffering is anything other than memory overhead. As long as it's
>> configurable, it's comparable, and benchmarking and best practices can
>> dictate the best choice.
>>
> Getting good performance because we have a huge amount of free memory
> in the host is not a good benchmark. Under most circumstances, the
> free memory will be used either for more guests, or will be given to
> the existing guests, which can utilize it more efficiently than the
> host.
>
> I can see two cases where this is not true:
>
> - Using older, 32-bit guests which cannot utilize all of the cache. I
> think Windows XP is limited to 512MB of cache, and usually doesn't
> utilize even that. So if you have an application running on 32-bit
> Windows (or on 32-bit Linux with PAE disabled) and a huge host, you
> will see a significant boost from cache=writethrough. This is a case
> where performance can exceed native, simply because native cannot
> exploit all the resources of the host.
>
> - If cache requirements vary over time across the different guests,
> and if some smart ballooning is not in place, having free memory on
> the host means we utilize it for whichever guest has the greatest
> need, so overall performance improves.

Another justification for O_DIRECT is that many production systems will
use base images for their VMs. This is mainly true for desktop
virtualization, but probably also for some server virtualization
deployments.
In this type of scenario, we can have the whole base image chain opened
read-only with the default caching, while the leaf images are opened
with cache=off. Since there is an ongoing effort (by both IT and
developers) to keep the base images as large as possible, this
guarantees that the data best suited for caching in the host stays
cached, while the private leaf images remain uncached. This way we
provide good performance and caching for the shared parent images while
also promising correctness. Actually, this is what happens in mainline
qemu with cache=off.

Cheers,
Dor
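P.S. To make the distinction concrete, below is a rough sketch of what the
two policies mean at the open(2) level on Linux. This is not QEMU's block
driver code; the 512-byte alignment and the O_SYNC-for-writethrough mapping
are only assumptions for illustration. The cached ("writethrough") case goes
through the host page cache with synchronous writes, while cache=off
bypasses the host page cache with O_DIRECT and therefore needs block-aligned
buffers:

#define _GNU_SOURCE             /* O_DIRECT is a GNU extension on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define ALIGN 512               /* assumed logical block size */

/* Open an image file either through the host page cache with synchronous
 * writes (roughly the "writethrough" policy above) or bypassing the host
 * page cache entirely (roughly cache=off, i.e. O_DIRECT). */
static int open_image(const char *path, int bypass_host_cache)
{
    int flags = O_RDWR | (bypass_host_cache ? O_DIRECT : O_SYNC);
    return open(path, flags);
}

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <image>\n", argv[0]);
        return 1;
    }

    /* Open the way a private leaf image would be: uncached in the host. */
    int fd = open_image(argv[1], 1);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* O_DIRECT requires the buffer, length and file offset to be aligned
     * to the device block size, hence posix_memalign(). */
    void *buf;
    if (posix_memalign(&buf, ALIGN, ALIGN) != 0) {
        close(fd);
        return 1;
    }
    memset(buf, 0, ALIGN);

    if (pread(fd, buf, ALIGN, 0) < 0)
        perror("pread");

    free(buf);
    close(fd);
    return 0;
}

With this kind of flag mapping, the base images in the chain would simply be
opened read-only without O_DIRECT, so their blocks stay in the host page
cache and can be shared by every guest using the same parent image.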