From: Avi Kivity
Date: Tue, 14 Oct 2008 18:42:45 +0200
Subject: Re: [Qemu-devel] Re: [RFC] Disk integrity in QEMU
To: qemu-devel@nongnu.org
Cc: Chris Wright, Mark McLoughlin, Ryan Harper, Laurent Vivier, kvm-devel

Anthony Liguori wrote:
>
> With 16k writes I think we hit a pathological case with the particular
> storage backend we're using, since it has many disks and the volume is
> striped. Also, the results are a bit different when going through a
> file system versus an LVM partition (the latter being the first data
> set). Presumably, this is because even with no flags, writes happen
> synchronously to an LVM partition.
>

With no flags, writes should hit the buffer cache (which is the page
cache's name when it is used to cache block devices).

> Also, cache=off seems to do pretty terribly when operating on an ext3
> file. I suspect this has to do with how ext3 implements O_DIRECT.

Is the file horribly fragmented? Otherwise ext3 O_DIRECT should be
quite good. Maybe the block mapping is not in the host cache and has to
be brought in.

>
> However, the data demonstrates pretty nicely that O_DSYNC gives you
> native write speed but accelerated read speed, which I think we agree
> is the desirable behavior. cache=off never seems to outperform
> cache=wt, which is another good argument for it being the default over
> cache=off.

Without copyless block I/O, there's no reason to expect cache=none to
outperform cache=writethrough. I expect the read performance to
evaporate with a random access pattern over a large disk (or even
sequential access, given enough running time).

--
Do not meddle in the internals of kernels, for they are subtle and
quick to panic.
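
P.S. For anyone following the thread without the background: a minimal
sketch of how the cache modes under discussion map onto open(2) flags
on Linux. This is illustrative only, not QEMU's actual code; the file
name, buffer size, and alignment are made-up values for the demo.
cache=writeback uses no extra flags (writes land in the host
page/buffer cache), cache=writethrough corresponds to O_DSYNC (write()
returns only once data reaches the device, reads still come from the
host cache), and cache=off corresponds to O_DIRECT (the host cache is
bypassed in both directions, and buffers must be aligned).

#define _GNU_SOURCE          /* O_DIRECT is a GNU extension */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const char *path = "cache-demo.img";  /* hypothetical test file */
    const size_t len = 4096;

    /* cache=writethrough-like: synchronous writes, cached reads. */
    int fd = open(path, O_CREAT | O_WRONLY | O_DSYNC, 0600);
    if (fd < 0) { perror("open O_DSYNC"); return 1; }

    /* O_DIRECT requires aligned buffers; 4k satisfies any sector size. */
    void *buf;
    if (posix_memalign(&buf, 4096, len)) { return 1; }
    memset(buf, 0, len);

    if (pwrite(fd, buf, len, 0) != (ssize_t)len) { perror("pwrite"); }
    close(fd);

    /* cache=off-like: O_DIRECT bypasses the host cache entirely. */
    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) {
        perror("open O_DIRECT");  /* some filesystems reject O_DIRECT */
    } else {
        if (pread(fd, buf, len, 0) != (ssize_t)len) { perror("pread"); }
        close(fd);
    }

    free(buf);
    return 0;
}

The O_DSYNC write gives the "native write speed, accelerated read
speed" behavior described above, since the completed write is still
left in the page cache for later reads.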