Date: Mon, 13 Oct 2008 14:43:28 -0500
From: Ryan Harper
To: Laurent Vivier
Cc: Chris Wright, Mark McLoughlin, kvm-devel, Laurent Vivier,
	qemu-devel@nongnu.org, Ryan Harper
Subject: Re: [Qemu-devel] Re: [RFC] Disk integrity in QEMU
Message-ID: <20081013194328.GJ21410@us.ibm.com>
References: <48EE38B9.2050106@codemonkey.ws> <20081013170610.GF21410@us.ibm.com>
	<6A99DBA5-D422-447D-BF9D-019FB394E6C6@lvivier.info>
In-Reply-To: <6A99DBA5-D422-447D-BF9D-019FB394E6C6@lvivier.info>

* Laurent Vivier [2008-10-13 13:52]:
> 
> On 13 Oct 08 at 19:06, Ryan Harper wrote:
> 
> >* Anthony Liguori [2008-10-09 12:00]:
> >>Read performance should be unaffected by using O_DSYNC. O_DIRECT
> >>will significantly reduce read performance. I think we should use
> >>O_DSYNC by default, and I have sent out a patch that contains that.
> >>We will follow up with benchmarks to demonstrate this.
> 
> Hi Ryan,
> 
> as "cache=on" implies a factor (memory) shared by the whole system,
> you must take into account the size of the host memory and run some
> applications (several guests?) to pollute the host cache; for
> instance, you can run 4 guests and run the bench in each of them
> concurrently, and you could reasonably limit the size of the host
> memory to 5 x the size of the guest memory
> (for instance, 4 guests with 128 MB on a host with 768 MB).

I'm not following you here; the only assumption I see is that we have
1G of host memory free for caching the writes.

> 
> as O_DSYNC implies a journal commit, you should run a bench on the
> ext3 host file system concurrently with the bench in a guest to see
> the impact of the commit on each bench.

I understand the goal here, but what sort of host ext3 journaling load
is appropriate? Additionally, when we're exporting block devices, I
don't believe the ext3 journal is an issue.
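(Aside, for anyone not steeped in the flag soup: here is a minimal
sketch of how these cache settings map onto open(2) flags. This is
just an illustration, not the actual qemu block-layer code, and
open_image() is a made-up helper:)

    #define _GNU_SOURCE          /* for O_DIRECT on Linux */
    #include <fcntl.h>

    int open_image(const char *path, int cache_off, int use_dsync)
    {
        int flags = O_RDWR;

        if (cache_off)
            flags |= O_DIRECT;   /* cache=off: bypass the host page cache */
        else if (use_dsync)
            flags |= O_DSYNC;    /* proposed default: write through the host
                                    page cache, but write(2) doesn't return
                                    until the data is on stable storage */
        /* plain cache=on: neither flag; writes complete into host cache */

        return open(path, flags);
    }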
> 
> >
> >baremetal baseline (1g dataset):
> >---------------------------+-------+-------+--------------+------------+
> >Test scenarios             | bandw | % CPU | ave submit   | ave compl  |
> >type, block size, iface    | MB/s  | usage | latency usec | latency ms |
> >---------------------------+-------+-------+--------------+------------+
> >write, 16k, lvm, direct=1  | 127.7 |  12   |        11.66 |       9.48 |
> >write, 64k, lvm, direct=1  | 178.4 |   5   |        13.65 |      27.15 |
> >write, 1M , lvm, direct=1  | 186.0 |   3   |       163.75 |     416.91 |
> >---------------------------+-------+-------+--------------+------------+
> >read , 16k, lvm, direct=1  | 170.4 |  15   |        10.86 |       7.10 |
> >read , 64k, lvm, direct=1  | 199.2 |   5   |        12.52 |      24.31 |
> >read , 1M , lvm, direct=1  | 202.0 |   3   |       133.74 |     382.67 |
> >---------------------------+-------+-------+--------------+------------+
> 
> Could you recall which benchmark you use?

Yeah:

fio --name=guestrun --filename=/dev/vda --rw=write --bs=${SIZE}
    --ioengine=libaio --direct=1 --norandommap --numjobs=1
    --group_reporting --thread --size=1g --write_lat_log
    --write_bw_log --iodepth=74

> 
> >kvm write (1g dataset):
> >---------------------------+-------+-------+--------------+------------+
> >Test scenarios             | bandw | % CPU | ave submit   | ave compl  |
> >block size,iface,cache,sync| MB/s  | usage | latency usec | latency ms |
> >---------------------------+-------+-------+--------------+------------+
> >16k,virtio,off,none        | 135.0 |  94   |         9.1  |       8.71 |
> >16k,virtio,on ,none        | 184.0 | 100   |        63.69 |      63.48 |
> >16k,virtio,on ,O_DSYNC     | 150.0 |  35   |         6.63 |       8.31 |
> >---------------------------+-------+-------+--------------+------------+
> >64k,virtio,off,none        | 169.0 |  51   |        17.10 |      28.00 |
> >64k,virtio,on ,none        | 189.0 |  60   |        69.42 |      24.92 |
> >64k,virtio,on ,O_DSYNC     | 171.0 |  48   |        18.83 |      27.72 |
> >---------------------------+-------+-------+--------------+------------+
> >1M ,virtio,off,none        | 142.0 |  30   |      7176.00 |     523.00 |
> >1M ,virtio,on ,none        | 190.0 |  45   |      5332.63 |     392.35 |
> >1M ,virtio,on ,O_DSYNC     | 164.0 |  39   |      6444.48 |     471.20 |
> >---------------------------+-------+-------+--------------+------------+
> 
> According to the semantics, I don't understand how O_DSYNC can be
> better than cache=off in this case...

I don't have a good answer either, but O_DIRECT and O_DSYNC are
different paths through the kernel. This deserves a better reply, but
I don't have one off the top of my head.
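To illustrate what I mean by "different paths" (a toy example with a
made-up file name and no error handling, not our benchmark rig):
O_DIRECT bypasses the host page cache entirely and requires aligned
buffers, while an O_DSYNC write dirties the page cache first and then
blocks until the data reaches disk.

    #define _GNU_SOURCE            /* for O_DIRECT on Linux */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        void *buf;

        /* O_DIRECT requires block-aligned buffers and transfer sizes */
        posix_memalign(&buf, 512, 4096);
        memset(buf, 0xab, 4096);

        /* cache=off path: bypasses the host page cache entirely */
        int fd = open("scratch.img", O_WRONLY | O_CREAT | O_DIRECT, 0644);
        write(fd, buf, 4096);
        close(fd);

        /* O_DSYNC path: goes through the page cache, but each write(2)
           blocks until the data has reached stable storage */
        fd = open("scratch.img", O_WRONLY | O_DSYNC);
        write(fd, buf, 4096);
        close(fd);

        free(buf);
        return 0;
    }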
> 
> >
> >kvm read (1g dataset):
> >---------------------------+-------+-------+--------------+------------+
> >Test scenarios             | bandw | % CPU | ave submit   | ave compl  |
> >block size,iface,cache,sync| MB/s  | usage | latency usec | latency ms |
> >---------------------------+-------+-------+--------------+------------+
> >16k,virtio,off,none        | 175.0 |  40   |        22.42 |       6.71 |
> >16k,virtio,on ,none        | 211.0 | 147   |        59.49 |       5.54 |
> >16k,virtio,on ,O_DSYNC     | 212.0 | 145   |        60.45 |       5.47 |
> >---------------------------+-------+-------+--------------+------------+
> >64k,virtio,off,none        | 190.0 |  64   |        16.31 |      24.92 |
> >64k,virtio,on ,none        | 546.0 | 161   |       111.06 |       8.54 |
> >64k,virtio,on ,O_DSYNC     | 520.0 | 151   |       116.66 |       8.97 |
> >---------------------------+-------+-------+--------------+------------+
> >1M ,virtio,off,none        | 182.0 |  32   |      5573.44 |     407.21 |
> >1M ,virtio,on ,none        | 750.0 | 127   |      1344.65 |      96.42 |
> >1M ,virtio,on ,O_DSYNC     | 768.0 | 123   |      1289.05 |      94.25 |
> >---------------------------+-------+-------+--------------+------------+
> 
> OK, but in this case the size of the cache for "cache=off" is the
> size of the guest cache, whereas in the other cases the size of the
> cache is the size of the guest cache + the size of the host cache;
> this is not fair...

It isn't supposed to be fair. cache=off is O_DIRECT; we're reading
from the device, and we *want* to be able to lean on the host cache to
read the data: pay once, and benefit in other guests if possible.

> 
> >
> >--------------------------------------------------------------------------
> >exporting file in ext3 filesystem as block device (1g)
> >--------------------------------------------------------------------------
> >
> >kvm write (1g dataset):
> >---------------------------+-------+-------+--------------+------------+
> >Test scenarios             | bandw | % CPU | ave submit   | ave compl  |
> >block size,iface,cache,sync| MB/s  | usage | latency usec | latency ms |
> >---------------------------+-------+-------+--------------+------------+
> >16k,virtio,off,none        |  12.1 |  15   |         9.1  |       8.71 |
> >16k,virtio,on ,none        | 192.0 |  52   |        62.52 |       6.17 |
> >16k,virtio,on ,O_DSYNC     | 142.0 |  59   |        18.81 |       8.29 |
> >---------------------------+-------+-------+--------------+------------+
> >64k,virtio,off,none        |  15.5 |   8   |        21.10 |     311.00 |
> >64k,virtio,on ,none        | 454.0 | 130   |       113.25 |      10.65 |
> >64k,virtio,on ,O_DSYNC     | 154.0 |  48   |        20.25 |      30.75 |
> >---------------------------+-------+-------+--------------+------------+
> >1M ,virtio,off,none        |  24.7 |   5   |     41736.22 |    3020.08 |
> >1M ,virtio,on ,none        | 485.0 | 100   |      2052.09 |     149.81 |
> >1M ,virtio,on ,O_DSYNC     | 161.0 |  42   |      6268.84 |     453.84 |
> >---------------------------+-------+-------+--------------+------------+
> 
> What file type do you use (qcow2, raw)?

Raw.

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com