Message-ID: <48F39AE3.4060000@redhat.com>
Date: Mon, 13 Oct 2008 15:00:51 -0400
From: Mark Wagner
Subject: Re: [Qemu-devel] Re: [RFC] Disk integrity in QEMU
References: <48EE38B9.2050106@codemonkey.ws> <20081013170610.GF21410@us.ibm.com>
In-Reply-To: <20081013170610.GF21410@us.ibm.com>
To: qemu-devel@nongnu.org
Cc: Chris Wright, Mark McLoughlin, kvm-devel, Laurent Vivier, Ryan Harper

Ryan Harper wrote:
> * Anthony Liguori [2008-10-09 12:00]:
>
>> Read performance should be unaffected by using O_DSYNC. O_DIRECT will
>> significantly reduce read performance. I think we should use O_DSYNC by
>> default and I have sent out a patch that contains that. We will follow
>> up with benchmarks to demonstrate this.
>>
>
> baremetal baseline (1g dataset):
> ---------------------------+-------+-------+--------------+------------+
> Test scenarios             | bandw | % CPU |  ave submit  | ave compl  |
> type, block size, iface    | MB/s  | usage | latency usec | latency ms |
> ---------------------------+-------+-------+--------------+------------+
> write, 16k, lvm, direct=1  | 127.7 |    12 |        11.66 |       9.48 |
> write, 64k, lvm, direct=1  | 178.4 |     5 |        13.65 |      27.15 |
> write, 1M, lvm, direct=1   | 186.0 |     3 |       163.75 |     416.91 |
> ---------------------------+-------+-------+--------------+------------+
> read , 16k, lvm, direct=1  | 170.4 |    15 |        10.86 |       7.10 |
> read , 64k, lvm, direct=1  | 199.2 |     5 |        12.52 |      24.31 |
> read , 1M, lvm, direct=1   | 202.0 |     3 |       133.74 |     382.67 |
> ---------------------------+-------+-------+--------------+------------+
>
> kvm write (1g dataset):
> ---------------------------+-------+-------+--------------+------------+
> Test scenarios             | bandw | % CPU |  ave submit  | ave compl  |
> block size,iface,cache,sync| MB/s  | usage | latency usec | latency ms |
> ---------------------------+-------+-------+--------------+------------+
> 16k,virtio,off,none        | 135.0 |    94 |          9.1 |       8.71 |
> 16k,virtio,on ,none        | 184.0 |   100 |        63.69 |      63.48 |
> 16k,virtio,on ,O_DSYNC     | 150.0 |    35 |         6.63 |       8.31 |
> ---------------------------+-------+-------+--------------+------------+
> 64k,virtio,off,none        | 169.0 |    51 |        17.10 |      28.00 |
> 64k,virtio,on ,none        | 189.0 |    60 |        69.42 |      24.92 |
> 64k,virtio,on ,O_DSYNC     | 171.0 |    48 |        18.83 |      27.72 |
> ---------------------------+-------+-------+--------------+------------+
> 1M ,virtio,off,none        | 142.0 |    30 |      7176.00 |     523.00 |
> 1M ,virtio,on ,none        | 190.0 |    45 |      5332.63 |     392.35 |
> 1M ,virtio,on ,O_DSYNC     | 164.0 |    39 |      6444.48 |     471.20 |
> ---------------------------+-------+-------+--------------+------------+
>
> kvm read (1g dataset):
> ---------------------------+-------+-------+--------------+------------+
> Test scenarios             | bandw | % CPU |  ave submit  | ave compl  |
> block size,iface,cache,sync| MB/s  | usage | latency usec | latency ms |
> ---------------------------+-------+-------+--------------+------------+
> 16k,virtio,off,none        | 175.0 |    40 |        22.42 |       6.71 |
> 16k,virtio,on ,none        | 211.0 |   147 |        59.49 |       5.54 |
> 16k,virtio,on ,O_DSYNC     | 212.0 |   145 |        60.45 |       5.47 |
> ---------------------------+-------+-------+--------------+------------+
> 64k,virtio,off,none        | 190.0 |    64 |        16.31 |      24.92 |
> 64k,virtio,on ,none        | 546.0 |   161 |       111.06 |       8.54 |
> 64k,virtio,on ,O_DSYNC     | 520.0 |   151 |       116.66 |       8.97 |
> ---------------------------+-------+-------+--------------+------------+
> 1M ,virtio,off,none        | 182.0 |    32 |      5573.44 |     407.21 |
> 1M ,virtio,on ,none        | 750.0 |   127 |      1344.65 |      96.42 |
> 1M ,virtio,on ,O_DSYNC     | 768.0 |   123 |      1289.05 |      94.25 |
> ---------------------------+-------+-------+--------------+------------+
>
> --------------------------------------------------------------------------
> exporting file in ext3 filesystem as block device (1g)
> --------------------------------------------------------------------------
>
> kvm write (1g dataset):
> ---------------------------+-------+-------+--------------+------------+
> Test scenarios             | bandw | % CPU |  ave submit  | ave compl  |
> block size,iface,cache,sync| MB/s  | usage | latency usec | latency ms |
> ---------------------------+-------+-------+--------------+------------+
> 16k,virtio,off,none        |  12.1 |    15 |          9.1 |       8.71 |
> 16k,virtio,on ,none        | 192.0 |    52 |        62.52 |       6.17 |
> 16k,virtio,on ,O_DSYNC     | 142.0 |    59 |        18.81 |       8.29 |
> ---------------------------+-------+-------+--------------+------------+
> 64k,virtio,off,none        |  15.5 |     8 |        21.10 |     311.00 |
> 64k,virtio,on ,none        | 454.0 |   130 |       113.25 |      10.65 |
> 64k,virtio,on ,O_DSYNC     | 154.0 |    48 |        20.25 |      30.75 |
> ---------------------------+-------+-------+--------------+------------+
> 1M ,virtio,off,none        |  24.7 |     5 |     41736.22 |    3020.08 |
> 1M ,virtio,on ,none        | 485.0 |   100 |      2052.09 |     149.81 |
> 1M ,virtio,on ,O_DSYNC     | 161.0 |    42 |      6268.84 |     453.84 |
> ---------------------------+-------+-------+--------------+------------+
>
> --
> Ryan Harper
> Software Engineer; Linux Technology Center
> IBM Corp., Austin, Tx
> (512) 838-9253   T/L: 678-9253
> ryanh@us.ibm.com
>

Ryan,

Can you please post the details of the guest and host configurations?
Seeing KVM write numbers that are greater than bare metal, I would think
that your test dataset is too small and does not exceed the size of the
host cache. Our previous testing has shown that once you exceed the host
cache and cause it to flush, performance will drop to a point lower than
if you hadn't used the cache in the first place.

Can you repeat the tests using a dataset that is 2X the size of your
host's memory and post the results for the community to see?

-mark
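
For readers following the O_DSYNC vs. O_DIRECT discussion above, the
sketch below is a minimal standalone illustration of the two open(2)
modes being compared; it is not QEMU code, and the image path, block
size, and argument handling are placeholders. With O_DSYNC, writes go
through the host page cache but do not return until the data reaches
the device, so later reads of the same blocks can still be served from
cache; with O_DIRECT, the page cache is bypassed entirely and buffers
must be suitably aligned.

/*
 * Minimal sketch (not QEMU source) of opening a backing file with
 * O_DSYNC vs. O_DIRECT. Path and block size are illustrative only.
 */
#define _GNU_SOURCE            /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE 4096        /* placeholder I/O size */

int main(int argc, char **argv)
{
    /* Placeholder image path; pass "direct" as the 2nd arg for O_DIRECT. */
    const char *path = argc > 1 ? argv[1] : "disk.img";
    int use_direct   = argc > 2 && strcmp(argv[2], "direct") == 0;

    /* O_DSYNC: write-through via the page cache; O_DIRECT: bypass it. */
    int flags = O_WRONLY | O_CREAT | (use_direct ? O_DIRECT : O_DSYNC);
    int fd = open(path, flags, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* O_DIRECT requires an aligned buffer; alignment is harmless for O_DSYNC. */
    void *buf;
    if (posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE)) {
        perror("posix_memalign");
        close(fd);
        return 1;
    }
    memset(buf, 0xab, BLOCK_SIZE);

    if (write(fd, buf, BLOCK_SIZE) != BLOCK_SIZE)
        perror("write");

    free(buf);
    close(fd);
    return 0;
}

This appears consistent with the read results quoted above: the
cache=on cases (with or without O_DSYNC) can be served from the host
page cache and outrun cache=off, while cache=off (O_DIRECT) gives up
that benefit in exchange for cache-independent behavior.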