Message-ID: <48F39AE3.4060000@redhat.com>
Date: Mon, 13 Oct 2008 15:00:51 -0400
From: Mark Wagner
Subject: Re: [Qemu-devel] Re: [RFC] Disk integrity in QEMU
References: <48EE38B9.2050106@codemonkey.ws> <20081013170610.GF21410@us.ibm.com>
In-Reply-To: <20081013170610.GF21410@us.ibm.com>
To: qemu-devel@nongnu.org
Cc: Chris Wright, Mark McLoughlin, kvm-devel, Laurent Vivier, Ryan Harper

Ryan Harper wrote:
> * Anthony Liguori [2008-10-09 12:00]:
>
>> Read performance should be unaffected by using O_DSYNC. O_DIRECT will
>> significantly reduce read performance. I think we should use O_DSYNC by
>> default and I have sent out a patch that contains that. We will follow
>> up with benchmarks to demonstrate this.
>>
>
> baremetal baseline (1g dataset):
> ---------------------------+-------+-------+--------------+------------+
> Test scenarios             | bandw | % CPU |  ave submit  | ave compl  |
> type, block size, iface    | MB/s  | usage | latency usec | latency ms |
> ---------------------------+-------+-------+--------------+------------+
> write, 16k, lvm, direct=1  | 127.7 |    12 |        11.66 |       9.48 |
> write, 64k, lvm, direct=1  | 178.4 |     5 |        13.65 |      27.15 |
> write, 1M, lvm, direct=1   | 186.0 |     3 |       163.75 |     416.91 |
> ---------------------------+-------+-------+--------------+------------+
> read , 16k, lvm, direct=1  | 170.4 |    15 |        10.86 |       7.10 |
> read , 64k, lvm, direct=1  | 199.2 |     5 |        12.52 |      24.31 |
> read , 1M, lvm, direct=1   | 202.0 |     3 |       133.74 |     382.67 |
> ---------------------------+-------+-------+--------------+------------+
>
> kvm write (1g dataset):
> ---------------------------+-------+-------+--------------+------------+
> Test scenarios             | bandw | % CPU |  ave submit  | ave compl  |
> block size,iface,cache,sync| MB/s  | usage | latency usec | latency ms |
> ---------------------------+-------+-------+--------------+------------+
> 16k,virtio,off,none        | 135.0 |    94 |          9.1 |       8.71 |
> 16k,virtio,on ,none        | 184.0 |   100 |        63.69 |      63.48 |
> 16k,virtio,on ,O_DSYNC     | 150.0 |    35 |         6.63 |       8.31 |
> ---------------------------+-------+-------+--------------+------------+
> 64k,virtio,off,none        | 169.0 |    51 |        17.10 |      28.00 |
> 64k,virtio,on ,none        | 189.0 |    60 |        69.42 |      24.92 |
> 64k,virtio,on ,O_DSYNC     | 171.0 |    48 |        18.83 |      27.72 |
> ---------------------------+-------+-------+--------------+------------+
> 1M ,virtio,off,none        | 142.0 |    30 |      7176.00 |     523.00 |
> 1M ,virtio,on ,none        | 190.0 |    45 |      5332.63 |     392.35 |
> 1M ,virtio,on ,O_DSYNC     | 164.0 |    39 |      6444.48 |     471.20 |
> ---------------------------+-------+-------+--------------+------------+
>
> kvm read (1g dataset):
> ---------------------------+-------+-------+--------------+------------+
> Test scenarios             | bandw | % CPU |  ave submit  | ave compl  |
> block size,iface,cache,sync| MB/s  | usage | latency usec | latency ms |
> ---------------------------+-------+-------+--------------+------------+
> 16k,virtio,off,none        | 175.0 |    40 |        22.42 |       6.71 |
> 16k,virtio,on ,none        | 211.0 |   147 |        59.49 |       5.54 |
> 16k,virtio,on ,O_DSYNC     | 212.0 |   145 |        60.45 |       5.47 |
> ---------------------------+-------+-------+--------------+------------+
> 64k,virtio,off,none        | 190.0 |    64 |        16.31 |      24.92 |
> 64k,virtio,on ,none        | 546.0 |   161 |       111.06 |       8.54 |
> 64k,virtio,on ,O_DSYNC     | 520.0 |   151 |       116.66 |       8.97 |
> ---------------------------+-------+-------+--------------+------------+
> 1M ,virtio,off,none        | 182.0 |    32 |      5573.44 |     407.21 |
> 1M ,virtio,on ,none        | 750.0 |   127 |      1344.65 |      96.42 |
> 1M ,virtio,on ,O_DSYNC     | 768.0 |   123 |      1289.05 |      94.25 |
> ---------------------------+-------+-------+--------------+------------+
>
> --------------------------------------------------------------------------
> exporting file in ext3 filesystem as block device (1g)
> --------------------------------------------------------------------------
>
> kvm write (1g dataset):
> ---------------------------+-------+-------+--------------+------------+
> Test scenarios             | bandw | % CPU |  ave submit  | ave compl  |
> block size,iface,cache,sync| MB/s  | usage | latency usec | latency ms |
> ---------------------------+-------+-------+--------------+------------+
> 16k,virtio,off,none        |  12.1 |    15 |          9.1 |       8.71 |
> 16k,virtio,on ,none        | 192.0 |    52 |        62.52 |       6.17 |
> 16k,virtio,on ,O_DSYNC     | 142.0 |    59 |        18.81 |       8.29 |
> ---------------------------+-------+-------+--------------+------------+
> 64k,virtio,off,none        |  15.5 |     8 |        21.10 |     311.00 |
> 64k,virtio,on ,none        | 454.0 |   130 |       113.25 |      10.65 |
> 64k,virtio,on ,O_DSYNC     | 154.0 |    48 |        20.25 |      30.75 |
> ---------------------------+-------+-------+--------------+------------+
> 1M ,virtio,off,none        |  24.7 |     5 |     41736.22 |    3020.08 |
> 1M ,virtio,on ,none        | 485.0 |   100 |      2052.09 |     149.81 |
> 1M ,virtio,on ,O_DSYNC     | 161.0 |    42 |      6268.84 |     453.84 |
> ---------------------------+-------+-------+--------------+------------+
>
> --
> Ryan Harper
> Software Engineer; Linux Technology Center
> IBM Corp., Austin, Tx
> (512) 838-9253   T/L: 678-9253
> ryanh@us.ibm.com
>

Ryan,

Can you please post the details of the guest and host configurations?
Seeing KVM write numbers that are greater than bare metal, I would think
that your test dataset is too small and does not exceed the size of the
host cache. Our previous testing has shown that once you exceed the host
cache and cause it to flush, performance will drop to a point lower than
if you hadn't used the cache in the first place.

Can you repeat the tests using a dataset that is 2X the size of your
host's memory and post the results for the community to see?

-mark
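
For readers following the O_DSYNC vs. O_DIRECT discussion above, the
sketch below is a minimal standalone illustration of the two open(2)
modes being compared; it is not QEMU code, and the image path, block
size, and argument handling are placeholders. With O_DSYNC, writes go
through the host page cache but do not return until the data reaches
the device, so later reads of the same blocks can still be served from
cache; with O_DIRECT, the page cache is bypassed entirely and buffers
must be suitably aligned.

/*
 * Minimal sketch (not QEMU source) of opening a backing file with
 * O_DSYNC vs. O_DIRECT. Path and block size are illustrative only.
 */
#define _GNU_SOURCE            /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE 4096        /* placeholder I/O size */

int main(int argc, char **argv)
{
    /* Placeholder image path; pass "direct" as the 2nd arg for O_DIRECT. */
    const char *path = argc > 1 ? argv[1] : "disk.img";
    int use_direct   = argc > 2 && strcmp(argv[2], "direct") == 0;

    /* O_DSYNC: write-through via the page cache; O_DIRECT: bypass it. */
    int flags = O_WRONLY | O_CREAT | (use_direct ? O_DIRECT : O_DSYNC);
    int fd = open(path, flags, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* O_DIRECT requires an aligned buffer; alignment is harmless for O_DSYNC. */
    void *buf;
    if (posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE)) {
        perror("posix_memalign");
        close(fd);
        return 1;
    }
    memset(buf, 0xab, BLOCK_SIZE);

    if (write(fd, buf, BLOCK_SIZE) != BLOCK_SIZE)
        perror("write");

    free(buf);
    close(fd);
    return 0;
}

This appears consistent with the read results quoted above: the
cache=on cases (with or without O_DSYNC) can be served from the host
page cache and outrun cache=off, while cache=off (O_DIRECT) gives up
that benefit in exchange for cache-independent behavior.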