Date: Mon, 13 Oct 2008 14:43:28 -0500
From: Ryan Harper
To: Laurent Vivier
Cc: Chris Wright, Mark McLoughlin, kvm-devel, Laurent Vivier,
	qemu-devel@nongnu.org, Ryan Harper
Subject: Re: [Qemu-devel] Re: [RFC] Disk integrity in QEMU
Message-ID: <20081013194328.GJ21410@us.ibm.com>
References: <48EE38B9.2050106@codemonkey.ws> <20081013170610.GF21410@us.ibm.com>
	<6A99DBA5-D422-447D-BF9D-019FB394E6C6@lvivier.info>
In-Reply-To: <6A99DBA5-D422-447D-BF9D-019FB394E6C6@lvivier.info>

* Laurent Vivier [2008-10-13 13:52]:
> 
> On 13 Oct 08 at 19:06, Ryan Harper wrote:
> 
> >* Anthony Liguori [2008-10-09 12:00]:
> >>Read performance should be unaffected by using O_DSYNC. O_DIRECT
> >>will significantly reduce read performance. I think we should use
> >>O_DSYNC by default, and I have sent out a patch that contains that.
> >>We will follow up with benchmarks to demonstrate this.
> 
> Hi Ryan,
> 
> as "cache=on" implies a factor (memory) shared by the whole system,
> you must take into account the size of the host memory and run some
> applications (several guests?) to pollute the host cache; for
> instance, you can run 4 guests and run the bench in each of them
> concurrently, and you could reasonably limit the size of the host
> memory to 5 x the size of the guest memory
> (for instance, 4 guests with 128 MB on a host with 768 MB).

I'm not following you here; the only assumption I see is that we have
1G of host memory free for caching the writes.

> 
> as O_DSYNC implies a journal commit, you should run a bench on the
> ext3 host file system concurrently with the bench in a guest to see
> the impact of the commit on each bench.

I understand the goal here, but what sort of host ext3 journaling load
is appropriate? Additionally, when we're exporting block devices, I
don't believe the ext3 journal is an issue.
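(Aside, for anyone not steeped in the flag soup: here is a minimal
sketch of how these cache settings map onto open(2) flags. This is
just an illustration, not the actual qemu block-layer code, and
open_image() is a made-up helper:)

    #define _GNU_SOURCE          /* for O_DIRECT on Linux */
    #include <fcntl.h>

    int open_image(const char *path, int cache_off, int use_dsync)
    {
        int flags = O_RDWR;

        if (cache_off)
            flags |= O_DIRECT;   /* cache=off: bypass the host page cache */
        else if (use_dsync)
            flags |= O_DSYNC;    /* proposed default: write through the host
                                    page cache, but write(2) doesn't return
                                    until the data is on stable storage */
        /* plain cache=on: neither flag; writes complete into host cache */

        return open(path, flags);
    }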
> 
> >
> >baremetal baseline (1g dataset):
> >---------------------------+-------+-------+--------------+------------+
> >Test scenarios             | bandw | % CPU | ave submit   | ave compl  |
> >type, block size, iface    | MB/s  | usage | latency usec | latency ms |
> >---------------------------+-------+-------+--------------+------------+
> >write, 16k, lvm, direct=1  | 127.7 |  12   |        11.66 |       9.48 |
> >write, 64k, lvm, direct=1  | 178.4 |   5   |        13.65 |      27.15 |
> >write, 1M , lvm, direct=1  | 186.0 |   3   |       163.75 |     416.91 |
> >---------------------------+-------+-------+--------------+------------+
> >read , 16k, lvm, direct=1  | 170.4 |  15   |        10.86 |       7.10 |
> >read , 64k, lvm, direct=1  | 199.2 |   5   |        12.52 |      24.31 |
> >read , 1M , lvm, direct=1  | 202.0 |   3   |       133.74 |     382.67 |
> >---------------------------+-------+-------+--------------+------------+
> 
> Could you recall which benchmark you use?

Yeah:

fio --name=guestrun --filename=/dev/vda --rw=write --bs=${SIZE}
    --ioengine=libaio --direct=1 --norandommap --numjobs=1
    --group_reporting --thread --size=1g --write_lat_log
    --write_bw_log --iodepth=74

> 
> >kvm write (1g dataset):
> >---------------------------+-------+-------+--------------+------------+
> >Test scenarios             | bandw | % CPU | ave submit   | ave compl  |
> >block size,iface,cache,sync| MB/s  | usage | latency usec | latency ms |
> >---------------------------+-------+-------+--------------+------------+
> >16k,virtio,off,none        | 135.0 |  94   |         9.1  |       8.71 |
> >16k,virtio,on ,none        | 184.0 | 100   |        63.69 |      63.48 |
> >16k,virtio,on ,O_DSYNC     | 150.0 |  35   |         6.63 |       8.31 |
> >---------------------------+-------+-------+--------------+------------+
> >64k,virtio,off,none        | 169.0 |  51   |        17.10 |      28.00 |
> >64k,virtio,on ,none        | 189.0 |  60   |        69.42 |      24.92 |
> >64k,virtio,on ,O_DSYNC     | 171.0 |  48   |        18.83 |      27.72 |
> >---------------------------+-------+-------+--------------+------------+
> >1M ,virtio,off,none        | 142.0 |  30   |      7176.00 |     523.00 |
> >1M ,virtio,on ,none        | 190.0 |  45   |      5332.63 |     392.35 |
> >1M ,virtio,on ,O_DSYNC     | 164.0 |  39   |      6444.48 |     471.20 |
> >---------------------------+-------+-------+--------------+------------+
> 
> According to the semantics, I don't understand how O_DSYNC can be
> better than cache=off in this case...

I don't have a good answer either, but O_DIRECT and O_DSYNC are
different paths through the kernel. This deserves a better reply, but
I don't have one off the top of my head.
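To illustrate what I mean by "different paths" (a toy example with a
made-up file name and no error handling, not our benchmark rig):
O_DIRECT bypasses the host page cache entirely and requires aligned
buffers, while an O_DSYNC write dirties the page cache first and then
blocks until the data reaches disk.

    #define _GNU_SOURCE            /* for O_DIRECT on Linux */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        void *buf;

        /* O_DIRECT requires block-aligned buffers and transfer sizes */
        posix_memalign(&buf, 512, 4096);
        memset(buf, 0xab, 4096);

        /* cache=off path: bypasses the host page cache entirely */
        int fd = open("scratch.img", O_WRONLY | O_CREAT | O_DIRECT, 0644);
        write(fd, buf, 4096);
        close(fd);

        /* O_DSYNC path: goes through the page cache, but each write(2)
           blocks until the data has reached stable storage */
        fd = open("scratch.img", O_WRONLY | O_DSYNC);
        write(fd, buf, 4096);
        close(fd);

        free(buf);
        return 0;
    }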
> 
> >
> >kvm read (1g dataset):
> >---------------------------+-------+-------+--------------+------------+
> >Test scenarios             | bandw | % CPU | ave submit   | ave compl  |
> >block size,iface,cache,sync| MB/s  | usage | latency usec | latency ms |
> >---------------------------+-------+-------+--------------+------------+
> >16k,virtio,off,none        | 175.0 |  40   |        22.42 |       6.71 |
> >16k,virtio,on ,none        | 211.0 | 147   |        59.49 |       5.54 |
> >16k,virtio,on ,O_DSYNC     | 212.0 | 145   |        60.45 |       5.47 |
> >---------------------------+-------+-------+--------------+------------+
> >64k,virtio,off,none        | 190.0 |  64   |        16.31 |      24.92 |
> >64k,virtio,on ,none        | 546.0 | 161   |       111.06 |       8.54 |
> >64k,virtio,on ,O_DSYNC     | 520.0 | 151   |       116.66 |       8.97 |
> >---------------------------+-------+-------+--------------+------------+
> >1M ,virtio,off,none        | 182.0 |  32   |      5573.44 |     407.21 |
> >1M ,virtio,on ,none        | 750.0 | 127   |      1344.65 |      96.42 |
> >1M ,virtio,on ,O_DSYNC     | 768.0 | 123   |      1289.05 |      94.25 |
> >---------------------------+-------+-------+--------------+------------+
> 
> OK, but in this case the size of the cache for "cache=off" is the
> size of the guest cache, whereas in the other cases the size of the
> cache is the size of the guest cache + the size of the host cache;
> this is not fair...

It isn't supposed to be fair. cache=off is O_DIRECT; we're reading
from the device, and we *want* to be able to lean on the host cache to
read the data: pay once, and benefit in other guests if possible.

> 
> >
> >--------------------------------------------------------------------------
> >exporting file in ext3 filesystem as block device (1g)
> >--------------------------------------------------------------------------
> >
> >kvm write (1g dataset):
> >---------------------------+-------+-------+--------------+------------+
> >Test scenarios             | bandw | % CPU | ave submit   | ave compl  |
> >block size,iface,cache,sync| MB/s  | usage | latency usec | latency ms |
> >---------------------------+-------+-------+--------------+------------+
> >16k,virtio,off,none        |  12.1 |  15   |         9.1  |       8.71 |
> >16k,virtio,on ,none        | 192.0 |  52   |        62.52 |       6.17 |
> >16k,virtio,on ,O_DSYNC     | 142.0 |  59   |        18.81 |       8.29 |
> >---------------------------+-------+-------+--------------+------------+
> >64k,virtio,off,none        |  15.5 |   8   |        21.10 |     311.00 |
> >64k,virtio,on ,none        | 454.0 | 130   |       113.25 |      10.65 |
> >64k,virtio,on ,O_DSYNC     | 154.0 |  48   |        20.25 |      30.75 |
> >---------------------------+-------+-------+--------------+------------+
> >1M ,virtio,off,none        |  24.7 |   5   |     41736.22 |    3020.08 |
> >1M ,virtio,on ,none        | 485.0 | 100   |      2052.09 |     149.81 |
> >1M ,virtio,on ,O_DSYNC     | 161.0 |  42   |      6268.84 |     453.84 |
> >---------------------------+-------+-------+--------------+------------+
> 
> What file type do you use (qcow2, raw)?

Raw.

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com