Date: Tue, 10 Jun 2014 13:56:44 -0500
From: Karl Rister
To: Fam Zheng
Cc: pbonzini@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com
Subject: Re: [Qemu-devel] dataplane performance on s390
Message-ID: <539754EC.3070600@us.ibm.com>
In-Reply-To: <20140610014038.GA11308@T430.nay.redhat.com>
References: <53961C5B.9020201@us.ibm.com> <20140610014038.GA11308@T430.nay.redhat.com>

On 06/09/2014 08:40 PM, Fam Zheng wrote:
> On Mon, 06/09 15:43, Karl Rister wrote:
>> Hi All
>>
>> I was asked by our development team to do a performance sniff test of the
>> latest dataplane code on s390 and compare it against qemu.git. Here is a
>> brief description of the configuration, the testing done, and then the
>> results.
>>
>> Configuration:
>>
>> Host: 26 CPU LPAR, 64GB, 8 zFCP adapters
>> Guest: 4 VCPU, 1GB, 128 virtio block devices
>>
>> Each virtio block device maps to a dm-multipath device in the host with 8
>> paths. Multipath is configured with the service-time policy. All block
>> devices are configured to use the deadline IO scheduler.
>>
>> Test:
>>
>> FIO is used to run 4 scenarios: sequential read, sequential write, random
>> read, and random write. Sequential scenarios use a 128KB request size and
>> random scenarios use an 8KB request size. Each scenario is run with an
>> increasing number of jobs, from 1 to 128 (powers of 2). Each job is bound
>> to an individual file on an ext3 file system on a virtio device and uses
>> O_DIRECT, libaio, and iodepth=1. Each test is run three times for 2 minutes
>> each; the first iteration (a warmup) is thrown out and the next two
>> iterations are averaged together.
>>
>> Results:
>>
>> Baseline: qemu.git 93f94f9018229f146ed6bbe9e5ff72d67e4bd7ab
>>
>> Dataplane: bdrv_set_aio_context 0ab50cde71aa27f39b8a3ea4766ff82671adb2a4
>
> Hi Karl,
>
> Thanks for the results.
>
> The throughput differences look minimal, where is the bandwidth saturated in
> these tests? And why use iodepth=1, not more?
Hi Fam,

Based on previously collected data, the configuration is hitting saturation at
the following points:

Sequential Read: 128 jobs
Sequential Write: 32 jobs
Random Read: 64 jobs
Random Write: saturation not reached

The iodepth=1 configuration is a somewhat arbitrary choice, limited mainly by
machine run time; I could certainly run higher loads, and at times I do.

Thanks.

Karl

>
> Thanks,
> Fam
>
>>
>> Sequential Read:
>>
>> Overall a slight throughput regression with a noticeable reduction in CPU
>> efficiency.
>>
>> 1 Job: Throughput regressed -1.4%, CPU improved -0.83%.
>> 2 Job: Throughput regressed -2.5%, CPU regressed +2.81%
>> 4 Job: Throughput regressed -2.2%, CPU regressed +12.22%
>> 8 Job: Throughput regressed -0.7%, CPU regressed +9.77%
>> 16 Job: Throughput regressed -3.4%, CPU regressed +7.04%
>> 32 Job: Throughput regressed -1.8%, CPU regressed +12.03%
>> 64 Job: Throughput regressed -0.1%, CPU regressed +10.60%
>> 128 Job: Throughput increased +0.3%, CPU regressed +10.70%
>>
>> Sequential Write:
>>
>> Mostly regressed throughput, although it gets better as job count increases
>> and even has some gains at higher job counts. CPU efficiency is regressed.
>>
>> 1 Job: Throughput regressed -1.9%, CPU regressed +0.90%
>> 2 Job: Throughput regressed -2.0%, CPU regressed +1.07%
>> 4 Job: Throughput regressed -2.4%, CPU regressed +8.68%
>> 8 Job: Throughput regressed -2.0%, CPU regressed +4.23%
>> 16 Job: Throughput regressed -5.0%, CPU regressed +10.53%
>> 32 Job: Throughput improved +7.6%, CPU regressed +7.37%
>> 64 Job: Throughput regressed -0.6%, CPU regressed +7.29%
>> 128 Job: Throughput improved +8.3%, CPU regressed +6.68%
>>
>> Random Read:
>>
>> Again, mostly throughput regressions except for the largest job counts. CPU
>> efficiency is regressed at all data points.
>>
>> 1 Job: Throughput regressed -3.0%, CPU regressed +0.14%
>> 2 Job: Throughput regressed -3.6%, CPU regressed +6.86%
>> 4 Job: Throughput regressed -5.1%, CPU regressed +11.11%
>> 8 Job: Throughput regressed -8.6%, CPU regressed +12.32%
>> 16 Job: Throughput regressed -5.7%, CPU regressed +12.99%
>> 32 Job: Throughput regressed -7.4%, CPU regressed +7.62%
>> 64 Job: Throughput improved +10.0%, CPU regressed +10.83%
>> 128 Job: Throughput improved +10.7%, CPU regressed +10.85%
>>
>> Random Write:
>>
>> Throughput and CPU regressed at all but one data point.
>>
>> 1 Job: Throughput regressed -2.3%, CPU improved -1.50%
>> 2 Job: Throughput regressed -2.2%, CPU regressed +0.16%
>> 4 Job: Throughput regressed -1.0%, CPU regressed +8.36%
>> 8 Job: Throughput regressed -8.6%, CPU regressed +12.47%
>> 16 Job: Throughput regressed -3.1%, CPU regressed +12.40%
>> 32 Job: Throughput regressed -0.2%, CPU regressed +11.59%
>> 64 Job: Throughput regressed -1.9%, CPU regressed +12.65%
>> 128 Job: Throughput improved +5.6%, CPU regressed +11.68%
>>
>>
>> * CPU consumption is an efficiency calculation of usage per MB of
>> throughput.
>>
>> --
>> Karl Rister
>> IBM Linux/KVM Development Optimization
>>
>

--
Karl Rister
IBM Linux/KVM Development Optimization
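
For readers who want to reproduce something close to the quoted workload, a
minimal fio job sketch matching the description above might look like the
following. The original job files were not posted, so the directory path, the
per-job file size, and the time_based/runtime handling are assumptions; in the
actual setup each job presumably targets a file on its own ext3 file system on
its own virtio device rather than a shared mount.

  ; random-read.fio -- illustrative sketch only, not the original job file
  [global]
  ioengine=libaio
  direct=1
  iodepth=1
  ; 8KB requests for the random scenarios; the sequential scenarios used 128KB
  bs=8k
  rw=randread
  ; 2-minute runs, per the quoted description
  time_based=1
  runtime=120
  ; per-job file size is assumed; it is not stated in the thread
  size=1g
  ; hypothetical mount point for the sketch
  directory=/mnt/vdisk

  [workers]
  ; swept 1, 2, 4, ... 128 across runs; fio creates a separate file per job
  numjobs=4

Each of the four scenarios would then be a copy of this job with rw= and bs=
adjusted, run as "fio random-read.fio" while sweeping numjobs.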
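
The footnote on the CPU numbers says they are an efficiency calculation of
usage per MB of throughput. The exact formula is not given in the thread, but
assuming it means host CPU utilization divided by throughput, a small worked
example with made-up numbers shows how a ~1% throughput delta can coexist with
a ~10% efficiency regression:

  efficiency = CPU utilization / throughput

  baseline:  20% CPU at 400 MB/s  ->  20 / 400 = 0.0500 CPU% per MB/s
  dataplane: 22% CPU at 396 MB/s  ->  22 / 396 ~= 0.0556 CPU% per MB/s

  throughput delta:  396 / 400 - 1 = -1.0%
  efficiency delta:  0.0556 / 0.0500 - 1 ~= +11%  (reported as "CPU regressed")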