Date: Tue, 10 Jun 2014 13:56:44 -0500
From: Karl Rister
To: Fam Zheng
Cc: pbonzini@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com
Subject: Re: [Qemu-devel] dataplane performance on s390
Message-ID: <539754EC.3070600@us.ibm.com>
In-Reply-To: <20140610014038.GA11308@T430.nay.redhat.com>
References: <53961C5B.9020201@us.ibm.com> <20140610014038.GA11308@T430.nay.redhat.com>

On 06/09/2014 08:40 PM, Fam Zheng wrote:
> On Mon, 06/09 15:43, Karl Rister wrote:
>> Hi All
>>
>> I was asked by our development team to do a performance sniff test of the
>> latest dataplane code on s390 and compare it against qemu.git. Here is a
>> brief description of the configuration, the testing done, and then the
>> results.
>>
>> Configuration:
>>
>> Host: 26 CPU LPAR, 64GB, 8 zFCP adapters
>> Guest: 4 VCPU, 1GB, 128 virtio block devices
>>
>> Each virtio block device maps to a dm-multipath device in the host with 8
>> paths. Multipath is configured with the service-time policy. All block
>> devices are configured to use the deadline IO scheduler.
>>
>> Test:
>>
>> FIO is used to run 4 scenarios: sequential read, sequential write, random
>> read, and random write. Sequential scenarios use a 128KB request size and
>> random scenarios use an 8KB request size. Each scenario is run with an
>> increasing number of jobs, from 1 to 128 (powers of 2). Each job is bound
>> to an individual file on an ext3 file system on a virtio device and uses
>> O_DIRECT, libaio, and iodepth=1. Each test is run three times for 2 minutes
>> each; the first iteration (a warmup) is thrown out and the next two
>> iterations are averaged together.
>>
>> Results:
>>
>> Baseline: qemu.git 93f94f9018229f146ed6bbe9e5ff72d67e4bd7ab
>>
>> Dataplane: bdrv_set_aio_context 0ab50cde71aa27f39b8a3ea4766ff82671adb2a4
>
> Hi Karl,
>
> Thanks for the results.
>
> The throughput differences look minimal, where is the bandwidth saturated in
> these tests? And why use iodepth=1, not more?
Hi Fam,

Based on previously collected data, the configuration is hitting saturation at
the following points:

Sequential Read: 128 jobs
Sequential Write: 32 jobs
Random Read: 64 jobs
Random Write: saturation not reached

The iodepth=1 configuration is a somewhat arbitrary choice, limited mainly by
machine run time; I could certainly run higher loads, and at times I do.

Thanks.

Karl

>
> Thanks,
> Fam
>
>>
>> Sequential Read:
>>
>> Overall a slight throughput regression with a noticeable reduction in CPU
>> efficiency.
>>
>> 1 Job: Throughput regressed -1.4%, CPU improved -0.83%.
>> 2 Job: Throughput regressed -2.5%, CPU regressed +2.81%
>> 4 Job: Throughput regressed -2.2%, CPU regressed +12.22%
>> 8 Job: Throughput regressed -0.7%, CPU regressed +9.77%
>> 16 Job: Throughput regressed -3.4%, CPU regressed +7.04%
>> 32 Job: Throughput regressed -1.8%, CPU regressed +12.03%
>> 64 Job: Throughput regressed -0.1%, CPU regressed +10.60%
>> 128 Job: Throughput increased +0.3%, CPU regressed +10.70%
>>
>> Sequential Write:
>>
>> Mostly regressed throughput, although it gets better as job count increases
>> and even has some gains at higher job counts. CPU efficiency is regressed.
>>
>> 1 Job: Throughput regressed -1.9%, CPU regressed +0.90%
>> 2 Job: Throughput regressed -2.0%, CPU regressed +1.07%
>> 4 Job: Throughput regressed -2.4%, CPU regressed +8.68%
>> 8 Job: Throughput regressed -2.0%, CPU regressed +4.23%
>> 16 Job: Throughput regressed -5.0%, CPU regressed +10.53%
>> 32 Job: Throughput improved +7.6%, CPU regressed +7.37%
>> 64 Job: Throughput regressed -0.6%, CPU regressed +7.29%
>> 128 Job: Throughput improved +8.3%, CPU regressed +6.68%
>>
>> Random Read:
>>
>> Again, mostly throughput regressions except for the largest job counts. CPU
>> efficiency is regressed at all data points.
>>
>> 1 Job: Throughput regressed -3.0%, CPU regressed +0.14%
>> 2 Job: Throughput regressed -3.6%, CPU regressed +6.86%
>> 4 Job: Throughput regressed -5.1%, CPU regressed +11.11%
>> 8 Job: Throughput regressed -8.6%, CPU regressed +12.32%
>> 16 Job: Throughput regressed -5.7%, CPU regressed +12.99%
>> 32 Job: Throughput regressed -7.4%, CPU regressed +7.62%
>> 64 Job: Throughput improved +10.0%, CPU regressed +10.83%
>> 128 Job: Throughput improved +10.7%, CPU regressed +10.85%
>>
>> Random Write:
>>
>> Throughput and CPU regressed at all but one data point.
>>
>> 1 Job: Throughput regressed -2.3%, CPU improved -1.50%
>> 2 Job: Throughput regressed -2.2%, CPU regressed +0.16%
>> 4 Job: Throughput regressed -1.0%, CPU regressed +8.36%
>> 8 Job: Throughput regressed -8.6%, CPU regressed +12.47%
>> 16 Job: Throughput regressed -3.1%, CPU regressed +12.40%
>> 32 Job: Throughput regressed -0.2%, CPU regressed +11.59%
>> 64 Job: Throughput regressed -1.9%, CPU regressed +12.65%
>> 128 Job: Throughput improved +5.6%, CPU regressed +11.68%
>>
>>
>> * CPU consumption is an efficiency calculation of usage per MB of
>> throughput.
>>
>> --
>> Karl Rister
>> IBM Linux/KVM Development Optimization
>>
>

--
Karl Rister
IBM Linux/KVM Development Optimization
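
For readers who want to reproduce something close to the quoted workload, a
minimal fio job sketch matching the description above might look like the
following. The original job files were not posted, so the directory path, the
per-job file size, and the time_based/runtime handling are assumptions; in the
actual setup each job presumably targets a file on its own ext3 file system on
its own virtio device rather than a shared mount.

  ; random-read.fio -- illustrative sketch only, not the original job file
  [global]
  ioengine=libaio
  direct=1
  iodepth=1
  ; 8KB requests for the random scenarios; the sequential scenarios used 128KB
  bs=8k
  rw=randread
  ; 2-minute runs, per the quoted description
  time_based=1
  runtime=120
  ; per-job file size is assumed; it is not stated in the thread
  size=1g
  ; hypothetical mount point for the sketch
  directory=/mnt/vdisk

  [workers]
  ; swept 1, 2, 4, ... 128 across runs; fio creates a separate file per job
  numjobs=4

Each of the four scenarios would then be a copy of this job with rw= and bs=
adjusted, run as "fio random-read.fio" while sweeping numjobs.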
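
The footnote on the CPU numbers says they are an efficiency calculation of
usage per MB of throughput. The exact formula is not given in the thread, but
assuming it means host CPU utilization divided by throughput, a small worked
example with made-up numbers shows how a ~1% throughput delta can coexist with
a ~10% efficiency regression:

  efficiency = CPU utilization / throughput

  baseline:  20% CPU at 400 MB/s  ->  20 / 400 = 0.0500 CPU% per MB/s
  dataplane: 22% CPU at 396 MB/s  ->  22 / 396 ~= 0.0556 CPU% per MB/s

  throughput delta:  396 / 400 - 1 = -1.0%
  efficiency delta:  0.0556 / 0.0500 - 1 ~= +11%  (reported as "CPU regressed")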