From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Theurer
Subject: Re: KVM performance vs. Xen
Date: Thu, 30 Apr 2009 08:44:15 -0500
Message-ID: <49F9AB2F.4020505@linux.vnet.ibm.com>
References: <49F8672E.5080507@linux.vnet.ibm.com>
 <49F967AE.4040905@redhat.com>
 <49F99E6A.3060404@linux.vnet.ibm.com>
 <49F9A160.3030609@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm-devel
To: Avi Kivity
Return-path:
Received: from e39.co.us.ibm.com ([32.97.110.160]:42256 "EHLO e39.co.us.ibm.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753656AbZD3NoS
 (ORCPT ); Thu, 30 Apr 2009 09:44:18 -0400
Received: from d03relay02.boulder.ibm.com (d03relay02.boulder.ibm.com [9.17.195.227])
 by e39.co.us.ibm.com (8.13.1/8.13.1) with ESMTP id n3UDeqFt024901
 for ; Thu, 30 Apr 2009 07:40:52 -0600
Received: from d03av04.boulder.ibm.com (d03av04.boulder.ibm.com [9.17.195.170])
 by d03relay02.boulder.ibm.com (8.13.8/8.13.8/NCO v9.2) with ESMTP id n3UDiI6h220406
 for ; Thu, 30 Apr 2009 07:44:18 -0600
Received: from d03av04.boulder.ibm.com (loopback [127.0.0.1])
 by d03av04.boulder.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id n3UDiIqk024721
 for ; Thu, 30 Apr 2009 07:44:18 -0600
In-Reply-To: <49F9A160.3030609@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

Avi Kivity wrote:
> Andrew Theurer wrote:
>> Avi Kivity wrote:
>>>>
>>>
>>> What's the typical I/O load (disk and network bandwidth) while the
>>> tests are running?
>> This is average throughput:
>> network: Tx: 79 MB/sec  Rx: 5 MB/sec
>
> MB as in Byte or Mb as in bit?

Byte. There are 4 x 1 Gb adapters, each handling about 20 MB/sec or
160 Mbit/sec.

>
>> disk: read: 17 MB/sec  write: 40 MB/sec
>
> This could definitely cause the extra load, especially if it's many
> small requests (compared to a few large ones).
I don't have the request sizes at my fingertips, but we have to use a lot
of disks to support this I/O, so I think it's safe to assume there are a
lot more requests than a single large sequential read/write would produce.

>
>>>> The host hardware:
>>>> A 2 socket, 8 core Nehalem with SMT, and EPT enabled, lots of
>>>> disks, 4 x 1 Gb Ethernet
>>>
>>> CPU time measurements with SMT can vary wildly if the system is not
>>> fully loaded. If the scheduler happens to schedule two threads on a
>>> single core, both of these threads will generate less work compared
>>> to if they were scheduled on different cores.
>> Understood. Even if, at low loads, the scheduler does the right thing
>> and spreads out to all the cores first, once it goes beyond 50% util,
>> the CPU util can climb at a much higher rate (compared to a linear
>> increase in work) because it then starts scheduling 2 threads per
>> core, and each thread can do less work. I have always wanted
>> something which could more accurately show the utilization of a
>> processor core, but I guess we have to use what we have today. I
>> will run again with SMT off to see what we get.
>
> On the other hand, without SMT you will get to overcommit much faster,
> so you'll have scheduling artifacts. Unfortunately there's no good
> answer here (except to improve the SMT scheduler).
>
>>> Yes, it is. If there is a lot of I/O, this might be due to the
>>> thread pool used for I/O.
>> I have an older patch which makes a small change to posix_aio_thread.c
>> by trying to keep the thread pool size a bit lower than it is today.
>> I will dust that off and see if it helps.
>
> Really, I think linux-aio support can help here.

Yes, I think that would work for real block devices, but would that help
for files? I am using real block devices right now, but it would be nice
to also see a benefit for files in a file system. Or maybe I am
misunderstanding this, and linux-aio can be used on files?
-Andrew

>
>>>
>>> Yes, there is a scheduler tracer, though I have no idea how to
>>> operate it.
>>>
>>> Do you have kvm_stat logs?
>> Sorry, I don't, but I'll run that next time. BTW, I did not notice a
>> batch/log mode the last time I ran kvm_stat. Or maybe it was not
>> obvious to me. Is there an ideal way to run kvm_stat without a
>> curses-like output?
>
> You're probably using an ancient version:
>
> $ kvm_stat --help
> Usage: kvm_stat [options]
>
> Options:
>   -h, --help            show this help message and exit
>   -1, --once, --batch   run in batch mode for one second
>   -l, --log             run in logging mode (like vmstat)
>   -f FIELDS, --fields=FIELDS
>                         fields to display (regex)
>
>