From: Andrew Theurer
Subject: Re: KVM performance vs. Xen
Date: Thu, 30 Apr 2009 07:49:46 -0500
Message-ID: <49F99E6A.3060404@linux.vnet.ibm.com>
References: <49F8672E.5080507@linux.vnet.ibm.com>
 <49F967AE.4040905@redhat.com>
In-Reply-To: <49F967AE.4040905@redhat.com>
To: Avi Kivity
Cc: kvm-devel

Avi Kivity wrote:
> Andrew Theurer wrote:
>> I wanted to share some performance data for KVM, especially compared
>> to Xen, using a more complex scenario: heterogeneous server
>> consolidation.
>>
>> The Workload:
>> The workload simulates a consolidation of servers onto a single
>> host. There are 3 server types: web, imap, and app (j2ee). In
>> addition, there are other "helper" servers which are also
>> consolidated: a db server, which helps out with the app server, and
>> an nfs server, which helps out with the web server (a portion of the
>> docroot is nfs mounted). There is also one other server that is
>> simply idle. All 6 servers make up one set. The first 3 server types
>> are sent requests, which in turn may send requests to the db and nfs
>> helper servers. The request rate is throttled to produce a fixed
>> amount of work. To increase utilization on the host, more sets of
>> these servers are used. The clients which send the requests also
>> have a response time requirement which is monitored. The following
>> results have passed the response time requirements.
>
> What's the typical I/O load (disk and network bandwidth) while the
> tests are running?

This is the average throughput:

network: Tx: 79 MB/sec, Rx: 5 MB/sec
disk: read: 17 MB/sec, write: 40 MB/sec

>> The host hardware:
>> A 2-socket, 8-core Nehalem with SMT and EPT enabled, lots of disks,
>> 4 x 1 Gb Ethernet
>
> CPU time measurements with SMT can vary wildly if the system is not
> fully loaded. If the scheduler happens to schedule two threads on a
> single core, both of these threads will generate less work compared
> to if they were scheduled on different cores.

Understood. Even if, at low loads, the scheduler does the right thing
and spreads out across all the cores first, once the system goes
beyond 50% utilization, CPU util can climb at a much higher rate
(compared to a linear increase in work) because the scheduler then
starts putting 2 threads on each core, and each thread can do less
work. I have always wanted something which could more accurately show
the utilization of a processor core, but I guess we have to use what
we have today. I will run again with SMT off to see what we get.
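In case it's useful, below is roughly what I have in mind for turning
SMT off for the re-run: offline the second sibling thread of each core
via sysfs. This is just an untested sketch (not something from this
thread), and it assumes the usual cpuN/topology/thread_siblings_list
and cpuN/online files are present:

/*
 * smt_off.c - hypothetical sketch: offline SMT sibling threads so
 * each core runs a single thread. Assumes sysfs exposes
 * /sys/devices/system/cpu/cpuN/topology/thread_siblings_list and
 * hotplug control at /sys/devices/system/cpu/cpuN/online.
 * Run as root; write "1" back to the online files to undo.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char path[128], buf[64];
    int cpu;

    for (cpu = 0; cpu < 4096; cpu++) {
        FILE *f;

        snprintf(path, sizeof(path), "/sys/devices/system/cpu/"
                 "cpu%d/topology/thread_siblings_list", cpu);
        f = fopen(path, "r");
        if (!f)
            break;              /* assume no more cpus */
        if (!fgets(buf, sizeof(buf), f)) {
            fclose(f);
            continue;
        }
        fclose(f);
        buf[strcspn(buf, "\n")] = '\0';
        /* The list looks like "0,8"; the first entry is the primary
         * thread of the core. Keep it, offline the others. */
        if (atoi(buf) == cpu)
            continue;
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/online", cpu);
        f = fopen(path, "w");
        if (!f) {
            perror(path);
            continue;
        }
        fputs("0\n", f);        /* take this sibling offline */
        fclose(f);
        printf("offlined cpu%d (siblings: %s)\n", cpu, buf);
    }
    return 0;
}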
>> Test Results:
>> The throughput is equal in these tests, as the clients throttle the
>> work (assuming you don't run out of a resource on the host). What's
>> telling is the CPU used to do the same amount of work:
>>
>> Xen: 52.85%
>> KVM: 66.93%
>>
>> So, KVM requires 66.93/52.85 = 26.6% more CPU to do the same amount
>> of work. Here's the breakdown:
>>
>> total  user  nice  system  irq  softirq  guest
>> 66.90  7.20  0.00   12.94  0.35    3.39  43.02
>>
>> Comparing guest time to all other busy time, that's 23.88/43.02 =
>> 55% overhead for virtualization. I certainly don't expect it to be
>> 0, but 55% seems a bit high. So, what's the reason for this
>> overhead? At the bottom is oprofile output of the top functions for
>> KVM. Some observations:
>>
>> 1) I'm seeing about 2.3% in scheduler functions [that I recognize].
>> Does that seem a bit excessive?
>
> Yes, it is. If there is a lot of I/O, this might be due to the
> thread pool used for I/O.

I have an older patch which makes a small change to posix_aio_thread.c,
trying to keep the thread pool size a bit lower than it is today. I
will dust that off and see if it helps.

>> 2) cpu_physical_memory_rw due to not using preadv/pwritev?
>
> I think both virtio-net and virtio-blk use memcpy().
>
>> 3) vmx_[save|load]_host_state: I take it this is from guest
>> switches?
>
> These are called when you context-switch away from a guest and, much
> more frequently, when you enter qemu.
>
>> We have 180,000 context switches a second. Is this more than
>> expected?
>
> Way more. Across 16 logical cpus, this is >10,000 cs/sec/cpu.
>
>> I wonder if schedstats can show why we context switch (need to let
>> someone else run, yielded, waiting on io, etc).
>
> Yes, there is a scheduler tracer, though I have no idea how to
> operate it.
>
> Do you have kvm_stat logs?

Sorry, I don't, but I'll run that next time. BTW, I did not notice a
batch/log mode the last time I ran kvm_stat, or maybe it was just not
obvious to me. Is there an ideal way to run kvm_stat without the
curses-like output?

-Andrew
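P.S. In case there really is no batch mode, something like the sketch
below is roughly what I would try: KVM exposes its counters as flat
files under debugfs, so a logger just has to sample them and print
per-interval deltas. This is a hypothetical, untested sketch (not
taken from kvm_stat itself), and it assumes debugfs is mounted at
/sys/kernel/debug:

/*
 * kvmstat_log.c - hypothetical batch-mode kvm_stat: sample the
 * counter files KVM exposes under /sys/kernel/debug/kvm/ and print
 * one line of per-second deltas per interval.
 */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define KVM_DEBUGFS "/sys/kernel/debug/kvm"
#define MAX_STATS   64

static long long read_stat(const char *name)
{
    char path[256], buf[32];
    long long val = 0;
    FILE *f;

    snprintf(path, sizeof(path), KVM_DEBUGFS "/%s", name);
    f = fopen(path, "r");
    if (!f)
        return 0;
    if (fgets(buf, sizeof(buf), f))
        val = atoll(buf);
    fclose(f);
    return val;
}

int main(void)
{
    char names[MAX_STATS][64];
    long long prev[MAX_STATS];
    int i, n = 0;
    struct dirent *de;
    DIR *d = opendir(KVM_DEBUGFS);

    if (!d) {
        perror(KVM_DEBUGFS);    /* debugfs not mounted? */
        return 1;
    }
    while ((de = readdir(d)) && n < MAX_STATS) {
        if (de->d_name[0] == '.')
            continue;           /* skip . and .. */
        snprintf(names[n], sizeof(names[n]), "%s", de->d_name);
        n++;
    }
    closedir(d);

    for (i = 0; i < n; i++)     /* one header line with stat names */
        printf("%s ", names[i]);
    printf("\n");
    for (i = 0; i < n; i++)
        prev[i] = read_stat(names[i]);

    for (;;) {                  /* then one line of deltas per second */
        sleep(1);
        for (i = 0; i < n; i++) {
            long long cur = read_stat(names[i]);
            printf("%lld ", cur - prev[i]);
            prev[i] = cur;
        }
        printf("\n");
        fflush(stdout);
    }
    return 0;
}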