From: Gerhard Wiesinger
Date: Sat, 9 Jan 2016 17:46:33 +0100
To: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] QEMU/KVM performance gets worse - high load - high interrupts - high context switches

On 08.12.2015 10:39, Gerhard Wiesinger wrote:
> Hello,
>
> Yesterday I looked at the munin statistics on my KVM host and I saw
> that performance is getting worse: load is rising, and interrupts and
> context switches are high and rising as well. The VMs and the
> applications themselves didn't change in a way that would explain this.
>
> You can find the graphs at: http://www.wiesinger.com/tmp/kvm/
> I guess the last spike was the upgrade from FC22 to FC23 or a kernel
> update. The values were even lower on older versions.
>
> To me it looks like the high interrupt load and context switches are
> the root cause. Interrupts inside each VM are <100, so with 10 VMs I'd
> expect 1000 + baseload => <2000; see the statistics below.
>
> All VMs use virtio for disk/network except one (IDE/rtl8139).
>
> # Host as well as all guests (except 2 VMs):
> uname -a
> Linux kvm 4.2.6-301.fc23.x86_64 #1 SMP Fri Nov 20 22:22:41 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>
> qemu-system-x86-2.4.1-1.fc23.x86_64
>
> Platform:
>
> All VMs have the pc-i440fx-2.4 profile (I upgraded yesterday from
> pc-i440fx-2.3 without any change).
>
> Any ideas? Is anyone seeing the same issue?
>
> Ciao,
> Gerhard
>
> kvm: no VM running
>  r  b   swpd    free   buff   cache  si  so  bi  bo   in   cs  us  sy  id  wa  st
>  0  0      0 3308516 102408 3798568   0   0   0  12  197  679   0   0  99   0   0
>  0  0      0 3308516 102416 3798564   0   0   0  42  197  914   0   0  99   1   0
>  0  0      0 3308516 102416 3798568   0   0   0   0  190  791   0   0 100   0   0
>  2  0      0 3308484 102416 3798568   0   0   0   0  129  440   0   0 100   0   0
>
> kvm: 2 VMs running
> procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
>  r  b   swpd    free   buff   cache  si  so  bi  bo   in   cs  us  sy  id  wa  st
>  1  0      0 2641464 103052 3814700   0   0   0   0 2715 5648   3   2  95   0   0
>  0  0      0 2641340 103052 3814700   0   0   0   0 2601 5555   1   2  97   0   0
>  1  0      0 2641308 103052 3814700   0   0   0   5 2687 5708   3   2  95   0   0
>  0  0      0 2640620 103060 3814628   0   0   0  30 2779 5756   4   3  93   1   0
>  0  0      0 2640644 103060 3814636   0   0   0   0 2436 5364   1   2  97   0   0
>  1  0      0 2640520 103060 3814636   0   0   0 119 2734 5975   3   2  95   0   0
>
> kvm: all 10 VMs running
> procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
>  r  b   swpd    free   buff   cache  si  so  bi  bo   in    cs  us  sy  id  wa  st
>  1  0      0   60408  78892 3371984   0   0   0  85 9015 17357   4   9  87   0   0
>  2  0      0   60408  78892 3371968   0   0   0  47 9375 17797   9   9  82   0   0
>  0  0      0   60472  78892 3372092   0   0  40  60 8882 17343   4   8  86   1   0
>  1  0      0   60316  78892 3372080   0   0   0  59 8863 17517   4   8  87   0   0
>  0  0      0   59540  78900 3372092   0   0   0  55 9135 17796   8   9  81   1   0
>  0  0      0   59168  78900 3372112   0   0   0  51 8931 17484   4   9  87   0   0
>
> cat /proc/cpuinfo
> processor  : 0
> vendor_id  : GenuineIntel
> cpu family : 6
> model      : 15
> model name : Intel(R) Core(TM)2 Quad CPU @ 2.66GHz
> stepping   : 7

OK, I found what the problem is. Analysis via:

1.) kvm_stat
2.) /usr/bin/perf record -p <pid>
    /usr/bin/perf report -i perf.data > perf-report.txt

cat perf-report.txt
# Overhead  Command          Shared Object       Symbol
# ........  ...............  ..................  ...............................
#
    15.75%  qemu-system-x86  [kernel.kallsyms]   [k] __fget
     8.33%  qemu-system-x86  [kernel.kallsyms]   [k] _raw_spin_lock_irqsave
     7.54%  qemu-system-x86  [kernel.kallsyms]   [k] fput
     6.61%  qemu-system-x86  [kernel.kallsyms]   [k] do_sys_poll
     3.60%  qemu-system-x86  [kernel.kallsyms]   [k] __pollwait
     2.20%  qemu-system-x86  [kernel.kallsyms]   [k] _raw_write_unlock_irqrestore
     2.09%  qemu-system-x86  libpthread-2.22.so  [.] pthread_mutex_lock
...

Found also:
1.) https://bugzilla.redhat.com/show_bug.cgi?id=949547
2.) https://www.kraxel.org/blog/2014/03/qemu-and-usb-tablet-cpu-consumtion/

After reading that I did the following:

# On 10 Linux VMs I removed:
# 1.) the serial device itself
# 2.) the VirtIO serial PCI controller
# 3.) the USB mouse tablet
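For libvirt-managed guests, the three devices correspond roughly to the
following domain XML elements (a sketch only; the attribute values shown
are the usual defaults, not copied from my configs) and can be deleted
with virsh edit <guestname>:

  <serial type='pty'>
    <target port='0'/>
  </serial>
  <controller type='virtio-serial' index='0'/>
  <input type='tablet' bus='usb'/>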
# Positive consequences via munin monitoring:
# Fork rate: 40 => 13
# Processes in "running" state: 15 => <1
# CPU temperature (core dependent): 65-70°C => 56-64°C
# CPU usage: system 47% => 15%, user 76% => 50%
# Context switches: 20k => 7.5k
# Interrupts: 16k => 9k
# Load average: 2.8 => 1 => back at the level of one year ago!

Any idea why the serial device/PCI controller and the USB mouse tablet
consume so much CPU on the latest kernel and/or qemu? Does anyone have
the same experience?

Thanks.

Ciao,
Gerhard
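PS: In case someone wants to reproduce the measurement, this is roughly
how perf can be attached to a single QEMU process (the pgrep pattern and
the 30 second sampling window are illustrative, not my exact invocation):

# sample the oldest matching qemu-system-x86 process for 30 seconds
perf record -p $(pgrep -of qemu-system-x86) -- sleep 30
perf report -i perf.data > perf-report.txt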