* KVM with hugepages generate huge load with two guests
@ 2010-09-30  9:07 Dmitry Golubev
  2010-10-01 22:30 ` Marcelo Tosatti
  2010-10-03  9:28 ` Avi Kivity
  0 siblings, 2 replies; 18+ messages in thread
From: Dmitry Golubev @ 2010-09-30  9:07 UTC (permalink / raw)
To: kvm

Hi,

I am not sure what is really happening, but every few hours (unpredictably) two virtual machines (Linux 2.6.32) start to generate huge CPU loads. It looks as if some kind of loop is unable to complete or something...

So the idea is:

1. I have two Linux 2.6.32 x64 (OpenVZ, Proxmox project) guests running on a Linux 2.6.35 x64 (Ubuntu Maverick) host with a Q6600 Core2Quad, on qemu-kvm 0.12.5 and libvirt 0.8.3, plus one other small 32-bit Linux virtual machine (16MB of RAM) with a router inside (I doubt it contributes to the problem).

2. All these machines use hugetlbfs. The server has 8GB of RAM; I reserved 3696 huge pages (page size is 2MB) on the server, and I am running the main guests each with 3550MB of virtual memory. The third guest, as I wrote before, takes 16MB of virtual memory.

3. Once run, the guests reserve huge pages for themselves normally. As mem-prealloc is the default, they grab all the memory they should have, leaving 6 pages unreserved (HugePages_Free - HugePages_Rsvd = 6) at all times - so as I understand it, they should not want to get any more, right?

4. All virtual machines run perfectly normally, without any disturbances, for a few hours. They do not, however, use all their memory, so maybe the issue arises when they pass some kind of threshold.

5. At some point in time both guests exhibit CPU load over the top (16-24). At the same time, the host works perfectly well, showing a load of 8, with both kvm processes using CPU equally and fully. This point in time is unpredictable - it can be anything from one to twenty hours, but it will be less than a day. Sometimes the load disappears in a moment, but usually it stays like that, and everything works extremely slowly (even a 'ps' command takes some 2-5 minutes to execute).

6. If I am patient, I can start rebooting the guest systems - once they have restarted, everything returns to normal. If I destroy one of the guests (virsh destroy), the other one starts working normally at once (!).

I am relatively new to KVM and I am absolutely lost here. I had not experienced such problems before, but recently I upgraded from Ubuntu Lucid (I think it was Linux 2.6.32, qemu-kvm 0.12.3 and libvirt 0.7.5) and started to use hugepages. These two virtual machines do not normally run on the same host system (I have a corosync/pacemaker cluster with DRBD storage), but when one of the hosts is not available, they start running on the same host. That is the reason I had not noticed this earlier.

Unfortunately, I don't have any spare hardware to experiment with, and this is a production system, so my debugging options are rather limited.

Do you have any ideas what could be wrong?

Thanks,
Dmitry
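The accounting described in point 3 can be checked directly from /proc/meminfo. A minimal sketch, assuming only the standard HugePages_* fields; the helper function itself is illustrative, not something from the thread:

```python
# Illustrative helper: compute unreserved huge pages
# (HugePages_Free - HugePages_Rsvd) from /proc/meminfo text.

def unreserved_hugepages(meminfo_text):
    fields = {}
    for line in meminfo_text.splitlines():
        if line.startswith("HugePages_"):
            name, value = line.split(":")
            fields[name] = int(value.split()[0])
    return fields["HugePages_Free"] - fields["HugePages_Rsvd"]

# Usage on a live host would be:
#   unreserved_hugepages(open("/proc/meminfo").read())
```

With the numbers reported in this thread, the result should stay at 6 while both guests are running.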
* Re: KVM with hugepages generate huge load with two guests
  2010-09-30  9:07 KVM with hugepages generate huge load with two guests Dmitry Golubev
@ 2010-10-01 22:30 ` Marcelo Tosatti
  2010-10-01 23:50   ` Dmitry Golubev
  [not found]     ` <AANLkTinJDLoWjiXwX1MOpuVf4RUuGE3qjrawS=d+5Swu@mail.gmail.com>
  2010-10-03  9:28 ` Avi Kivity
  1 sibling, 2 replies; 18+ messages in thread
From: Marcelo Tosatti @ 2010-10-01 22:30 UTC (permalink / raw)
To: Dmitry Golubev; +Cc: kvm

On Thu, Sep 30, 2010 at 12:07:15PM +0300, Dmitry Golubev wrote:
> [full quote of the original message snipped]

Is there swapping activity on the host when this happens?
* Re: KVM with hugepages generate huge load with two guests
  2010-10-01 22:30 ` Marcelo Tosatti
@ 2010-10-01 23:50   ` Dmitry Golubev
  2010-10-02  0:56     ` Dmitry Golubev
  2010-10-02  8:03     ` Michael Tokarev
  [not found]   ` <AANLkTinJDLoWjiXwX1MOpuVf4RUuGE3qjrawS=d+5Swu@mail.gmail.com>
  1 sibling, 2 replies; 18+ messages in thread
From: Dmitry Golubev @ 2010-10-01 23:50 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: kvm

Hi,

Thanks for the reply. Well, although there is plenty of RAM left (about 100MB), some swap space was used during operation:

Mem:   8193472k total,  8089788k used,   103684k free,     5768k buffers
Swap: 11716412k total,    36636k used, 11679776k free,   103112k cached

I am not sure why, though. Are you saying that there are bursts of memory usage that push some pages to swap, and that they are not swapped back in although they are in use? I will try to replicate the problem now and send you a better printout from the moment the problem happens. I have not noticed anything unusual while watching the system - there was plenty of RAM free and a few megabytes in swap... Is there any kind of check I can try while the problem is occurring? Or should I free 50-100MB from hugepages so that the system becomes stable again?

Thanks,
Dmitry

On Sat, Oct 2, 2010 at 1:30 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> [full quote of the original message snipped]
>
> Is there swapping activity on the host when this happens?
* Re: KVM with hugepages generate huge load with two guests
  2010-10-01 23:50 ` Dmitry Golubev
@ 2010-10-02  0:56   ` Dmitry Golubev
  2010-10-02  8:03   ` Michael Tokarev
  1 sibling, 0 replies; 18+ messages in thread
From: Dmitry Golubev @ 2010-10-02  0:56 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: kvm

OK, I have repeated the problem. The two machines were working fine for a few hours with some services not running (these would take up about a gigabyte more in total); I started those services again and some 40 minutes later the problem reappeared (it may be a coincidence, but I don't think so). The top command output looks like this:

top - 03:38:10 up 2 days, 20:08,  1 user,  load average: 9.60, 6.92, 5.36
Tasks: 143 total,   3 running, 140 sleeping,   0 stopped,   0 zombie
Cpu(s): 85.7%us,  4.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi, 10.0%si,  0.0%st
Mem:   8193472k total,  8056700k used,   136772k free,     4912k buffers
Swap: 11716412k total,    64884k used, 11651528k free,    55640k cached

  PID USER     PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
21306 libvirt- 20   0 3781m  10m 2408 S  190  0.1  31:36.09 kvm
 4984 libvirt- 20   0 3771m  19m 1440 S  180  0.2 390:30.04 kvm

Compared to the previous snapshot I sent (taken a few hours ago), you will not see much difference, in my opinion. Note that I have 8GB of RAM and together both VMs take up 7GB. There is nothing else running on the server except the VMs and the cluster software (DRBD, Pacemaker etc). Right now the DRBD sync process is taking some CPU resources - that is why the kvm processes do not show as 200% (physically, it is a quad-core processor).

Is almost 1GB really not enough for KVM to support two 3.5GB guests? I see 136MB of free memory right now - it is not even used...

Thanks,
Dmitry

On Sat, Oct 2, 2010 at 2:50 AM, Dmitry Golubev <lastguru@gmail.com> wrote:
> [full quote of the previous messages snipped]
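As a side note, the "almost 1GB" figure in the question above follows from simple arithmetic. A sketch using the numbers given in this thread (the calculation is illustrative):

```python
# Rough accounting (figures from this thread) of how much host RAM
# remains once the huge-page pool is reserved: 8GB total, 3696
# huge pages of 2MB each.
total_mb = 8192
hugepage_count = 3696
hugepage_mb = 2

pool_mb = hugepage_count * hugepage_mb   # RAM locked into the huge-page pool
host_left_mb = total_mb - pool_mb        # left for host kernel, qemu overhead, DRBD etc.

print(pool_mb, host_left_mb)  # 7392 800
```

That remaining ~800MB has to cover the host kernel, page cache, qemu's own (non-guest) allocations and the cluster software, which is why swap can still be touched even though the guests' RAM is pinned in huge pages.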
* Re: KVM with hugepages generate huge load with two guests
  2010-10-01 23:50 ` Dmitry Golubev
  2010-10-02  0:56   ` Dmitry Golubev
@ 2010-10-02  8:03   ` Michael Tokarev
  1 sibling, 0 replies; 18+ messages in thread
From: Michael Tokarev @ 2010-10-02  8:03 UTC (permalink / raw)
To: Dmitry Golubev; +Cc: Marcelo Tosatti, kvm

02.10.2010 03:50, Dmitry Golubev wrote:
> Hi,
>
> Thanks for the reply. Well, although there is plenty of RAM left (about
> 100MB), some swap space was used during the operation:
>
> Mem:   8193472k total,  8089788k used,   103684k free,     5768k buffers
> Swap: 11716412k total,    36636k used, 11679776k free,   103112k cached

If you want to see swapping, run vmstat with, say, a 5-second interval:

$ vmstat 5

The amount of swap used is interesting, but the number of swap-ins/swap-outs per second is much more so. JFYI.

/mjt
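The si/so rates Michael refers to can also be derived by sampling the cumulative pswpin/pswpout counters that the Linux kernel exposes in /proc/vmstat. A hedged sketch (the counter names are standard; the helper functions themselves are illustrative):

```python
# Illustrative sketch: derive swap-in/swap-out rates (what `vmstat 5`
# reports in its si/so columns) from two samples of /proc/vmstat,
# where the kernel exposes cumulative pswpin/pswpout counters.

def swap_counters(vmstat_text):
    # /proc/vmstat is "name value" pairs, one per line.
    counters = dict(line.split() for line in vmstat_text.splitlines() if line)
    return int(counters["pswpin"]), int(counters["pswpout"])

def swap_rate(sample_a, sample_b, seconds):
    # Difference of the cumulative counters over the sampling interval.
    in_a, out_a = swap_counters(sample_a)
    in_b, out_b = swap_counters(sample_b)
    return (in_b - in_a) / seconds, (out_b - out_a) / seconds

# Usage on a live host (two reads of /proc/vmstat a few seconds apart):
#   import time
#   a = open("/proc/vmstat").read(); time.sleep(5)
#   b = open("/proc/vmstat").read(); print(swap_rate(a, b, 5))
```

A sustained non-zero pair here during a load spike would confirm Marcelo's swapping hypothesis; zeros, as Dmitry later reports, rule it out.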
[parent not found: <AANLkTinJDLoWjiXwX1MOpuVf4RUuGE3qjrawS=d+5Swu@mail.gmail.com>]
* Re: KVM with hugepages generate huge load with two guests
  [not found] ` <AANLkTinJDLoWjiXwX1MOpuVf4RUuGE3qjrawS=d+5Swu@mail.gmail.com>
@ 2010-11-17  2:19   ` Dmitry Golubev
  2010-11-18  6:53     ` Dmitry Golubev
  0 siblings, 1 reply; 18+ messages in thread
From: Dmitry Golubev @ 2010-11-17  2:19 UTC (permalink / raw)
To: kvm

Hi,

Maybe you remember that I wrote a few weeks ago about a KVM CPU load problem with hugepages. The problem was left hanging; however, I now have some new information. The description remains the same, but I have decreased both the guest memory and the number of hugepages:

RAM = 8GB, hugepages = 3546

Total of 2 virtual machines:
1. router with 32MB of RAM (hugepages) and 1 VCPU
2. linux guest with 3500MB of RAM (hugepages) and 4 VCPU

Everything works fine until I start the second Linux guest, with the same 3500MB of guest RAM, also in hugepages and also with 4 VCPU. The rest of the description is the same as before: after a while the host shows a load average of about 8 (on a Core2Quad) and it seems that both big guests consume exactly the same amount of resources. The host seems responsive, though. Inside the guests, however, things are not so good - the load skyrockets to at least 20. The guests are not responsive and even a 'ps' executes inappropriately slowly (it may take a few minutes - here, however, the load builds up and the machine seems to become slower with time, unlike the host, which shows the jump in resource consumption instantly). It also seems that the more memory the guests use, the faster the problem appears. Still, at least a gig of RAM is free in each guest and there is no swap activity inside the guests.

The most important thing - the reason I went back and quoted an older message rather than the last one - is that there is no more swap activity on the host, so the previous track of thought may also be wrong, and I have returned to the beginning. There is plenty of RAM now and swap on the host is always at 0 as seen in 'top'. And there is 100% CPU load, equally shared between the two large guests. To stop the load I can destroy either large guest. Additionally, I have just discovered that suspending either large guest works as well. Moreover, after a resume, the load does not come back for a while. Both methods stop the high load instantly (in under a second).

As you were asking for a 'top' inside the guest, here it is:

top - 03:27:27 up 42 min,  1 user,  load average: 18.37, 7.68, 3.12
Tasks: 197 total,  23 running, 174 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us, 89.2%sy,  0.0%ni, 10.5%id,  0.0%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:   3510912k total,  1159760k used,  2351152k free,    62568k buffers
Swap:  4194296k total,        0k used,  4194296k free,   484492k cached

  PID USER   PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
12303 root   20   0     0    0    0 R  100  0.0  0:33.72 vpsnetclean
11772 99     20   0  149m  11m 2104 R   82  0.3  0:15.10 httpd
10906 99     20   0  149m  11m 2124 R   73  0.3  0:11.52 httpd
10247 99     20   0  149m  11m 2128 R   31  0.3  0:05.39 httpd
 3916 root   20   0 86468  11m 1476 R   16  0.3  0:15.14 cpsrvd-ssl
10919 99     20   0  149m  11m 2124 R    8  0.3  0:03.43 httpd
11296 99     20   0  149m  11m 2112 R    7  0.3  0:03.26 httpd
12265 99     20   0  149m  11m 2088 R    7  0.3  0:08.01 httpd
12317 root   20   0 99.6m 1384  716 R    7  0.0  0:06.57 crond
12326 503    20   0  8872   96   72 R    7  0.0  0:01.13 php
 3634 root   20   0 74804 1176  596 R    6  0.0  0:12.15 crond
11864 32005  20   0 87224  13m 2528 R    6  0.4  0:30.84 cpsrvd-ssl
12275 root   20   0 30628 9976 1364 R    6  0.3  0:24.68 cpgs_chk
11305 99     20   0  149m  11m 2104 R    6  0.3  0:02.53 httpd
12278 root   20   0  8808 1328  968 R    6  0.0  0:04.63 sim
 1534 root   20   0     0    0    0 S    6  0.0  0:03.29 flush-254:2
 3626 root   20   0  149m  13m 5324 R    6  0.4  0:27.62 httpd
12279 32008  20   0 87472 7668 2480 R    6  0.2  0:27.63 munin-update
10243 99     20   0  149m  11m 2128 R    5  0.3  0:08.47 httpd
12321 root   20   0 99.6m 1460  792 R    5  0.0  0:07.43 crond
12325 root   20   0 74804  672   92 R    5  0.0  0:00.76 crond
 1531 root   20   0     0    0    0 S    2  0.0  0:02.26 kjournald
    1 root   20   0 10316  756  620 S    0  0.0  0:02.10 init
    2 root   20   0     0    0    0 S    0  0.0  0:00.01 kthreadd
    3 root   RT   0     0    0    0 S    0  0.0  0:01.08 migration/0
    4 root   20   0     0    0    0 S    0  0.0  0:00.02 ksoftirqd/0
    5 root   RT   0     0    0    0 S    0  0.0  0:00.00 watchdog/0
    6 root   RT   0     0    0    0 S    0  0.0  0:00.47 migration/1
    7 root   20   0     0    0    0 S    0  0.0  0:00.03 ksoftirqd/1
    8 root   RT   0     0    0    0 S    0  0.0  0:00.00 watchdog/1

The tasks shown in the 'top' view keep changing, so it is nothing like a single task hanging - it is more like a machine working off swap. The problem, however, is that according to vmstat there is no swap activity during this time. Should I decrease the RAM I give to my guests even more? Is it too much to have 3 guests with hugepages? Should I try something else? Unfortunately it is a production system and I can't play with it very much.

Here is 'top' on the host:

top - 03:32:12 up 25 days, 23:38,  2 users,  load average: 8.50, 5.07, 10.39
Tasks: 133 total,   1 running, 132 sleeping,   0 stopped,   0 zombie
Cpu(s): 99.1%us,  0.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:   8193472k total,  8071776k used,   121696k free,    45296k buffers
Swap: 11716412k total,        0k used, 11714844k free,   197236k cached

  PID USER     PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8426 libvirt- 20   0 3771m  27m 3904 S  199  0.3  10:28.33 kvm
 8374 libvirt- 20   0 3815m  32m 3908 S  199  0.4   8:11.53 kvm
 1557 libvirt- 20   0  225m 7720 2092 S    1  0.1 436:54.45 kvm
   72 root     20   0     0    0    0 S    0  0.0   6:22.54 kondemand/3
  379 root     20   0     0    0    0 S    0  0.0  58:20.99 md3_raid5
    1 root     20   0 23768 1944 1228 S    0  0.0   0:00.95 init
    2 root     20   0     0    0    0 S    0  0.0   0:00.24 kthreadd
    3 root     20   0     0    0    0 S    0  0.0   0:12.66 ksoftirqd/0
    4 root     RT   0     0    0    0 S    0  0.0   0:07.58 migration/0
    5 root     RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/0
    6 root     RT   0     0    0    0 S    0  0.0   0:15.05 migration/1
    7 root     20   0     0    0    0 S    0  0.0   0:19.64 ksoftirqd/1
    8 root     RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/1
    9 root     RT   0     0    0    0 S    0  0.0   0:07.21 migration/2
   10 root     20   0     0    0    0 S    0  0.0   0:41.74 ksoftirqd/2
   11 root     RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/2
   12 root     RT   0     0    0    0 S    0  0.0   0:13.62 migration/3
   13 root     20   0     0    0    0 S    0  0.0   0:24.63 ksoftirqd/3
   14 root     RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/3
   15 root     20   0     0    0    0 S    0  0.0   1:17.11 events/0
   16 root     20   0     0    0    0 S    0  0.0   1:33.30 events/1
   17 root     20   0     0    0    0 S    0  0.0   4:15.28 events/2
   18 root     20   0     0    0    0 S    0  0.0   1:13.49 events/3
   19 root     20   0     0    0    0 S    0  0.0   0:00.00 cpuset
   20 root     20   0     0    0    0 S    0  0.0   0:00.02 khelper
   21 root     20   0     0    0    0 S    0  0.0   0:00.00 netns
   22 root     20   0     0    0    0 S    0  0.0   0:00.00 async/mgr
   23 root     20   0     0    0    0 S    0  0.0   0:00.00 pm
   25 root     20   0     0    0    0 S    0  0.0   0:02.47 sync_supers
   26 root     20   0     0    0    0 S    0  0.0   0:03.86 bdi-default

Please help...

Thanks,
Dmitry

On Sat, Oct 2, 2010 at 1:30 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> [full quote of earlier messages snipped]
>
> Is there swapping activity on the host when this happens?
* Re: KVM with hugepages generate huge load with two guests
  2010-11-17  2:19 ` Dmitry Golubev
@ 2010-11-18  6:53   ` Dmitry Golubev
  2010-11-21  0:24     ` Dmitry Golubev
  0 siblings, 1 reply; 18+ messages in thread
From: Dmitry Golubev @ 2010-11-18  6:53 UTC (permalink / raw)
To: kvm; +Cc: Marcelo Tosatti, Avi Kivity

Hi,

Sorry to bother you again. I have more information:

> 1. router with 32MB of RAM (hugepages) and 1 VCPU
...
> Is it too much to have 3 guests with hugepages?

OK, this router is also out of the equation - I disabled hugepages for it. There should also be additional pages available to the guests because of that.

I think this should be pretty reproducible... Two identical 64-bit Linux 2.6.32 guests with 3500MB of virtual RAM and 4 VCPUs each, running on a Core2Quad (4 real cores) machine with 8GB of RAM and 3546 2MB hugepages, on a 64-bit Linux 2.6.35 host (libvirt 0.8.3) from Ubuntu Maverick.

There is still no swapping, and the effect is much the same: one guest runs well; two guests work for some minutes, then slow down a few hundred times, showing huge load both inside (unlimited rapid growth of the load average) and outside (the host load does not make it unresponsive, though - but it is loaded to the max). Load growth on the host is instant and finite (the change in the 'r' column indicates this sudden rise):

# vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa
 1  3      0 194220  30680  76712    0    0   319    28  2633  1960  6  6 67 20
 1  2      0 193776  30680  76712    0    0     4   231 55081 78491  3 39 17 41
10  1      0 185508  30680  76712    0    0     4    87 53042 34212 55 27  9  9
12  0      0 185180  30680  76712    0    0     2    95 41007 21990 84 16  0  0

Thanks,
Dmitry

On Wed, Nov 17, 2010 at 4:19 AM, Dmitry Golubev <lastguru@gmail.com> wrote:
> [full quote of the previous message snipped]
0.0 1:33.30 events/1 > 17 root 20 0 0 0 0 S 0 0.0 4:15.28 events/2 > 18 root 20 0 0 0 0 S 0 0.0 1:13.49 events/3 > 19 root 20 0 0 0 0 S 0 0.0 0:00.00 cpuset > 20 root 20 0 0 0 0 S 0 0.0 0:00.02 khelper > 21 root 20 0 0 0 0 S 0 0.0 0:00.00 netns > 22 root 20 0 0 0 0 S 0 0.0 0:00.00 async/mgr > 23 root 20 0 0 0 0 S 0 0.0 0:00.00 pm > 25 root 20 0 0 0 0 S 0 0.0 0:02.47 > sync_supers > 26 root 20 0 0 0 0 S 0 0.0 0:03.86 > bdi-default > > > Please help... > > Thanks, > Dmitry > > On Sat, Oct 2, 2010 at 1:30 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote: >> >> On Thu, Sep 30, 2010 at 12:07:15PM +0300, Dmitry Golubev wrote: >> > Hi, >> > >> > I am not sure what's really happening, but every few hours >> > (unpredictable) two virtual machines (Linux 2.6.32) start to generate >> > huge cpu loads. It looks like some kind of loop is unable to complete >> > or something... >> > >> > So the idea is: >> > >> > 1. I have two linux 2.6.32 x64 (openvz, proxmox project) guests >> > running on linux 2.6.35 x64 (ubuntu maverick) host with a Q6600 >> > Core2Quad on qemu-kvm 0.12.5 and libvirt 0.8.3 and another one small >> > 32bit linux virtual machine (16MB of ram) with a router inside (i >> > doubt it contributes to the problem). >> > >> > 2. All these machines use hufetlbfs. The server has 8GB of RAM, I >> > reserved 3696 huge pages (page size is 2MB) on the server, and I am >> > running the main guests each having 3550MB of virtual memory. The >> > third guest, as I wrote before, takes 16MB of virtual memory. >> > >> > 3. Once run, the guests reserve huge pages for themselves normally. As >> > mem-prealloc is default, they grab all the memory they should have, >> > leaving 6 pages unreserved (HugePages_Free - HugePages_Rsvd = 6) all >> > times - so as I understand they should not want to get any more, >> > right? >> > >> > 4. All virtual machines run perfectly normal without any disturbances >> > for few hours. 
They do not, however, use all their memory, so maybe >> > the issue arises when they pass some kind of a threshold. >> > >> > 5. At some point of time both guests exhibit cpu load over the top >> > (16-24). At the same time, host works perfectly well, showing load of >> > 8 and that both kvm processes use CPU equally and fully. This point of >> > time is unpredictable - it can be anything from one to twenty hours, >> > but it will be less than a day. Sometimes the load disappears in a >> > moment, but usually it stays like that, and everything works extremely >> > slow (even a 'ps' command executes some 2-5 minutes). >> > >> > 6. If I am patient, I can start rebooting the gueat systems - once >> > they have restarted, everything returns to normal. If I destroy one of >> > the guests (virsh destroy), the other one starts working normally at >> > once (!). >> > >> > I am relatively new to kvm and I am absolutely lost here. I have not >> > experienced such problems before, but recently I upgraded from ubuntu >> > lucid (I think it was linux 2.6.32, qemukvm 0.12.3 and libvirt 0.7.5) >> > and started to use hugepages. These two virtual machines are not >> > normally run on the same host system (i have a corosync/pacemaker >> > cluster with drbd storage), but when one of the hosts is not >> > abailable, they start running on the same host. That is the reason I >> > have not noticed this earlier. >> > >> > Unfortunately, I don't have any spare hardware to experiment and this >> > is a production system, so my debugging options are rather limited. >> > >> > Do you have any ideas, what could be wrong? >> >> Is there swapping activity on the host when this happens? >> > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: KVM with hugepages generate huge load with two guests 2010-11-18 6:53 ` Dmitry Golubev @ 2010-11-21 0:24 ` Dmitry Golubev 2010-11-21 8:50 ` Michael Tokarev 2010-11-21 11:28 ` Avi Kivity 0 siblings, 2 replies; 18+ messages in thread From: Dmitry Golubev @ 2010-11-21 0:24 UTC (permalink / raw) To: kvm; +Cc: Marcelo Tosatti, Avi Kivity Hi, It seems that nobody is interested in this bug :( Anyway, I wanted to add a bit more to this investigation. Once I put "nohz=off highres=off clocksource=acpi_pm" in the guest kernel options, the guests started to behave better - they do not stay in the slow state, but rather get there for some seconds (usually up to a minute, but sometimes 2-3 minutes) and then get out of it (this cycle repeats once in a while - approximately every 3-6 minutes). Once the situation became stable enough that I am able to leave the guests without much worry, I also noticed that sometimes the predicted swapping occurs, although rarely (I waited about half an hour to catch the first swapping on the host). Here is a fragment of vmstat. Note that when the first column shows 8-9, the slowness and huge load happen.
You can also see how is appears and disappears (with nohz and kvm-clock it did not go out of slowness period, but with tsc clock the probability of getting out is significantly lower): procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 8 0 0 60456 19708 253688 0 0 6 170 5771 1712 97 3 0 0 9 5 0 58752 19708 253688 0 0 11 57 6457 1500 96 4 0 0 8 0 0 58192 19708 253688 0 0 55 106 5112 1588 98 3 0 0 8 0 0 58068 19708 253688 0 0 21 0 2609 1498 100 0 0 0 8 2 0 57728 19708 253688 0 0 9 96 2645 1620 100 0 0 0 8 0 0 53852 19716 253680 0 0 2 186 6321 1935 97 4 0 0 8 0 0 49636 19716 253688 0 0 0 45 3482 1484 99 1 0 0 8 0 0 49452 19716 253688 0 0 0 34 3253 1851 100 0 0 0 4 1 1468 126252 16780 182256 53 317 393 788 29318 3498 79 21 0 0 4 0 1468 135596 16780 182332 0 0 7 360 26782 2459 79 21 0 0 1 0 1468 169720 16780 182340 0 0 75 81 22024 3194 40 15 42 3 3 0 1464 167608 16780 182340 6 0 26 1579 9404 5526 22 8 35 35 0 0 1460 164232 16780 182504 0 0 85 170 4955 3345 21 5 69 5 0 0 1460 163636 16780 182504 0 0 0 90 1288 1855 5 2 90 3 1 0 1460 164836 16780 182504 0 0 0 34 1166 1789 4 2 93 1 1 0 1452 165628 16780 182504 0 0 285 70 1981 2692 10 2 83 4 1 0 1452 160044 16952 184840 6 0 832 146 5046 3303 11 6 76 7 1 0 1452 161416 16960 184840 0 0 19 170 1732 2577 10 2 74 13 0 1 1452 161920 16960 184840 0 0 111 53 1084 1986 0 1 96 3 0 0 1452 161332 16960 184840 0 0 254 34 856 1505 2 1 95 3 1 0 1452 159168 16960 184840 0 0 366 46 2137 2774 3 2 94 1 1 0 1452 157408 16968 184840 0 0 0 69 2423 2991 9 5 84 2 0 0 1444 157876 16968 184840 0 0 0 45 6343 3079 24 10 65 1 0 0 1428 159644 16968 184844 6 0 8 52 724 1276 0 0 98 2 0 0 1428 160336 16968 184844 0 0 31 98 1115 1835 1 1 92 6 1 0 1428 161360 16968 184844 0 0 0 45 1333 1849 2 1 95 2 0 0 1428 162092 16968 184844 0 0 0 408 3517 4267 11 2 78 8 1 1 1428 163868 16968 184844 0 0 24 121 1714 2036 10 2 86 2 1 3 1428 161292 16968 184844 0 0 3 143 2906 3503 16 4 77 3 0 0 
1428 156448 16976 184836 0 0 1 781 5661 4464 16 7 74 3 1 0 1428 156924 16976 184844 0 0 588 92 2341 3845 7 2 87 4 0 0 1428 158816 16976 184844 0 0 27 119 2052 3830 5 1 89 4 0 0 1428 161420 16976 184844 0 0 1 56 3923 3132 26 4 68 1 0 0 1428 162724 16976 184844 0 0 10 107 2806 3558 10 2 86 2 1 0 1428 165244 16976 184844 0 0 34 155 2084 2469 8 2 78 12 0 0 1428 165204 16976 184844 0 0 390 282 9568 4924 17 11 55 17 1 0 1392 163864 16976 185064 102 0 218 411 11762 16591 6 9 68 17 8 0 1384 164992 16984 185056 0 0 9 88 7540 5761 73 6 17 4 8 0 1384 163620 16984 185076 0 0 1 89 21936 45040 90 10 0 0 8 0 1384 165324 16992 185076 0 0 5 194 3330 1678 99 1 0 0 8 0 1384 165704 16992 185076 0 0 1 54 2651 1457 99 1 0 0 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 8 0 1384 163016 17000 185076 0 0 0 126 4988 1536 97 3 0 0 9 1 1384 162608 17000 185076 0 0 34 477 20106 2351 83 17 0 0 0 0 1384 184052 17000 185076 0 0 102 1198 48951 3628 48 38 6 8 0 0 1384 183088 17008 185076 0 0 8 156 1228 1419 2 2 82 14 0 0 1384 184436 17008 185164 0 0 28 113 3176 2785 12 7 75 6 0 0 1384 184568 17008 185164 0 0 30 107 1547 1821 4 3 87 6 4 2 1228 228808 17008 185212 34 0 243 9 1591 1212 10 14 76 1 9 0 1228 223644 17016 185164 0 0 2872 857 18515 5134 45 20 9 26 0 3 1228 224840 17016 185164 0 0 1080 786 8281 5490 35 12 21 33 2 0 1228 222032 17016 185164 0 0 1184 99 21056 3713 26 17 48 9 1 0 1228 221784 17016 185164 0 0 2075 69 3089 3749 9 7 73 11 3 0 1228 220544 17016 185164 0 0 1501 150 3815 3520 7 8 73 12 3 0 1228 219736 17024 185164 0 0 1129 103 7726 4177 20 11 60 9 0 4 1228 217224 17024 185164 0 0 2844 211 6068 4643 9 7 60 23 Thanks, Dmitry On Thu, Nov 18, 2010 at 8:53 AM, Dmitry Golubev <lastguru@gmail.com> wrote: > Hi, > > Sorry to bother you again. I have more info: > >> 1. router with 32MB of RAM (hugepages) and 1VCPU > ... >> Is it too much to have 3 guests with hugepages? 
> > OK, this router is also out of the equation - I disabled hugepages for it. > There should also be additional pages available to the guests because of > that. I think this should be pretty reproducible... Two exactly > similar 64bit Linux 2.6.32 guests with 3500MB of virtual RAM and 4 > VCPU each, running on a Core2Quad (4 real cores) machine with 8GB of > RAM and 3546 2MB hugepages on a 64bit Linux 2.6.35 host (libvirt > 0.8.3) from Ubuntu Maverick. > > Still no swapping, and the effect is pretty much the same: one guest > runs well, two guests work for some minutes - then slow down a few > hundred times, showing huge load both inside (unlimited rapid growth > of loadaverage) and outside (the host load is not making it unresponsive > though - but loaded to the max). Load growth on the host is instant and > finite (the 'r' column change indicates this sudden rise): > > # vmstat 5 > procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- > r b swpd free buff cache si so bi bo in cs us sy id wa > 1 3 0 194220 30680 76712 0 0 319 28 2633 1960 6 6 67 20 > 1 2 0 193776 30680 76712 0 0 4 231 55081 78491 3 39 17 41 > 10 1 0 185508 30680 76712 0 0 4 87 53042 34212 55 27 9 9 > 12 0 0 185180 30680 76712 0 0 2 95 41007 21990 84 16 0 0 > > Thanks, > Dmitry > > On Wed, Nov 17, 2010 at 4:19 AM, Dmitry Golubev <lastguru@gmail.com> wrote: >> Hi, >> >> Maybe you remember that I wrote few weeks ago about KVM cpu load >> problem with hugepages. The problem was lost hanging, however I have >> now some new information. So the description remains, however I have >> decreased both guest memory and the amount of hugepages: >> >> Ram = 8GB, hugepages = 3546 >> >> Total of 2 virual machines: >> 1. router with 32MB of RAM (hugepages) and 1VCPU >> 2. linux guest with 3500MB of RAM (hugepages) and 4VCPU >> >> Everything works fine until I start the second linux guest with the >> same 3500MB of guest RAM also in hugepages and also 4VCPU.
The rest of >> description is the same as before: after a while the host shows >> loadaverage of about 8 (on a Core2Quad) and it seems that both big >> guests consume exactly the same amount of resources. The hosts seems >> responsive though. Inside the guests, however, things are not so good >> - the load sky rockets to at least 20. Guests are not responsive and >> even a 'ps' executes inappropriately slow (may take few minutes - >> here, however, load builds up and it seems that machine becomes slower >> with time, unlike host, which shows the jump in resource consumption >> instantly). It also seem that the more guests uses memory, the faster >> the problem appers. Still at least a gig of RAM is free on each guest >> and there is no swap activity inside the guest. >> >> The most important thing - why I went back and quoted older message >> than the last one, is that there is no more swap activity on host, so >> the previous track of thought may also be wrong and I returned to the >> beginning. There is plenty of RAM now and swap on host is always on 0 >> as seen in 'top'. And there is 100% cpu load, equally shared between >> the two large guests. To stop the load I can destroy either large >> guest. Additionally, I have just discovered that suspending any large >> guest works as well. Moreover, after resume, the load does not come >> back for a while. Both methods stop the high load instantly (faster >> than a second). 
As you were asking for a 'top' inside the guest, here >> it is: >> >> top - 03:27:27 up 42 min, 1 user, load average: 18.37, 7.68, 3.12 >> Tasks: 197 total, 23 running, 174 sleeping, 0 stopped, 0 zombie >> Cpu(s): 0.0%us, 89.2%sy, 0.0%ni, 10.5%id, 0.0%wa, 0.0%hi, 0.2%si, 0.0%st >> Mem: 3510912k total, 1159760k used, 2351152k free, 62568k buffers >> Swap: 4194296k total, 0k used, 4194296k free, 484492k cached >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >> 12303 root 20 0 0 0 0 R 100 0.0 0:33.72 >> vpsnetclean >> 11772 99 20 0 149m 11m 2104 R 82 0.3 0:15.10 httpd >> 10906 99 20 0 149m 11m 2124 R 73 0.3 0:11.52 httpd >> 10247 99 20 0 149m 11m 2128 R 31 0.3 0:05.39 httpd >> 3916 root 20 0 86468 11m 1476 R 16 0.3 0:15.14 >> cpsrvd-ssl >> 10919 99 20 0 149m 11m 2124 R 8 0.3 0:03.43 httpd >> 11296 99 20 0 149m 11m 2112 R 7 0.3 0:03.26 httpd >> 12265 99 20 0 149m 11m 2088 R 7 0.3 0:08.01 httpd >> 12317 root 20 0 99.6m 1384 716 R 7 0.0 0:06.57 crond >> 12326 503 20 0 8872 96 72 R 7 0.0 0:01.13 php >> 3634 root 20 0 74804 1176 596 R 6 0.0 0:12.15 crond >> 11864 32005 20 0 87224 13m 2528 R 6 0.4 0:30.84 >> cpsrvd-ssl >> 12275 root 20 0 30628 9976 1364 R 6 0.3 0:24.68 cpgs_chk >> 11305 99 20 0 149m 11m 2104 R 6 0.3 0:02.53 httpd >> 12278 root 20 0 8808 1328 968 R 6 0.0 0:04.63 sim >> 1534 root 20 0 0 0 0 S 6 0.0 0:03.29 >> flush-254:2 >> 3626 root 20 0 149m 13m 5324 R 6 0.4 0:27.62 httpd >> 12279 32008 20 0 87472 7668 2480 R 6 0.2 0:27.63 >> munin-update >> 10243 99 20 0 149m 11m 2128 R 5 0.3 0:08.47 httpd >> 12321 root 20 0 99.6m 1460 792 R 5 0.0 0:07.43 crond >> 12325 root 20 0 74804 672 92 R 5 0.0 0:00.76 crond >> 1531 root 20 0 0 0 0 S 2 0.0 0:02.26 kjournald >> 1 root 20 0 10316 756 620 S 0 0.0 0:02.10 init >> 2 root 20 0 0 0 0 S 0 0.0 0:00.01 kthreadd >> 3 root RT 0 0 0 0 S 0 0.0 0:01.08 >> migration/0 >> 4 root 20 0 0 0 0 S 0 0.0 0:00.02 >> ksoftirqd/0 >> 5 root RT 0 0 0 0 S 0 0.0 0:00.00 >> watchdog/0 >> 6 root RT 0 0 0 0 S 0 0.0 0:00.47 >> migration/1 >> 
7 root 20 0 0 0 0 S 0 0.0 0:00.03 >> ksoftirqd/1 >> 8 root RT 0 0 0 0 S 0 0.0 0:00.00 >> watchdog/1 >> >> >> The tasks are changing in the 'top' view, so it is nothing like a >> single task hanging - it is more like a machine working off a swap. >> The problem is, however that according to vmstat, there is no swap >> activity during this time. Should I try to decrease RAM I give to my >> guests even more? Is it too much to have 3 guests with hugepages? >> Should I try something else? Unfortunately it is a production system >> and I can't play with it very much. >> >> Here is 'top' on the host: >> >> top - 03:32:12 up 25 days, 23:38, 2 users, load average: 8.50, 5.07, 10.39 >> Tasks: 133 total, 1 running, 132 sleeping, 0 stopped, 0 zombie >> Cpu(s): 99.1%us, 0.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.2%si, 0.0%st >> Mem: 8193472k total, 8071776k used, 121696k free, 45296k buffers >> Swap: 11716412k total, 0k used, 11714844k free, 197236k cached >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >> 8426 libvirt- 20 0 3771m 27m 3904 S 199 0.3 10:28.33 kvm >> 8374 libvirt- 20 0 3815m 32m 3908 S 199 0.4 8:11.53 kvm >> 1557 libvirt- 20 0 225m 7720 2092 S 1 0.1 436:54.45 kvm >> 72 root 20 0 0 0 0 S 0 0.0 6:22.54 >> kondemand/3 >> 379 root 20 0 0 0 0 S 0 0.0 58:20.99 md3_raid5 >> 1 root 20 0 23768 1944 1228 S 0 0.0 0:00.95 init >> 2 root 20 0 0 0 0 S 0 0.0 0:00.24 kthreadd >> 3 root 20 0 0 0 0 S 0 0.0 0:12.66 >> ksoftirqd/0 >> 4 root RT 0 0 0 0 S 0 0.0 0:07.58 >> migration/0 >> 5 root RT 0 0 0 0 S 0 0.0 0:00.00 >> watchdog/0 >> 6 root RT 0 0 0 0 S 0 0.0 0:15.05 >> migration/1 >> 7 root 20 0 0 0 0 S 0 0.0 0:19.64 >> ksoftirqd/1 >> 8 root RT 0 0 0 0 S 0 0.0 0:00.00 >> watchdog/1 >> 9 root RT 0 0 0 0 S 0 0.0 0:07.21 >> migration/2 >> 10 root 20 0 0 0 0 S 0 0.0 0:41.74 >> ksoftirqd/2 >> 11 root RT 0 0 0 0 S 0 0.0 0:00.00 >> watchdog/2 >> 12 root RT 0 0 0 0 S 0 0.0 0:13.62 >> migration/3 >> 13 root 20 0 0 0 0 S 0 0.0 0:24.63 >> ksoftirqd/3 >> 14 root RT 0 0 0 0 S 0 0.0 
0:00.00 >> watchdog/3 >> 15 root 20 0 0 0 0 S 0 0.0 1:17.11 events/0 >> 16 root 20 0 0 0 0 S 0 0.0 1:33.30 events/1 >> 17 root 20 0 0 0 0 S 0 0.0 4:15.28 events/2 >> 18 root 20 0 0 0 0 S 0 0.0 1:13.49 events/3 >> 19 root 20 0 0 0 0 S 0 0.0 0:00.00 cpuset >> 20 root 20 0 0 0 0 S 0 0.0 0:00.02 khelper >> 21 root 20 0 0 0 0 S 0 0.0 0:00.00 netns >> 22 root 20 0 0 0 0 S 0 0.0 0:00.00 async/mgr >> 23 root 20 0 0 0 0 S 0 0.0 0:00.00 pm >> 25 root 20 0 0 0 0 S 0 0.0 0:02.47 >> sync_supers >> 26 root 20 0 0 0 0 S 0 0.0 0:03.86 >> bdi-default >> >> >> Please help... >> >> Thanks, >> Dmitry >> >> On Sat, Oct 2, 2010 at 1:30 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote: >>> >>> On Thu, Sep 30, 2010 at 12:07:15PM +0300, Dmitry Golubev wrote: >>> > Hi, >>> > >>> > I am not sure what's really happening, but every few hours >>> > (unpredictable) two virtual machines (Linux 2.6.32) start to generate >>> > huge cpu loads. It looks like some kind of loop is unable to complete >>> > or something... >>> > >>> > So the idea is: >>> > >>> > 1. I have two linux 2.6.32 x64 (openvz, proxmox project) guests >>> > running on linux 2.6.35 x64 (ubuntu maverick) host with a Q6600 >>> > Core2Quad on qemu-kvm 0.12.5 and libvirt 0.8.3 and another one small >>> > 32bit linux virtual machine (16MB of ram) with a router inside (i >>> > doubt it contributes to the problem). >>> > >>> > 2. All these machines use hufetlbfs. The server has 8GB of RAM, I >>> > reserved 3696 huge pages (page size is 2MB) on the server, and I am >>> > running the main guests each having 3550MB of virtual memory. The >>> > third guest, as I wrote before, takes 16MB of virtual memory. >>> > >>> > 3. Once run, the guests reserve huge pages for themselves normally. As >>> > mem-prealloc is default, they grab all the memory they should have, >>> > leaving 6 pages unreserved (HugePages_Free - HugePages_Rsvd = 6) all >>> > times - so as I understand they should not want to get any more, >>> > right? >>> > >>> > 4. 
All virtual machines run perfectly normal without any disturbances >>> > for few hours. They do not, however, use all their memory, so maybe >>> > the issue arises when they pass some kind of a threshold. >>> > >>> > 5. At some point of time both guests exhibit cpu load over the top >>> > (16-24). At the same time, host works perfectly well, showing load of >>> > 8 and that both kvm processes use CPU equally and fully. This point of >>> > time is unpredictable - it can be anything from one to twenty hours, >>> > but it will be less than a day. Sometimes the load disappears in a >>> > moment, but usually it stays like that, and everything works extremely >>> > slow (even a 'ps' command executes some 2-5 minutes). >>> > >>> > 6. If I am patient, I can start rebooting the gueat systems - once >>> > they have restarted, everything returns to normal. If I destroy one of >>> > the guests (virsh destroy), the other one starts working normally at >>> > once (!). >>> > >>> > I am relatively new to kvm and I am absolutely lost here. I have not >>> > experienced such problems before, but recently I upgraded from ubuntu >>> > lucid (I think it was linux 2.6.32, qemukvm 0.12.3 and libvirt 0.7.5) >>> > and started to use hugepages. These two virtual machines are not >>> > normally run on the same host system (i have a corosync/pacemaker >>> > cluster with drbd storage), but when one of the hosts is not >>> > abailable, they start running on the same host. That is the reason I >>> > have not noticed this earlier. >>> > >>> > Unfortunately, I don't have any spare hardware to experiment and this >>> > is a production system, so my debugging options are rather limited. >>> > >>> > Do you have any ideas, what could be wrong? >>> >>> Is there swapping activity on the host when this happens? >>> >> > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: KVM with hugepages generate huge load with two guests 2010-11-21 0:24 ` Dmitry Golubev @ 2010-11-21 8:50 ` Michael Tokarev 2010-11-21 11:22 ` Dmitry Golubev 0 siblings, 0 replies; 18+ messages in thread From: Michael Tokarev @ 2010-11-21 8:50 UTC (permalink / raw) To: Dmitry Golubev; +Cc: kvm, Marcelo Tosatti, Avi Kivity 21.11.2010 03:24, Dmitry Golubev wrote: > Hi, > > Seems that nobody is interested in this bug :( > > Anyway I wanted to add a bit more to this investigation. > > Once I put "nohz=off highres=off clocksource=acpi_pm" in guest kernel > options, the guests started to behave better - they do not stay in the > slow state, but rather get there for some seconds (usually up to > minute, but sometimes 2-3 minutes) and then get out of it (this cycle Just out of curiosity: did you try updating the BIOS on your motherboard? The issue you're facing seems to be quite unique, and I've seen more than once how various different weird issues were fixed just by updating the BIOS. Provided they actually did their own homework and fixed something and released the fixes too... ;) P.S. I'm Not A Guru (tm) :) /mjt ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: KVM with hugepages generate huge load with two guests 2010-11-21 8:50 ` Michael Tokarev @ 2010-11-21 11:22 ` Dmitry Golubev 0 siblings, 0 replies; 18+ messages in thread From: Dmitry Golubev @ 2010-11-21 11:22 UTC (permalink / raw) To: Michael Tokarev; +Cc: kvm, Marcelo Tosatti, Avi Kivity > Just out of curiocity: did you try updating the BIOS on your > motherboard? The issus you're facing seems to be quite unique, > and I've seen more than once how various different weird issues > were fixed just by updating the BIOS. Provided they actually > did they own homework and fixed something and released the fixes > too... ;) Thank you for the reply, I really appreciate that somebody found time to answer. Unfortunately for this investigation, I upgraded the BIOS a few months ago. I just checked - there are no newer versions. I do see, however, that many people advise changing to the acpi_pm clocksource (and, thus, disabling the nohz option) in case similar problems are experienced - I did not invent this workaround (got the idea here: http://forum.proxmox.com/threads/5144-100-CPU-on-host-VM-hang-every-night?p=29143#post29143 ). Looks like an ancient bug. I even upgraded my qemu-kvm to version 0.13 without any significant change to this behavior. It is really weird, however, how one guest can work fine, but two start messing with each other. Shouldn't there be some kind of isolation between them? They both start to behave exactly the same at exactly the same time. And it does not happen once a month or a year, but pretty frequently. Thanks, Dmitry ^ permalink raw reply [flat|nested] 18+ messages in thread
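For anyone wanting to try the same workaround without rebooting the guest, the clocksource can also be switched at runtime through sysfs (the standard kernel interface, equivalent to the `clocksource=` boot parameter discussed here). The helper below is only a sketch; the optional second parameter exists purely so the sysfs path can be overridden, and on a real system the write needs root:

```shell
# Switch the current clocksource via sysfs -- the runtime equivalent of the
# clocksource= boot parameter. Usage: set_clocksource acpi_pm
set_clocksource() {
    local src=$1
    local root=${2:-/sys/devices/system/clocksource/clocksource0}
    # Refuse sources the kernel does not offer.
    grep -qw "$src" "$root/available_clocksource" || {
        echo "clocksource '$src' not available" >&2
        return 1
    }
    echo "$src" > "$root/current_clocksource"
    cat "$root/current_clocksource"
}
```

The result can be verified afterwards with `cat /sys/devices/system/clocksource/clocksource0/current_clocksource`.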
* Re: KVM with hugepages generate huge load with two guests 2010-11-21 0:24 ` Dmitry Golubev 2010-11-21 8:50 ` Michael Tokarev @ 2010-11-21 11:28 ` Avi Kivity 2010-11-21 15:03 ` Dmitry Golubev 1 sibling, 1 reply; 18+ messages in thread From: Avi Kivity @ 2010-11-21 11:28 UTC (permalink / raw) To: Dmitry Golubev; +Cc: kvm, Marcelo Tosatti On 11/21/2010 02:24 AM, Dmitry Golubev wrote: > Hi, > > Seems that nobody is interested in this bug :( > It's because the information is somewhat confused. There's a way to prepare bug reports that gets developers competing to see who solves it first. > Anyway I wanted to add a bit more to this investigation. > > Once I put "nohz=off highres=off clocksource=acpi_pm" in guest kernel > options, the guests started to behave better - they do not stay in the > slow state, but rather get there for some seconds (usually up to > minute, but sometimes 2-3 minutes) and then get out of it (this cycle > repeats once in a while - every approx 3-6 minutes). Once the > situation became stable, so that I am able to leave the guests without > very much worries, I also noticed that sometimes the predicted > swapping occurs, although rarely (I waited about half an hour to catch > the first swapping on the host). Here is a fragment of vmstat. Note > that when the first column shows 8-9 - the slowness and huge load > happens. You can also see how is appears and disappears (with nohz and > kvm-clock it did not go out of slowness period, but with tsc clock the > probability of getting out is significantly lower): > Are you sure it is hugepages related? Can you post kvm_stat output while slowness is happening? 'perf top' on the host? and on the guest? -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: KVM with hugepages generate huge load with two guests 2010-11-21 11:28 ` Avi Kivity @ 2010-11-21 15:03 ` Dmitry Golubev 2010-12-01 3:38 ` Dmitry Golubev 0 siblings, 1 reply; 18+ messages in thread From: Dmitry Golubev @ 2010-11-21 15:03 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm, Marcelo Tosatti Thanks for the answer. > Are you sure it is hugepages related? Well, empirically it looked like either hugepages-related or a regression of qemu-kvm 0.12.3 -> 0.12.5, as this did not happen until I upgraded (needed to avoid disk corruption caused by a bug in 0.12.3) and enabled hugepages. However, as the frequency of the problem does seem related to the memory each guest consumes (more memory = the faster the problem appears), and in the beginning it might have been that the memory consumption of the guests did not hit some kind of threshold, maybe it is not really hugepages related. > Can you post kvm_stat output while slowness is happening? 'perf top' on the host? and on the guest? OK, I will test this and write back. Thanks, Dmitry ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: KVM with hugepages generate huge load with two guests 2010-11-21 15:03 ` Dmitry Golubev @ 2010-12-01 3:38 ` Dmitry Golubev 2010-12-14 7:26 ` Dmitry Golubev 0 siblings, 1 reply; 18+ messages in thread From: Dmitry Golubev @ 2010-12-01 3:38 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm, Marcelo Tosatti Hi, Sorry it took so long to reply - there are only a few moments when I can poke a production server, and I need to notify people in advance about that :( > Can you post kvm_stat output while slowness is happening? 'perf top' on the host? and on the guest? I took 'perf top', and the first thing I saw is that while the guest is on acpi_pm, it shows a more or less normal amount of IRQs (under 1000/s); however, when I switched back to the default (which is nohz with kvm_clock), there are 40 times (!!!) more IRQs under normal operation (about 40,000/s). When the slowdown is happening, there are a lot of _spin_lock events and a lot of messages like: "WARNING: failed to keep up with mmap data. Last read 810 msecs ago." As I said before, switching to acpi_pm does not save the day, but it makes the situation a lot more workable (i.e., servers recover faster from the period of slowness).
During slowdowns on acpi_pm I also see "_spin_lock" Raw data follows: vmstat -5 on the host: procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 0 0 131904 13952 205872 0 0 0 24 2495 9813 6 3 91 0 0 0 0 132984 13952 205872 0 0 0 47 2596 9851 5 3 91 1 1 0 0 132148 13952 205872 0 0 0 54 2644 10559 3 3 93 1 0 1 0 129084 13952 205872 0 0 0 38 3039 9752 7 3 87 2 6 0 0 126388 13952 205872 0 0 0 311 15619 9009 42 17 39 2 9 0 0 125868 13960 205872 0 0 6 86 4659 6504 98 2 0 0 8 0 0 123320 13960 205872 0 0 0 26 4682 6649 98 2 0 0 8 0 0 126252 13960 205872 0 0 0 124 4923 6776 98 2 0 0 8 0 0 125376 13960 205872 0 0 136 11 4287 5865 98 2 0 0 9 0 0 123812 13960 205872 0 0 205 51 4497 6134 98 2 0 0 8 0 0 126020 13960 205872 0 0 904 26 4483 5999 98 2 0 0 8 0 0 124052 13960 205872 0 0 15 10 4397 6200 98 2 0 0 8 0 0 125928 13960 205872 0 0 14 41 4335 5823 98 2 0 0 8 0 0 126184 13960 205872 0 0 6 14 4966 6588 98 2 0 0 8 0 0 123588 13960 205872 0 0 143 18 5234 6891 98 2 0 0 8 0 0 126640 13960 205872 0 0 6 91 5554 7334 98 2 0 0 8 0 0 123144 13960 205872 0 0 146 11 5235 7145 98 2 0 0 8 0 0 125856 13968 205872 0 0 1282 98 5481 7159 98 2 0 0 9 19 0 124124 13968 205872 0 0 782 2433 8587 8987 97 3 0 0 8 0 0 122584 13968 205872 0 0 432 90 5359 6960 98 2 0 0 8 0 0 125320 13968 205872 0 0 3074 52 5448 7095 97 3 0 0 8 0 0 121436 13968 205872 0 0 2519 81 5714 7279 98 2 0 0 8 0 0 124436 13968 205872 0 0 1 56 5242 6864 98 2 0 0 8 0 0 111324 13968 205872 0 0 2 22 10660 6686 97 3 0 0 8 0 0 107824 13968 205872 0 0 0 24 14329 8147 97 3 0 0 8 0 0 110420 13968 205872 0 0 0 68 13486 6985 98 2 0 0 8 0 0 110024 13968 205872 0 0 0 19 13085 6659 98 2 0 0 8 0 0 109932 13968 205872 0 0 0 3 12952 6415 98 2 0 0 8 0 0 108552 13968 205880 0 0 2 41 13400 7349 98 2 0 0 Few shots with kvm_stat on the host: Every 2.0s: kvm_stat -1 Wed Dec 1 04:45:47 2010 efer_reload 0 0 exits 56264102 14074 fpu_reload 311506 50 halt_exits 4733166 935 
halt_wakeup 3845079 840
host_state_reload 8795964 4085
hypercalls 0 0
insn_emulation 13573212 7249
insn_emulation_fail 0 0
invlpg 1846050 20
io_exits 3579406 843
irq_exits 3038887 4879
irq_injections 5242157 3681
irq_window 124361 540
largepages 2253 0
mmio_exits 64274 20
mmu_cache_miss 664011 16
mmu_flooded 164506 1
mmu_pde_zapped 212686 8
mmu_pte_updated 729268 0
mmu_pte_write 81323616 551
mmu_recycled 277 0
mmu_shadow_zapped 652691 23
mmu_unsync 5630 8
nmi_injections 0 0
nmi_window 0 0
pf_fixed 17470658 218
pf_guest 13352205 81
remote_tlb_flush 1898930 96
request_irq 0 0
signal_exits 0 0
tlb_flush 5827433 108

Every 2.0s: kvm_stat -1          Wed Dec 1 04:47:33 2010

efer_reload 0 0
exits 58155746 18954
fpu_reload 318003 61
halt_exits 4839340 1082
halt_wakeup 3940964 984
host_state_reload 9267420 4803
hypercalls 0 0
insn_emulation 14376685 7721
insn_emulation_fail 0 0
invlpg 1855758 13
io_exits 3676471 993
irq_exits 3609310 5363
irq_injections 5648007 3922
irq_window 181397 517
largepages 2253 0
mmio_exits 65862 14
mmu_cache_miss 666017 2
mmu_flooded 164784 0
mmu_pde_zapped 213208 1
mmu_pte_updated 731301 1
mmu_pte_write 81455666 14
mmu_recycled 277 0
mmu_shadow_zapped 653910 1
mmu_unsync 5461 -2
nmi_injections 0 0
nmi_window 0 0
pf_fixed 17530360 154
pf_guest 13388143 89
remote_tlb_flush 1915787 39
request_irq 0 0
signal_exits 0 0
tlb_flush 5857307 68

Every 2.0s: kvm_stat -1          Wed Dec 1 04:47:46 2010

efer_reload 0 0
exits 58382002 14542
fpu_reload 318544 32
halt_exits 4850647 541
halt_wakeup 3951056 441
host_state_reload 9317479 2669
hypercalls 0 0
insn_emulation 14464287 5075
insn_emulation_fail 0 0
invlpg 1856481 34
io_exits 3686632 456
irq_exits 3670192 3742
irq_injections 5692201 2471
irq_window 186987 126
largepages 2253 0
mmio_exits 65981 6
mmu_cache_miss 666184 14
mmu_flooded 164819 1
mmu_pde_zapped 213264 5
mmu_pte_updated 731432 2
mmu_pte_write 81473978 563
mmu_recycled 277 0
mmu_shadow_zapped 654130 10
mmu_unsync 5410 4
nmi_injections 0 0
nmi_window 0 0
pf_fixed 17536667 653
pf_guest 13391345 300
remote_tlb_flush 1917634 120
request_irq 0 0
signal_exits 0 0
tlb_flush 5860221 205

'perf top' on the host:

------------------------------------------------------------------------------
   PerfTop: 3894 irqs/sec  kernel: 2.4%  exact: 0.0% [1000Hz cycles], (all, 4 CPUs)
------------------------------------------------------------------------------

samples pcnt function DSO
_______ _____ ________________________ ______________________________

109.00 15.1% vmx_vcpu_run /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm-intel.ko
64.00 8.9% copy_user_generic_string [kernel.kallsyms]
61.00 8.4% __ticket_spin_lock [kernel.kallsyms]
47.00 6.5% vcpu_enter_guest /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm.ko
23.00 3.2% schedule [kernel.kallsyms]
18.00 2.5% kvm_read_guest /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm.ko
18.00 2.5% gfn_to_hva /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm.ko
17.00 2.4% native_write_msr_safe [kernel.kallsyms]
17.00 2.4% x86_decode_insn /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm.ko
13.00 1.8% rtl8169_interrupt /lib/modules/2.6.35-23-server/kernel/drivers/net/r8169.ko
13.00 1.8% native_read_msr_safe [kernel.kallsyms]
11.00 1.5% paging64_walk_addr /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm.ko
10.00 1.4% vmcs_writel /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm-intel.ko
10.00 1.4% PyEval_EvalFrameEx /usr/bin/python2.6
10.00 1.4% update_curr [kernel.kallsyms]
10.00 1.4% x86_emulate_insn /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm.ko
9.00 1.2% vmx_get_cpl /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm-intel.ko
9.00 1.2% __srcu_read_lock [kernel.kallsyms]
9.00 1.2% emulate_instruction /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm.ko
9.00 1.2% __memcpy [kernel.kallsyms]
9.00 1.2% handle_exception /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm-intel.ko
8.00 1.1% vmx_complete_interrupts /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm-intel.ko
8.00 1.1% native_read_tsc [kernel.kallsyms]
8.00 1.1% __vcpu_run /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm.ko
7.00 1.0% fput [kernel.kallsyms]
7.00 1.0% __switch_to [kernel.kallsyms]
6.00 0.8% vmx_cache_reg /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm-intel.ko
6.00 0.8% _raw_spin_lock_irqsave [kernel.kallsyms]

'perf top' on guest:

------------------------------------------------------------------------------
   PerfTop: 13547 irqs/sec  kernel:99.4% [100000 cpu-clock-msecs], (all, 4 CPUs)
------------------------------------------------------------------------------

samples pcnt kernel function
_______ _____ _______________

119287.00 - 94.8% : _spin_lock
1108.00 - 0.9% : do_page_fault
599.00 - 0.5% : _spin_unlock_irqrestore
512.00 - 0.4% : finish_task_switch
432.00 - 0.3% : clear_page_c
361.00 - 0.3% : __do_softirq
214.00 - 0.2% : native_flush_tlb
158.00 - 0.1% : native_set_pte_at
147.00 - 0.1% : flush_tlb_page
141.00 - 0.1% : retint_careful
101.00 - 0.1% : kmem_cache_alloc
95.00 - 0.1% : get_page_from_freelist
94.00 - 0.1% : unmap_vmas
82.00 - 0.1% : native_set_pmd
74.00 - 0.1% : virtnet_poll [virtio_net]
WARNING: failed to keep up with mmap data. Last read 810 msecs ago.
WARNING: failed to keep up with mmap data. Last read 0 msecs ago.
WARNING: failed to keep up with mmap data. Last read 0 msecs ago.
WARNING: failed to keep up with mmap data. Last read 0 msecs ago.
'perf top' on the guest under normal operation (nohz, clocksource=kvm_clock):

------------------------------------------------------------------------------
   PerfTop: 39501 irqs/sec  kernel:99.8% [100000 cpu-clock-msecs], (all, 4 CPUs)
------------------------------------------------------------------------------

samples pcnt kernel function
_______ _____ _______________

256987.00 - 98.7% : native_safe_halt
377.00 - 0.1% : __do_softirq
332.00 - 0.1% : finish_task_switch
257.00 - 0.1% : do_page_fault
201.00 - 0.1% : _spin_unlock_irqrestore
121.00 - 0.0% : tick_nohz_stop_sched_tick
113.00 - 0.0% : pvclock_clocksource_read
106.00 - 0.0% : flush_tlb_page
99.00 - 0.0% : tick_nohz_restart_sched_tick
59.00 - 0.0% : system_call_after_swapgs
55.00 - 0.0% : fget_light
52.00 - 0.0% : kmem_cache_alloc
52.00 - 0.0% : do_sys_poll
50.00 - 0.0% : perf_poll
47.00 - 0.0% : native_flush_tlb

'perf top' on the guest under normal operation (with clocksource=acpi_pm nohz=off highres=off):

------------------------------------------------------------------------------
   PerfTop: 949 irqs/sec  kernel:89.3% [100000 cpu-clock-msecs], (all, 4 CPUs)
------------------------------------------------------------------------------

samples pcnt kernel function
_______ _____ _______________

10196.00 - 92.3% : native_safe_halt
145.00 - 1.3% : clear_page_c
139.00 - 1.3% : do_page_fault
100.00 - 0.9% : acpi_pm_read
52.00 - 0.5% : flush_tlb_page
49.00 - 0.4% : finish_task_switch
32.00 - 0.3% : native_flush_tlb
27.00 - 0.2% : __do_softirq
24.00 - 0.2% : _spin_unlock_irqrestore
22.00 - 0.2% : native_set_pmd
20.00 - 0.2% : native_set_pte_at
19.00 - 0.2% : _spin_lock
18.00 - 0.2% : generic_unplug_device
17.00 - 0.2% : tick_nohz_stop_sched_tick
13.00 - 0.1% : unmap_vmas

'perf top' on the guest when slowdown is happening (with clocksource=acpi_pm nohz=off highres=off):

------------------------------------------------------------------------------
   PerfTop: 966 irqs/sec  kernel:94.0% [100000 cpu-clock-msecs], (all, 4 CPUs)
------------------------------------------------------------------------------

samples pcnt kernel function
_______ _____ _______________

10799.00 - 91.2% : native_safe_halt
514.00 - 4.3% : _spin_lock
115.00 - 1.0% : clear_page_c
87.00 - 0.7% : do_page_fault
68.00 - 0.6% : __do_softirq
49.00 - 0.4% : acpi_pm_read
44.00 - 0.4% : _spin_unlock_irqrestore
39.00 - 0.3% : finish_task_switch
23.00 - 0.2% : flush_tlb_page
21.00 - 0.2% : native_flush_tlb
8.00 - 0.1% : native_set_pmd
8.00 - 0.1% : native_set_pte_at
7.00 - 0.1% : tick_nohz_restart_sched_tick
6.00 - 0.1% : tick_nohz_stop_sched_tick

Thanks,
Dmitry

^ permalink raw reply	[flat|nested] 18+ messages in thread
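The first two kvm_stat snapshots above are 106 seconds apart, so their counter deltas can be turned into rough per-second rates. A minimal sketch (timestamps and "exits" totals copied from the output above; the variable names are just for illustration):

```python
from datetime import datetime

# Timestamps and cumulative exit counts from the two kvm_stat snapshots above
t1 = datetime(2010, 12, 1, 4, 45, 47)
t2 = datetime(2010, 12, 1, 4, 47, 33)
exits_1, exits_2 = 56_264_102, 58_155_746

elapsed = (t2 - t1).total_seconds()      # 106 seconds between snapshots
rate = (exits_2 - exits_1) / elapsed     # average VM exits per second
print(f"{rate:.0f} exits/s over {elapsed:.0f}s")
```

The same arithmetic on insn_emulation (13573212 -> 14376685) gives roughly 7,600 emulated instructions per second, which is consistent with the per-interval column kvm_stat prints and points at heavy instruction emulation during the slowdown.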
* Re: KVM with hugepages generate huge load with two guests 2010-12-01 3:38 ` Dmitry Golubev @ 2010-12-14 7:26 ` Dmitry Golubev 0 siblings, 0 replies; 18+ messages in thread From: Dmitry Golubev @ 2010-12-14 7:26 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm, Marcelo Tosatti Hi, So, nobody has any idea what's going wrong with all these massive IRQs and spin_locks that cause virtual machines to almost completely stop? :( Thanks, Dmitry On Wed, Dec 1, 2010 at 5:38 AM, Dmitry Golubev <lastguru@gmail.com> wrote: > Hi, > > Sorry it took so slow to reply you - there are only few moments when I > can poke a production server and I need to notify people in advance > about that :( > >> Can you post kvm_stat output while slowness is happening? 'perf top' on the host? and on the guest? > > I took 'perf top' and first thing I saw is that while guest is on > acpi_pm, it shows more or less normal amount of IRQs (under 1000/s), > however when I switched back to the default (which is nohz with > kvm_clock), there are 40 times (!!!) more IRQs under normal operation > (about 40 000/s). When the slowdown is happening, there are a lot of > _spin_lock events and a lot of messages like: "WARNING: failed to keep > up with mmap data. Last read 810 msecs ago." > > As I told before, switching to acpi_pm does not save the day, but > makes situation a lot more workable (i.e., servers recover faster from > the period of slowness). 
During slowdowns on acpi_pm I also see > "_spin_lock" > > Raw data follows: > > > > vmstat -5 on the host: > > procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- > r b swpd free buff cache si so bi bo in cs us sy id wa > 0 0 0 131904 13952 205872 0 0 0 24 2495 9813 6 3 91 0 > 0 0 0 132984 13952 205872 0 0 0 47 2596 9851 5 3 91 1 > 1 0 0 132148 13952 205872 0 0 0 54 2644 10559 3 3 93 1 > 0 1 0 129084 13952 205872 0 0 0 38 3039 9752 7 3 87 2 > 6 0 0 126388 13952 205872 0 0 0 311 15619 9009 42 17 39 2 > 9 0 0 125868 13960 205872 0 0 6 86 4659 6504 98 2 0 0 > 8 0 0 123320 13960 205872 0 0 0 26 4682 6649 98 2 0 0 > 8 0 0 126252 13960 205872 0 0 0 124 4923 6776 98 2 0 0 > 8 0 0 125376 13960 205872 0 0 136 11 4287 5865 98 2 0 0 > 9 0 0 123812 13960 205872 0 0 205 51 4497 6134 98 2 0 0 > 8 0 0 126020 13960 205872 0 0 904 26 4483 5999 98 2 0 0 > 8 0 0 124052 13960 205872 0 0 15 10 4397 6200 98 2 0 0 > 8 0 0 125928 13960 205872 0 0 14 41 4335 5823 98 2 0 0 > 8 0 0 126184 13960 205872 0 0 6 14 4966 6588 98 2 0 0 > 8 0 0 123588 13960 205872 0 0 143 18 5234 6891 98 2 0 0 > 8 0 0 126640 13960 205872 0 0 6 91 5554 7334 98 2 0 0 > 8 0 0 123144 13960 205872 0 0 146 11 5235 7145 98 2 0 0 > 8 0 0 125856 13968 205872 0 0 1282 98 5481 7159 98 2 0 0 > 9 19 0 124124 13968 205872 0 0 782 2433 8587 8987 97 3 0 0 > 8 0 0 122584 13968 205872 0 0 432 90 5359 6960 98 2 0 0 > 8 0 0 125320 13968 205872 0 0 3074 52 5448 7095 97 3 0 0 > 8 0 0 121436 13968 205872 0 0 2519 81 5714 7279 98 2 0 0 > 8 0 0 124436 13968 205872 0 0 1 56 5242 6864 98 2 0 0 > 8 0 0 111324 13968 205872 0 0 2 22 10660 6686 97 3 0 0 > 8 0 0 107824 13968 205872 0 0 0 24 14329 8147 97 3 0 0 > 8 0 0 110420 13968 205872 0 0 0 68 13486 6985 98 2 0 0 > 8 0 0 110024 13968 205872 0 0 0 19 13085 6659 98 2 0 0 > 8 0 0 109932 13968 205872 0 0 0 3 12952 6415 98 2 0 0 > 8 0 0 108552 13968 205880 0 0 2 41 13400 7349 98 2 0 0 > > Few shots with kvm_stat on the host: > > Every 2.0s: kvm_stat -1 > > Wed Dec 1 
04:45:47 2010 > > efer_reload 0 0 > exits 56264102 14074 > fpu_reload 311506 50 > halt_exits 4733166 935 > halt_wakeup 3845079 840 > host_state_reload 8795964 4085 > hypercalls 0 0 > insn_emulation 13573212 7249 > insn_emulation_fail 0 0 > invlpg 1846050 20 > io_exits 3579406 843 > irq_exits 3038887 4879 > irq_injections 5242157 3681 > irq_window 124361 540 > largepages 2253 0 > mmio_exits 64274 20 > mmu_cache_miss 664011 16 > mmu_flooded 164506 1 > mmu_pde_zapped 212686 8 > mmu_pte_updated 729268 0 > mmu_pte_write 81323616 551 > mmu_recycled 277 0 > mmu_shadow_zapped 652691 23 > mmu_unsync 5630 8 > nmi_injections 0 0 > nmi_window 0 0 > pf_fixed 17470658 218 > pf_guest 13352205 81 > remote_tlb_flush 1898930 96 > request_irq 0 0 > signal_exits 0 0 > tlb_flush 5827433 108 > > Every 2.0s: kvm_stat -1 > > Wed Dec 1 04:47:33 2010 > > efer_reload 0 0 > exits 58155746 18954 > fpu_reload 318003 61 > halt_exits 4839340 1082 > halt_wakeup 3940964 984 > host_state_reload 9267420 4803 > hypercalls 0 0 > insn_emulation 14376685 7721 > insn_emulation_fail 0 0 > invlpg 1855758 13 > io_exits 3676471 993 > irq_exits 3609310 5363 > irq_injections 5648007 3922 > irq_window 181397 517 > largepages 2253 0 > mmio_exits 65862 14 > mmu_cache_miss 666017 2 > mmu_flooded 164784 0 > mmu_pde_zapped 213208 1 > mmu_pte_updated 731301 1 > mmu_pte_write 81455666 14 > mmu_recycled 277 0 > mmu_shadow_zapped 653910 1 > mmu_unsync 5461 -2 > nmi_injections 0 0 > nmi_window 0 0 > pf_fixed 17530360 154 > pf_guest 13388143 89 > remote_tlb_flush 1915787 39 > request_irq 0 0 > signal_exits 0 0 > tlb_flush 5857307 68 > > Every 2.0s: kvm_stat -1 > > Wed Dec 1 04:47:46 2010 > > efer_reload 0 0 > exits 58382002 14542 > fpu_reload 318544 32 > halt_exits 4850647 541 > halt_wakeup 3951056 441 > host_state_reload 9317479 2669 > hypercalls 0 0 > insn_emulation 14464287 5075 > insn_emulation_fail 0 0 > invlpg 1856481 34 > io_exits 3686632 456 > irq_exits 3670192 3742 > irq_injections 5692201 2471 > irq_window 186987 
126 > largepages 2253 0 > mmio_exits 65981 6 > mmu_cache_miss 666184 14 > mmu_flooded 164819 1 > mmu_pde_zapped 213264 5 > mmu_pte_updated 731432 2 > mmu_pte_write 81473978 563 > mmu_recycled 277 0 > mmu_shadow_zapped 654130 10 > mmu_unsync 5410 4 > nmi_injections 0 0 > nmi_window 0 0 > pf_fixed 17536667 653 > pf_guest 13391345 300 > remote_tlb_flush 1917634 120 > request_irq 0 0 > signal_exits 0 0 > tlb_flush 5860221 205 > > > 'perf top' on the host: > > ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- > PerfTop: 3894 irqs/sec kernel: 2.4% exact: 0.0% [1000Hz > cycles], (all, 4 CPUs) > ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > samples pcnt function DSO > _______ _____ ________________________ > ______________________________________________________________ > > 109.00 15.1% vmx_vcpu_run > /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm-intel.ko > 64.00 8.9% copy_user_generic_string [kernel.kallsyms] > 61.00 8.4% __ticket_spin_lock [kernel.kallsyms] > 47.00 6.5% vcpu_enter_guest > /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm.ko > 23.00 3.2% schedule [kernel.kallsyms] > 18.00 2.5% kvm_read_guest > /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm.ko > 18.00 2.5% gfn_to_hva > /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm.ko > 17.00 2.4% native_write_msr_safe [kernel.kallsyms] > 17.00 2.4% x86_decode_insn > /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm.ko > 13.00 1.8% rtl8169_interrupt > /lib/modules/2.6.35-23-server/kernel/drivers/net/r8169.ko > 13.00 1.8% native_read_msr_safe [kernel.kallsyms] > 11.00 1.5% paging64_walk_addr > /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm.ko > 10.00 1.4% vmcs_writel > /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm-intel.ko > 10.00 1.4% 
PyEval_EvalFrameEx /usr/bin/python2.6 > 10.00 1.4% update_curr [kernel.kallsyms] > 10.00 1.4% x86_emulate_insn > /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm.ko > 9.00 1.2% vmx_get_cpl > /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm-intel.ko > 9.00 1.2% __srcu_read_lock [kernel.kallsyms] > 9.00 1.2% emulate_instruction > /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm.ko > 9.00 1.2% __memcpy [kernel.kallsyms] > 9.00 1.2% handle_exception > /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm-intel.ko > 8.00 1.1% vmx_complete_interrupts > /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm-intel.ko > 8.00 1.1% native_read_tsc [kernel.kallsyms] > 8.00 1.1% __vcpu_run > /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm.ko > 7.00 1.0% fput [kernel.kallsyms] > 7.00 1.0% __switch_to [kernel.kallsyms] > 6.00 0.8% vmx_cache_reg > /lib/modules/2.6.35-23-server/kernel/arch/x86/kvm/kvm-intel.ko > 6.00 0.8% _raw_spin_lock_irqsave [kernel.kallsyms] > > > 'perf top' on guest: > > ------------------------------------------------------------------------------ > PerfTop: 13547 irqs/sec kernel:99.4% [100000 cpu-clock-msecs], > (all, 4 CPUs) > ------------------------------------------------------------------------------ > > samples pcnt kernel function > _______ _____ _______________ > > 119287.00 - 94.8% : _spin_lock > 1108.00 - 0.9% : do_page_fault > 599.00 - 0.5% : _spin_unlock_irqrestore > 512.00 - 0.4% : finish_task_switch > 432.00 - 0.3% : clear_page_c > 361.00 - 0.3% : __do_softirq > 214.00 - 0.2% : native_flush_tlb > 158.00 - 0.1% : native_set_pte_at > 147.00 - 0.1% : flush_tlb_page > 141.00 - 0.1% : retint_careful > 101.00 - 0.1% : kmem_cache_alloc > 95.00 - 0.1% : get_page_from_freelist > 94.00 - 0.1% : unmap_vmas > 82.00 - 0.1% : native_set_pmd > 74.00 - 0.1% : virtnet_poll [virtio_net] > WARNING: failed to keep up with mmap data. Last read 810 msecs ago. > WARNING: failed to keep up with mmap data. Last read 0 msecs ago. 
> WARNING: failed to keep up with mmap data. Last read 0 msecs ago. > WARNING: failed to keep up with mmap data. Last read 0 msecs ago. > > > 'perf top' on the guest under normal operation (nohz, clocksource=kvm_clock): > > ------------------------------------------------------------------------------ > PerfTop: 39501 irqs/sec kernel:99.8% [100000 cpu-clock-msecs], > (all, 4 CPUs) > ------------------------------------------------------------------------------ > > samples pcnt kernel function > _______ _____ _______________ > > 256987.00 - 98.7% : native_safe_halt > 377.00 - 0.1% : __do_softirq > 332.00 - 0.1% : finish_task_switch > 257.00 - 0.1% : do_page_fault > 201.00 - 0.1% : _spin_unlock_irqrestore > 121.00 - 0.0% : tick_nohz_stop_sched_tick > 113.00 - 0.0% : pvclock_clocksource_read > 106.00 - 0.0% : flush_tlb_page > 99.00 - 0.0% : tick_nohz_restart_sched_tick > 59.00 - 0.0% : system_call_after_swapgs > 55.00 - 0.0% : fget_light > 52.00 - 0.0% : kmem_cache_alloc > 52.00 - 0.0% : do_sys_poll > 50.00 - 0.0% : perf_poll > 47.00 - 0.0% : native_flush_tlb > > 'perf top' on the guest under normal operation (with > clocksource=acpi_pm nohz=off highres=off): > > ------------------------------------------------------------------------------ > PerfTop: 949 irqs/sec kernel:89.3% [100000 cpu-clock-msecs], > (all, 4 CPUs) > ------------------------------------------------------------------------------ > > samples pcnt kernel function > _______ _____ _______________ > > 10196.00 - 92.3% : native_safe_halt > 145.00 - 1.3% : clear_page_c > 139.00 - 1.3% : do_page_fault > 100.00 - 0.9% : acpi_pm_read > 52.00 - 0.5% : flush_tlb_page > 49.00 - 0.4% : finish_task_switch > 32.00 - 0.3% : native_flush_tlb > 27.00 - 0.2% : __do_softirq > 24.00 - 0.2% : _spin_unlock_irqrestore > 22.00 - 0.2% : native_set_pmd > 20.00 - 0.2% : native_set_pte_at > 19.00 - 0.2% : _spin_lock > 18.00 - 0.2% : generic_unplug_device > 17.00 - 0.2% : tick_nohz_stop_sched_tick > 13.00 - 0.1% : unmap_vmas > > 
> 'perf top' on the guest when slowdown is happening (with > clocksource=acpi_pm nohz=off highres=off): > > ------------------------------------------------------------------------------ > PerfTop: 966 irqs/sec kernel:94.0% [100000 cpu-clock-msecs], > (all, 4 CPUs) > ------------------------------------------------------------------------------ > > samples pcnt kernel function > _______ _____ _______________ > > 10799.00 - 91.2% : native_safe_halt > 514.00 - 4.3% : _spin_lock > 115.00 - 1.0% : clear_page_c > 87.00 - 0.7% : do_page_fault > 68.00 - 0.6% : __do_softirq > 49.00 - 0.4% : acpi_pm_read > 44.00 - 0.4% : _spin_unlock_irqrestore > 39.00 - 0.3% : finish_task_switch > 23.00 - 0.2% : flush_tlb_page > 21.00 - 0.2% : native_flush_tlb > 8.00 - 0.1% : native_set_pmd > 8.00 - 0.1% : native_set_pte_at > 7.00 - 0.1% : tick_nohz_restart_sched_tick > 6.00 - 0.1% : tick_nohz_stop_sched_tick > > > > Thanks, > Dmitry > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: KVM with hugepages generate huge load with two guests 2010-09-30 9:07 KVM with hugepages generate huge load with two guests Dmitry Golubev 2010-10-01 22:30 ` Marcelo Tosatti @ 2010-10-03 9:28 ` Avi Kivity 2010-10-03 20:24 ` Dmitry Golubev 1 sibling, 1 reply; 18+ messages in thread From: Avi Kivity @ 2010-10-03 9:28 UTC (permalink / raw) To: Dmitry Golubev; +Cc: kvm On 09/30/2010 11:07 AM, Dmitry Golubev wrote: > Hi, > > I am not sure what's really happening, but every few hours > (unpredictable) two virtual machines (Linux 2.6.32) start to generate > huge cpu loads. It looks like some kind of loop is unable to complete > or something... > What does 'top' inside the guest show when this is happening? -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: KVM with hugepages generate huge load with two guests 2010-10-03 9:28 ` Avi Kivity @ 2010-10-03 20:24 ` Dmitry Golubev 2010-10-04 7:39 ` Avi Kivity 0 siblings, 1 reply; 18+ messages in thread From: Dmitry Golubev @ 2010-10-03 20:24 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm

So, I started anew. I decreased the memory allocated to each guest to 3500MB (from 3550MB as I told earlier), but have not decreased the number of hugepages - it is still 3696. On one host I started one guest. It looked like this:

HugePages_Total: 3696
HugePages_Free: 1933
HugePages_Rsvd: 19
HugePages_Surp: 0
Hugepagesize: 2048 kB

top - 22:05:53 up 2 days, 3:44, 1 user, load average: 0.29, 0.33, 0.29
Tasks: 131 total, 1 running, 130 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.9%us, 4.6%sy, 0.0%ni, 90.8%id, 1.0%wa, 0.0%hi, 2.7%si, 0.0%st
Mem: 8193472k total, 8118248k used, 75224k free, 29036k buffers
Swap: 11716412k total, 0k used, 11716412k free, 75864k cached

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 0 74668 29036 75864 0 0 1 8 54 51 1 7 91 1

Now I am starting the second virtual machine, and that's what happens:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 0 74272 29216 76664 0 0 0 0 447 961 0 0 100 0
0 0 0 73172 29216 77464 0 0 192 16 899 1575 1 1 96 2
0 0 0 72528 29224 77464 0 0 0 14 475 1022 1 0 99 0
0 0 0 72720 29232 77456 0 0 0 49 519 999 0 0 97 3
1 0 52 77988 28776 40492 0 10 1191 17 988 2285 8 9 72 11
4 0 52 68868 28784 40492 0 0 2854 38 7452 2817 17 16 67 1
2 0 52 66052 28784 40984 0 0 1906 18 24057 4620 25 20 48 7
1 0 52 67044 28792 40984 0 0 1630 35 3175 3966 9 12 72 7
0 0 52 63684 28800 40980 0 0 1433 228 6021 4479 10 11 65 14
0 1 52 65516 28800 40984 0 0 1288 109 4143 4179 10 10 58 21
2 2 52 62216 28808 40984 0 0 1698 241 4357 4183 9 8 58 25
2 2 52 60292 28816 40984 0 0 2874 258 11538 5324 15 14 39 33
2 2 52 57352 28816 40984 0 0 5303 278 8528 5176 9 11 39 42
0 7 52 54000 28824 40980 0 0 5263 249 10580 6214 16 10 32 42
0 4 396 55180 19740 40188 0 70 10304 315 7359 9633 19 8 44 28
1 0 320 61520 19748 40480 0 0 5361 302 2509 5743 23 2 50 25
1 5 316 59940 19748 40728 0 0 2343 8 2225 4690 13 3 75 10
3 1 316 55616 19748 40728 0 0 4435 215 7660 6057 15 6 51 28
0 16 2528 53596 17392 38468 0 529 832 834 6600 4675 8 5 11 76
3 0 2404 56176 17392 38480 1 0 6530 301 8371 5646 20 7 14 59
2 5 7480 58012 14836 33720 13 1082 3666 3155 12290 7752 17 10 20 54
2 1 7340 59628 14836 33884 0 0 5550 690 9513 7258 13 9 38 41
2 1 7288 59124 14844 34472 0 0 1524 481 4597 4688 5 6 58 31
0 3 7284 58848 14844 34472 0 0 1365 364 2171 3813 3 2 58 38
0 1 7056 59324 14844 34472 7 0 841 372 2159 3940 3 2 48 47
0 30 7056 54456 14844 34472 0 0 2 248 1402 2705 2 1 85 13
0 1 6892 55336 14828 38396 1 0 888 268 1927 4124 2 2 41 55
0 0 6892 57808 14060 36988 0 0 17 92 948 1682 1 1 93 5
0 0 6888 58616 14060 37696 0 0 140 43 747 1566 1 1 94 5
1 0 6884 59444 14060 37696 0 0 7 14 942 1747 3 1 95 1
1 0 6884 58820 14060 37696 0 0 0 46 722 1480 1 1 97 2
0 0 6884 58608 14060 37696 0 0 0 41 858 1564 3 1 93 3
3 8 6884 51752 14060 37792 0 0 354 147 8243 2447 20 7 71 2
2 0 6880 52840 14060 37792 0 0 604 281 10430 5859 21 15 50 14
0 0 6880 55176 14060 37792 0 0 699 232 3271 3656 20 4 66 10
0 0 6880 56120 14060 37792 0 0 0 280 1064 2116 1 1 85 14
0 0 6880 55628 14060 37792 0 0 0 0 616 1367 1 0 98 0
1 0 6880 56388 14060 37792 0 0 0 18 689 1381 1 1 97 2

Contrary to what I expected - given that in the previous case I had only 6 unreserved pages, I thought I would have 56 now - I have 156 free unreserved pages:

HugePages_Total: 3696
HugePages_Free: 1113
HugePages_Rsvd: 957
HugePages_Surp: 0

Then at one moment both guests almost stopped working for a minute or so - both went up to huge load and became unresponsive.
I didn't get to catch how they looked in 'top', but they did not use any swap themselves (they have at least 1GB of free memory each) and their load average went to something like 10. vmstat from the host looked like this:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 6740 61948 11140 34104 0 0 0 3 663 1435 1 0 99 0
0 3 6740 62796 11140 34104 0 0 0 10 992 1365 1 1 97 1
0 0 6740 62788 11140 34104 0 0 0 35 854 1647 3 1 95 0
7 0 6740 57664 11140 34104 0 0 0 238 7011 1918 34 6 57 3
9 0 6740 58212 11140 34104 0 0 0 66 3107 2065 99 1 0 0
8 0 6740 58120 11140 34104 0 0 0 61 2509 1770 100 0 0 0
8 0 6740 57624 11140 34104 0 0 0 69 2610 1955 100 0 0 0
7 0 6740 57996 11140 34104 0 0 0 10 2385 2035 100 0 0 0
7 0 6740 58348 11140 34104 0 0 0 26 2580 2296 99 1 0 0
7 0 6740 58348 11140 34104 0 0 0 10 2477 2132 100 0 0 0
8 0 6740 58580 11148 34104 0 0 0 107 2687 2221 99 1 0 0
8 0 6740 58364 11148 34104 0 0 0 20 2461 1706 100 0 0 0
8 0 6740 58504 11148 34104 0 0 0 3 2301 1678 100 0 0 0
8 0 6740 58464 11148 34104 0 0 0 32 2355 1709 100 0 0 0
8 0 6740 58704 11148 34104 0 0 0 24 2380 1749 100 0 0 0
8 0 6740 58820 11148 34104 0 0 0 26 2340 1630 100 0 0 0
8 0 6740 59208 11148 34104 0 0 0 41 2360 1711 99 1 0 0
8 0 6740 58952 11148 34104 0 0 0 22 2387 1648 99 1 0 0
8 0 6740 59184 11148 34104 0 0 0 0 2315 1690 100 1 0 0
8 0 6740 59332 11148 34104 0 0 0 0 2383 1637 100 0 0 0
8 0 6740 59084 11148 34104 0 0 0 0 2357 1692 100 0 0 0
8 0 6740 58472 11148 34104 0 0 0 45 2497 1790 96 4 0 0
8 0 6740 58828 11156 34104 0 0 1 56 2483 1788 89 11 0 0
8 0 6740 58852 11156 34104 0 0 0 27 2385 1707 95 5 0 0
8 0 6740 59240 11156 34104 0 0 0 20 2387 1735 97 3 0 0
8 0 6740 59216 11156 34104 0 0 1 49 2392 1698 99 1 0 0
8 0 6740 58936 11156 34104 0 0 0 0 2327 1678 99 1 0 0
8 0 6740 59092 11156 34104 0 0 0 63 2490 1732 99 1 0 0
9 0 6740 58688 11156 34104 0 0 0 106 2487 1826 99 1 0 0
8 0 6740 58596 11156 34104 0 0 0 87 2509 1839 99 1 0 0
10 0 6740 58524 11156 34104 0 0 0 61 2550 1839 100 0 0 0
8 0 6740 58820 11156 34104 0 0 0 66 2450 1796 99 1 0 0
8 0 6740 56712 11156 34104 0 0 0 52 4595 1721 99 1 0 0
2 0 7672 68052 6924 28256 2 190 568 306 14201 2760 70 11 15 4
0 0 7668 69880 6932 28428 2 0 407 355 2387 3029 13 2 78 7
0 1 7632 68056 6940 29836 0 0 283 221 1016 1801 1 1 90 9
0 0 7632 68200 6940 29836 0 0 1 195 1165 1963 1 1 88 11
0 3 7564 66728 6940 29904 0 0 0 100 885 1788 1 1 89 9
0 0 7560 69176 6940 29840 0 0 0 19 647 1327 0 1 98 1

A few minutes later it happened again, but for a longer time (I waited for at least 5 minutes, and it didn't get any better). vmstat showed this:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 7528 67192 7080 31920 0 0 0 4 658 1401 1 0 99 0
0 0 7528 67264 7080 31920 0 0 0 16 867 1555 4 1 96 0
9 0 7528 62304 7080 31920 0 0 0 153 7415 1852 37 4 58 1
8 4 7528 62132 7080 31920 0 0 0 78 2743 1821 100 0 0 0
8 0 7528 61488 7080 31920 0 0 0 79 2854 1833 100 0 0 0
9 0 7528 61448 7080 31920 0 0 0 48 2628 1698 100 0 0 0
8 0 7528 61680 7080 31920 0 0 0 24 2643 1733 100 1 0 0
8 0 7528 61572 7080 31920 0 0 0 26 2488 1661 100 0 0 0
8 0 7528 61168 7080 31920 0 0 0 50 2493 1804 100 0 0 0
8 0 7528 61432 7088 31920 0 0 0 48 2461 1669 100 0 0 0
8 0 7528 61852 7088 31920 0 0 0 52 2529 1841 100 1 0 0
8 0 7528 62132 7088 31920 0 0 0 39 2471 1642 100 1 0 0
8 0 7528 61404 7088 31920 0 0 0 26 2468 1722 100 0 0 0
8 0 7528 61256 7088 31920 0 0 0 27 2457 1644 100 0 0 0
8 0 7528 61908 7088 31920 0 0 0 46 2477 1749 100 0 0 0
8 0 7528 61000 7088 31920 0 0 0 46 2954 1675 99 1 0 0
8 0 7528 60000 7088 31920 0 0 1 67 2834 1786 99 1 0 0
8 0 7528 59504 7088 31920 0 0 253 26 2410 1618 100 0 0 0
8 0 7528 59620 7088 31920 0 0 34 83 2631 1791 100 0 0 0
8 0 7528 59728 7088 31920 0 0 29 58 2531 1646 100 0 0 0
8 0 7528 59752 7088 31920 0 0 0 29 2517 1787 99 1 0 0
8 0 7528 59620 7096 31920 0 0 3 52 2448 1648 99 1 0 0
8 0 7528 58992 7096 31920 0 0 0 45 2536 1745 100 0 0 0
9 1 7528 59024 7096 31920 0 0 1 38 2548 1635 99 1 0 0
8 0 7528 58768 7096 31920 0 0 0 44 2496 1741 100 0 0 0
8 0 7528 59388 7096 31920 0 0 0 18 2429 1617 100 0 0 0
8 0 7528 58868 7096 31920 0 0 0 51 2600 1745 100 0 0 0
8 0 7528 59156 7104 31920 0 0 0 47 2441 1682 100 0 0 0
9 0 7528 57380 7104 31920 0 0 0 40 2709 1690 100 1 0 0
8 0 7528 58056 7104 31920 0 0 1 8 3127 1629 100 0 0 0
9 0 7528 57544 7104 31920 0 0 0 36 2615 1704 100 0 0 0
8 0 7528 57196 7104 31920 0 0 0 26 2530 1710 100 0 0 0
8 0 7528 59792 6836 28648 0 0 0 42 2613 1761 100 0 0 0
8 0 7528 59156 6844 29036 0 0 78 64 2641 1757 100 0 0 0
8 0 7528 59576 6844 29164 0 0 26 26 2462 1632 100 0 0 0
8 0 7528 59716 6844 29164 0 0 0 8 2414 1706 100 0 0 0
8 0 7528 59600 6844 29164 0 0 0 48 2505 1649 100 0 0 0
8 0 7528 59600 6844 29164 0 0 0 9 2373 1648 99 1 0 0
8 0 7528 59492 6844 29164 0 0 0 10 2387 1564 100 1 0 0
8 0 7528 59624 6844 29164 0 0 0 40 2551 1691 100 0 0 0
8 0 7528 59080 6844 29164 0 0 0 50 2733 1643 100 0 0 0
8 0 7528 58956 6844 29164 0 0 0 29 2823 1652 100 0 0 0
8 1 7528 58624 6844 29164 0 0 0 34 2478 1633 100 0 0 0
8 0 7528 58716 6844 29164 0 0 0 39 2398 1688 99 1 0 0
8 0 7528 57592 6844 30228 0 0 212 30 2373 1666 100 1 0 0
8 0 7528 57468 6852 30228 0 0 0 51 2453 1695 100 0 0 0
8 0 7528 58244 6852 30228 0 0 0 26 2756 1617 99 1 0 0
8 0 7528 58244 6852 30228 0 0 0 112 3872 1952 99 1 0 0
9 1 7528 58320 6852 30228 0 0 0 48 2718 1719 100 0 0 0
8 0 7528 58204 6852 30228 0 0 0 17 2692 1697 100 0 0 0
8 0 7528 59220 6852 30228 0 0 5 48 4666 1651 98 2 0 0
9 0 7528 57716 6852 30228 0 0 0 101 5128 1874 98 2 0 0
9 0 7528 55692 6860 30228 0 0 5 100 5875 1825 97 3 0 0
9 1 7528 55668 6860 30228 0 0 0 156 3910 1960 99 1 0 0
8 0 7528 55668 6860 30228 0 0 0 38 2578 1671 100 0 0 0
8 0 7528 55600 6860 30228 0 0 2 81 2783 1888 100 0 0 0
9 1 7528 59660 5188 28320 0 0 0 50 2601 1918 100 0 0 0
8 0 7528 63280 5196 28328 0 0 2 63 4347 1855 99 1 0 0
8 0 7528 62560 5196 28328 0 0 0 101 3383 1748 99 1 0 0
9 0 7528 62132 5196 28328 0 0 1 50 2656 1724 100 0 0 0

One guest showed this:

top - 23:11:35 up 53 min, 1 user, load average: 20.42, 14.61, 7.26
Tasks: 205 total, 40 running, 165 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 99.8%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 3510920k total, 972876k used, 2538044k free, 30256k buffers
Swap: 4194296k total, 0k used, 4194296k free, 321288k cached

The other one:

top - 23:11:12 up 1 day, 9:19, 1 user, load average: 19.38, 14.54, 7.40
Tasks: 219 total, 15 running, 204 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 99.6%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 3510920k total, 1758688k used, 1752232k free, 298000k buffers
Swap: 4194296k total, 0k used, 4194296k free, 577068k cached

Thanks,
Dmitry

On Sun, Oct 3, 2010 at 12:28 PM, Avi Kivity <avi@redhat.com> wrote:
> On 09/30/2010 11:07 AM, Dmitry Golubev wrote:
>> Hi,
>>
>> I am not sure what's really happening, but every few hours
>> (unpredictable) two virtual machines (Linux 2.6.32) start to generate
>> huge cpu loads. It looks like some kind of loop is unable to complete
>> or something...
>>
> What does 'top' inside the guest show when this is happening?
>
> --
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 18+ messages in thread
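The hugepage arithmetic in the message above can be sanity-checked with a few lines; a sketch using only numbers quoted in this thread (2 MB pages, 3696 total, and the variable names are illustrative):

```python
PAGE_MB = 2              # Hugepagesize: 2048 kB
TOTAL_PAGES = 3696       # HugePages_Total

# Old setup: two 3550 MB guests left 6 unreserved free pages
old_guest_mb, old_unreserved = 3550, 6
new_guest_mb = 3500

# Shrinking each guest by 50 MB should free 25 pages per guest
saved_pages = 2 * (old_guest_mb - new_guest_mb) // PAGE_MB
expected_unreserved = old_unreserved + saved_pages   # the 56 pages expected above

# Observed after starting both guests at 3500 MB
free, rsvd = 1113, 957
observed_unreserved = free - rsvd                    # the 156 pages actually seen
print(expected_unreserved, observed_unreserved)
```

The gap between the two values is 100 pages (200 MB), which is the unexplained difference the message points out: the guests reserved noticeably fewer pages than the simple accounting predicts.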
* Re: KVM with hugepages generate huge load with two guests 2010-10-03 20:24 ` Dmitry Golubev @ 2010-10-04 7:39 ` Avi Kivity 2010-10-04 9:01 ` Dmitry Golubev 0 siblings, 1 reply; 18+ messages in thread From: Avi Kivity @ 2010-10-04 7:39 UTC (permalink / raw) To: Dmitry Golubev; +Cc: kvm On 10/03/2010 10:24 PM, Dmitry Golubev wrote: > So, I started anew. I decreased the memory allocated to each guest to > 3500MB (from 3550MB as I told earlier), but have not decreased the number > of hugepages - it is still 3696. > Please don't top post. Please use 'top' to find out which processes are busy, the aggregate statistics don't help to find out what the problem is. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: KVM with hugepages generate huge load with two guests 2010-10-04 7:39 ` Avi Kivity @ 2010-10-04 9:01 ` Dmitry Golubev 0 siblings, 0 replies; 18+ messages in thread From: Dmitry Golubev @ 2010-10-04 9:01 UTC (permalink / raw) To: Avi Kivity; +Cc: kvm > Please don't top post. Sorry > Please use 'top' to find out which processes are busy, the aggregate > statistics don't help to find out what the problem is. The thing is - all more or less active processes become busy, like httpd, etc - I can't identify any single process that generates all the load. I see at least 10 different processes in the list that look busy in each guest... From what I see, there is nothing out of the ordinary in guest 'top', except that the whole guest becomes extremely slow. But OK, I will try to repeat the problem few hours later and send you the whole 'top' output if it is required. Thanks, Dmitry ^ permalink raw reply [flat|nested] 18+ messages in thread
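When no single process stands out in 'top', one rough way to rank guest processes by CPU consumed over an interval is to diff two samples of /proc/<pid>/stat. A generic sketch (not from the thread; Linux-only, and the 2-second interval is arbitrary):

```python
import os
import time

def cpu_ticks():
    """Return {pid: utime+stime in clock ticks} for all current processes."""
    ticks = {}
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/stat") as f:
                raw = f.read()
        except (FileNotFoundError, PermissionError):
            continue  # process exited or is inaccessible; skip it
        # The comm field may contain spaces, so split after the closing ')':
        # the fields after it are state ppid ... with utime at index 11, stime at 12
        fields = raw.rsplit(")", 1)[1].split()
        ticks[pid] = int(fields[11]) + int(fields[12])
    return ticks

before = cpu_ticks()
time.sleep(2)
after = cpu_ticks()

# Sort by CPU ticks consumed during the interval, busiest first
busy = sorted(((after[p] - before[p], p) for p in after if p in before),
              reverse=True)
for delta, pid in busy[:10]:
    print(pid, delta)
```

Run inside the guest during a slowdown, this would show whether the load is spread evenly across many processes (as described above) or concentrated in a few, without depending on top's refresh behaving under a load average of 20.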
Thread overview: 18+ messages
2010-09-30 9:07 KVM with hugepages generate huge load with two guests Dmitry Golubev
2010-10-01 22:30 ` Marcelo Tosatti
2010-10-01 23:50 ` Dmitry Golubev
2010-10-02 0:56 ` Dmitry Golubev
2010-10-02 8:03 ` Michael Tokarev
[not found] ` <AANLkTinJDLoWjiXwX1MOpuVf4RUuGE3qjrawS=d+5Swu@mail.gmail.com>
2010-11-17 2:19 ` Dmitry Golubev
2010-11-18 6:53 ` Dmitry Golubev
2010-11-21 0:24 ` Dmitry Golubev
2010-11-21 8:50 ` Michael Tokarev
2010-11-21 11:22 ` Dmitry Golubev
2010-11-21 11:28 ` Avi Kivity
2010-11-21 15:03 ` Dmitry Golubev
2010-12-01 3:38 ` Dmitry Golubev
2010-12-14 7:26 ` Dmitry Golubev
2010-10-03 9:28 ` Avi Kivity
2010-10-03 20:24 ` Dmitry Golubev
2010-10-04 7:39 ` Avi Kivity
2010-10-04 9:01 ` Dmitry Golubev