From mboxrd@z Thu Jan  1 00:00:00 1970
From: Marcelo Tosatti <mtosatti@redhat.com>
Subject: Re: KVM with hugepages generate huge load with two guests
Date: Fri, 1 Oct 2010 19:30:48 -0300
Message-ID: <20101001223048.GA31596@amt.cnet>
References: <AANLkTi=_X-n8RkLW5CkzLn7YwySqQkaOwbbZoZbo2UYj@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: kvm@vger.kernel.org
To: Dmitry Golubev <lastguru@gmail.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:43162 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751268Ab0JAW4H (ORCPT <rfc822;kvm@vger.kernel.org>);
	Fri, 1 Oct 2010 18:56:07 -0400
Content-Disposition: inline
In-Reply-To: <AANLkTi=_X-n8RkLW5CkzLn7YwySqQkaOwbbZoZbo2UYj@mail.gmail.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On Thu, Sep 30, 2010 at 12:07:15PM +0300, Dmitry Golubev wrote:
> Hi,
> 
> I am not sure what's really happening, but every few hours
> (unpredictable) two virtual machines (Linux 2.6.32) start to generate
> huge cpu loads. It looks like some kind of loop is unable to complete
> or something...
> 
> So the idea is:
> 
> 1. I have two linux 2.6.32 x64 (openvz, proxmox project) guests
> running on linux 2.6.35 x64 (ubuntu maverick) host with a Q6600
> Core2Quad on qemu-kvm 0.12.5 and libvirt 0.8.3 and another one small
> 32bit linux virtual machine (16MB of ram) with a router inside (I
> doubt it contributes to the problem).
> 
> 2. All these machines use hugetlbfs. The server has 8GB of RAM, I
> reserved 3696 huge pages (page size is 2MB) on the server, and I am
> running the main guests each having 3550MB of virtual memory. The
> third guest, as I wrote before, takes 16MB of virtual memory.
> 
> 3. Once run, the guests reserve huge pages for themselves normally. As
> mem-prealloc is default, they grab all the memory they should have,
> leaving 6 pages unreserved (HugePages_Free - HugePages_Rsvd = 6) at all
> times - so as I understand they should not want to get any more,
> right?
> 
> 4. All virtual machines run perfectly normally without any disturbances
> for a few hours. They do not, however, use all their memory, so maybe
> the issue arises when they pass some kind of a threshold.
> 
> 5. At some point in time both guests exhibit cpu load over the top
> (16-24). At the same time, host works perfectly well, showing load of
> 8 and that both kvm processes use CPU equally and fully. This point of
> time is unpredictable - it can be anything from one to twenty hours,
> but it will be less than a day. Sometimes the load disappears in a
> moment, but usually it stays like that, and everything works extremely
> slow (even a 'ps' command executes some 2-5 minutes).
> 
> 6. If I am patient, I can start rebooting the guest systems - once
> they have restarted, everything returns to normal. If I destroy one of
> the guests (virsh destroy), the other one starts working normally at
> once (!).
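
As a side note on points 2 and 3, the memory figures can be checked with
a short sketch. It only does arithmetic on the numbers quoted above and
parses the HugePages_* fields in the format of /proc/meminfo. The
function names are made up for illustration, 8GB is assumed to mean
8192MB, and the result says nothing about how much memory the host
actually needs beyond the huge page pool.

```python
def hugepage_budget(total_ram_mb, nr_hugepages, hugepage_mb=2):
    """Return (MB held by the huge page pool, MB left for everything else)."""
    pool_mb = nr_hugepages * hugepage_mb
    return pool_mb, total_ram_mb - pool_mb


def unreserved_hugepages(meminfo_text):
    """Compute HugePages_Free - HugePages_Rsvd from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        # Only the page counters are needed; "Hugepagesize" is skipped.
        if sep and name.startswith("HugePages_"):
            fields[name] = int(rest.split()[0])
    return fields["HugePages_Free"] - fields["HugePages_Rsvd"]


if __name__ == "__main__":
    # Figures from the report: 8GB host (assumed 8192MB), 3696 pages of 2MB.
    pool, rest = hugepage_budget(8192, 3696)
    print(f"huge page pool: {pool} MB, left for everything else: {rest} MB")
```

With the quoted figures the pool is 7392MB, which leaves 800MB of
ordinary memory for the host kernel, the non-guest memory of the qemu
processes and anything else running there. Pages in the hugetlbfs pool
are not swapped, so any memory pressure on the host falls on that
remainder. To check the "6 pages unreserved" figure on a live host, pass
the contents of /proc/meminfo to unreserved_hugepages().
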
> 
> I am relatively new to kvm and I am absolutely lost here. I have not
> experienced such problems before, but recently I upgraded from ubuntu
> lucid (I think it was linux 2.6.32, qemu-kvm 0.12.3 and libvirt 0.7.5)
> and started to use hugepages. These two virtual machines are not
> normally run on the same host system (I have a corosync/pacemaker
> cluster with drbd storage), but when one of the hosts is not
> available, they start running on the same host. That is the reason I
> have not noticed this earlier.
> 
> Unfortunately, I don't have any spare hardware to experiment and this
> is a production system, so my debugging options are rather limited.
> 
> Do you have any ideas, what could be wrong?

Is there swapping activity on the host when this happens?
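
One way to answer that without watching the machine for hours is to log
the host's swap counters and look at them after the next incident. The
following is a minimal sketch, assuming a Linux host where /proc/vmstat
exposes the cumulative pswpin and pswpout page counters. The function
names and the 5 second interval are arbitrary choices for illustration.

```python
import time


def swap_counters(vmstat_text):
    """Extract the cumulative pswpin/pswpout counts from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before, after):
    """Pages swapped in and out between two samples."""
    return {key: after[key] - before[key] for key in ("pswpin", "pswpout")}


def watch(interval=5.0, path="/proc/vmstat"):
    """Print a timestamped swap-in/swap-out delta every `interval` seconds."""
    def read():
        with open(path) as f:
            return swap_counters(f.read())

    prev = read()
    while True:
        time.sleep(interval)
        cur = read()
        delta = swap_delta(prev, cur)
        print(time.strftime("%H:%M:%S"),
              "pswpin", delta["pswpin"], "pswpout", delta["pswpout"])
        prev = cur
```

Run watch() on the host (not in the guests) with the output redirected to
a file. Nonzero deltas during the period when the guests stall would
indicate that the host is swapping. The si and so columns of "vmstat 1"
give the same information interactively.
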