From mboxrd@z Thu Jan 1 00:00:00 1970
From: Boris Ostrovsky
Subject: Re: vpmu=1 and running 'perf top' within a PVHVM guest eventually hangs dom0 and hypervisor has stuck vCPUS. Romley-EP (model=45, stepping=2)
Date: Tue, 12 Mar 2013 14:50:59 -0400
Message-ID: <513F7913.2090003@oracle.com>
References: <20130312173055.GA11000@phenom.dumpdata.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <20130312173055.GA11000@phenom.dumpdata.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Konrad Rzeszutek Wilk
Cc: xen-devel@lists.xensource.com, jun.nakajima@intel.com, qing.he@intel.com, eddie.dong@intel.com, dietmar.hahn@ts.fujitsu.com, jbeulich@suse.com, suravee.suthikulpanit@amd.com, jiongxi.li@intel.com
List-Id: xen-devel@lists.xenproject.org

On 03/12/2013 01:30 PM, Konrad Rzeszutek Wilk wrote:
> This issue I am encountering seems to only happen on multi-socket
> machines. I believe I was able to reproduce this (once) on my laptop.
> It also does not help that the only multi-socket box I have is
> an Romley-EP (so two socket SandyBridge CPUs). The other
> SandyBridge boxes I've (one socket) are not showing this. Granted
> they are also a different model (42).
>
> The problem is that when I run 'perf top' within an SMP PVHVM
> guest, after a couple of seconds or minutes the guest hangs.
> Hypervisor ends up stuck too looping, and then the dom0 ends
> up hanging as well.
>
> Dumping the cpu registers (Ctrl-A x3, then 'd'
> shows that the guest is pretty firmly stuck in vmx_vmexit_handler:
>
> (XEN) [] vmx_vmexit_handler+0x22f/0x174

And in my case this address is the second instruction after STI, i.e.
we are right at the point where interrupts got enabled. So I am
wondering whether this has something to do with the counter overflow
interrupt (which I believe is an NMI).
-boris

>
> and if I let this stay for some time, dom0 detects that some
> of its VCPUs are hanged and it resorts to sending NMI. NMI
> is not implemented in pv-ops and then dom0 wedges. In some
> cases it also wedges itself when doing 'xl list' or any up-calls
> to the hypervisor.
>
> Anyhow, following 'Ctrl-A x3, then 'v' tells me:
>
> (XEN) Virtual processor ID = 0x0c02
> .. snip..
> (XEN) Virtual processor ID = 0x0fc4
> (XEN) VCPU 3
>
> and stays stuck there. Doing the 'Ctrl-A x3' and 'd' to
> see where it is stuck tells me:
>