From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: suravee.suthikulpanit@amd.com, qing.he@intel.com,
	eddie.dong@intel.com, dietmar.hahn@ts.fujitsu.com,
	xen-devel <xen-devel@lists.xen.org>,
	jun.nakajima@intel.com, jiongxi.li@intel.com,
	boris.ostrovsky@oracle.com
Subject: Re: vpmu=1 and running 'perf top' within a PVHVM guest eventually hangs dom0 and hypervisor has stuck vCPUS. Romley-EP (model=45, stepping=2)
Date: Wed, 13 Mar 2013 17:27:42 -0400
Message-ID: <20130313212742.GB27445@phenom.dumpdata.com>
In-Reply-To: <514047DB02000078000C53E4@nat28.tlf.novell.com>

On Wed, Mar 13, 2013 at 08:33:15AM +0000, Jan Beulich wrote:
> >>> On 12.03.13 at 18:30, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > This issue I am encountering seems to only happen on multi-socket
> > machines.
> > 
> > It also does not help that the only multi-socket box I have is
> > an Romley-EP (so two socket SandyBridge CPUs). The other
> > SandyBridge boxes I've (one socket) are not showing this. Granted
> > they are also a different model (42).
> > 
> > The problem is that when I run 'perf top' within an SMP PVHVM
> > guest, after a couple of seconds or minutes the guest hangs.
> > Hypervisor ends up stuck too looping, and then the dom0 ends
> > up hanging as well.
> > 
> > Dumping the cpu registers (Ctrl-A x3, then 'd'
> > shows that the guest is pretty firmly stuck in vmx_vmexit_handler:
> > 
> > (XEN)    [<ffff82c4c01d386f>] vmx_vmexit_handler+0x22f/0x174
> > 
> > and if I let this stay for some time, dom0 detects that some
> > of its VCPUs are hanged and it resorts to sending NMI. NMI
> > is not implemented in pv-ops and then dom0 wedges. In some
> > cases it also wedges itself when doing 'xl list' or any up-calls
> > to the hypervisor.
> 
> Did you try running Xen with its watchdog (and perhaps Dom0
> without)?

Just now I ran it with 'watchdog=1' on the Xen hypervisor line and it
did not spot any issues with the guest. It naturally spotted an
issue with the vcpu_sleep_sync as it was hung/spinning and gave me a grave
stack-trace - which was exactly the same as what 'd' showed.

> 
> > Anyhow, following 'Ctrl-A x3, then 'v' tells me:
> > 
> > (XEN) Virtual processor ID = 0x0c02
> > .. snip..
> > (XEN) Virtual processor ID = 0x0fc4
> > (XEN)   VCPU 3
> > 
> > and stays stuck there. Doing the 'Ctrl-A x3' and 'd' to
> > see where it is stuck tells me:
> 
> Perhaps sending 'd' without first sending 'v' might better show where
> the original hang is?

Did that too (I think it was part of the serial output). If I did 'd'
it would tell me that the VCPUs for the guest were all in vmx_vmexit_handler
and also give me a stack dump of the guest. There was no 'vcpu_sleep_sync'
in it either, as the 'vmcs_dump' had never run.

I originally thought that this meant the vmx_vmexit_handler is somehow stuck
- but maybe that is the OK state - meaning that when a guest is busily doing
VMEXIT/VMENTER continuously, that is what we would see on the hypervisor
stack?

Looking at the guest stack provided with 'd' made for an interesting
observation. It looks as if one vcpu is doing something in
__switch_context, and two others are in ticket_spin_lock! Then I realized
that in the past I did have to provide a PauseLoopExit value, as the
default would never let me launch an RHEL5 HVM guest.

Adding 'ple_gap=0' on the Xen hypervisor line now seems to mask the issue
(or perhaps fix it?). But I doubt it is the fix, as Boris saw this
exact same issue when running a UP PVHVM guest - and dom0 was running
on a laptop.
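
For comparing 'd' dumps between runs (say, with and without 'ple_gap=0'),
a rough way to see which hypervisor symbols dominate is to tally the
stack-frame lines in a saved serial log. The sketch below is only an
illustration: it assumes frames look like the vmx_vmexit_handler line
quoted earlier, and the function name tally_frames and the sample text
are hypothetical, not part of any Xen tooling.

```python
import re
from collections import Counter

# Assumed frame format, taken from the one line quoted in this thread:
#   (XEN)    [<ffff82c4c01d386f>] vmx_vmexit_handler+0x22f/0x174
FRAME_RE = re.compile(
    r"\(XEN\)\s+\[<[0-9a-fA-F]+>\]\s+([\w.]+)\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+"
)


def tally_frames(log_text):
    """Count how often each symbol appears in stack-frame lines.

    Hypothetical helper: lines that do not match the assumed frame
    format (register dumps, VMCS fields, and so on) are ignored.
    """
    counts = Counter()
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Sample input built only from lines that appear in this thread.
    sample = (
        "(XEN)    [<ffff82c4c01d386f>] vmx_vmexit_handler+0x22f/0x174\n"
        "(XEN) Virtual processor ID = 0x0c02\n"
        "(XEN)    [<ffff82c4c01d386f>] vmx_vmexit_handler+0x22f/0x174\n"
    )
    for symbol, count in tally_frames(sample).most_common():
        print(f"{count:4d} {symbol}")
```

A tally like this would not by itself say whether the guest is stuck or
just busy doing VMEXIT/VMENTER; it only makes it easier to compare dumps
taken at different times.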

