From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
To: maillists.shan@gmail.com
Cc: xen-devel@lists.xensource.com, suravee.suthikulpanit@amd.com,
	qing.he@intel.com, konrad.wilk@oracle.com, eddie.dong@intel.com,
	donald.d.dugger@intel.com, xen-devel@lists.xen.org,
	jbeulich@suse.com, dietmar.hahn@ts.fujitsu.com,
	jun.nakajima@intel.com, jiongxi.li@intel.com
Subject: Re: vpmu=1 and running 'perf top' within a PVHVM guest eventually hangs dom0 and hypervisor has stuck vCPUS. Romley-EP (model=45, stepping=2)
Date: Fri, 29 Mar 2013 05:29:04 -0700 (PDT)
Message-ID: <2e28076a-79cf-4b3c-b5da-e9517180e45e@default>


----- maillists.shan@gmail.com wrote:

> We also hit the issue that Dietmar's workaround fixes. I remember the
> two of us had some email discussion at that time.
> 
> The issue causing the interrupt loop is:
> It seems that on NHM (at that time), when a PMI arrives at the CPU, the
> counter has a value of zero (instead of some other small value, say 3
> or 5, as seen on Core 2 Duo). In this case, unmasking the PMI via the APIC
> will immediately trigger another PMI.
> This does not cause a problem with a native kernel, since it typically
> programs the counter with another value (as needed for the next
> sampling point) before unmasking.
> For Xen, the PMI handler cannot reprogram the counter immediately, since
> that should be done by the guest. It just records a virtual PMI for the
> guest and unmasks the PMI before returning.
> 
> We don't know whether this is the intended HW behavior. But we hope we
> can get confirmation from the internal HW team quickly.
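
A toy model of the loop described in the quoted text may make the ordering
issue easier to see. This is only a sketch, not Xen or kernel code: the
struct, the function names and the rule "a PMI is pending whenever the
counter reads zero and the LVT entry is unmasked" are assumptions taken
from the description above, not confirmed hardware behavior.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy PMU state (hypothetical, for illustration only). */
struct toy_pmu {
    unsigned long counter; /* value left in the counter when the PMI arrives */
    bool masked;           /* models the mask bit of the APIC LVT entry */
};

/* Assumed rule from the quoted description: an unmasked entry plus a
 * counter sitting at zero raises another PMI right away. */
static bool pmi_pending(const struct toy_pmu *p)
{
    return !p->masked && p->counter == 0;
}

/*
 * Count how many PMIs fire back to back after the first one.
 * reload_before_unmask = true  models a native handler that reprograms
 *                              the counter and then unmasks.
 * reload_before_unmask = false models a handler that only records a
 *                              virtual PMI and unmasks.
 * limit caps the loop so the stuck case terminates in this model.
 */
int immediate_refires(unsigned long counter_at_pmi, bool reload_before_unmask,
                      unsigned long reload, int limit)
{
    struct toy_pmu p = { .counter = counter_at_pmi, .masked = true };
    int refires = 0;

    assert(limit > 0);
    while (refires < limit) {
        if (reload_before_unmask)
            p.counter = reload;  /* set up the next sampling point */
        p.masked = false;        /* handler unmasks before returning */
        if (!pmi_pending(&p))
            break;               /* handler returns normally */
        p.masked = true;         /* delivery masks the entry again */
        refires++;               /* and the handler is re-entered */
    }
    return refires;
}
```

In this model, immediate_refires(0, false, 1000, 100) runs into the cap,
while a counter left at a small non-zero value, or a handler that reloads
before unmasking, returns with no extra PMI.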


I will note that this workaround appeared not to be needed on Haswell. I
have run my tests there for a fairly long period of time without any problems.

Of course, this doesn't *prove* that the workaround is not needed, but
on other processors I would usually trigger this hang within 20-30 minutes
at most. On Haswell I ran for 6 or 7 hours.

-boris



> 
> Shan Haitao
> 
> 2013/3/13 Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>:
> > On Tuesday, 12 March 2013, 16:54:11, Boris Ostrovsky wrote:
> >> On 03/12/2013 04:31 PM, Konrad Rzeszutek Wilk wrote:
> >> > On Tue, Mar 12, 2013 at 02:50:59PM -0400, Boris Ostrovsky wrote:
> >> >> On 03/12/2013 01:30 PM, Konrad Rzeszutek Wilk wrote:
> >> >>> This issue I am encountering seems to only happen on multi-socket
> >> >>> machines.
> >> >> I believe I was able to reproduce this (once) on my laptop.
> >> >>
> >> >>> It also does not help that the only multi-socket box I have is
> >> >>> a Romley-EP (so two-socket SandyBridge CPUs). The other
> >> >>> SandyBridge boxes I have (one socket) are not showing this. Granted,
> >> >>> they are also a different model (42).
> >> >>>
> >> >>> The problem is that when I run 'perf top' within an SMP PVHVM
> >> >>> guest, after a couple of seconds or minutes the guest hangs.
> >> >>> The hypervisor ends up stuck too, looping, and then dom0 ends
> >> >>> up hanging as well.
> >> >>>
> >> >>> Dumping the CPU registers (Ctrl-A x3, then 'd')
> >> >>> shows that the guest is pretty firmly stuck in vmx_vmexit_handler:
> >> >>>
> >> >>> (XEN)    [<ffff82c4c01d386f>] vmx_vmexit_handler+0x22f/0x174
> >> >> And in my case this address is the second instruction after STI, i.e. we
> >> >> are right at the point where interrupts got enabled.
> >> >>
> >> >> So I am wondering whether this has something to do with the counter
> >> >> overflow interrupt (which I believe is an NMI).
> >> > Interestingly enough, if I run the PVHVM guest with 'nowatchdog'
> >> > it runs fine!
> >>
> >> I think by default perf top runs off the timer interrupt, so it does not use
> >> HW counters. But the watchdog is implemented on top of the counters, so
> >> perhaps it fires the interrupt at a bad time, messing something up.
> >
> > This looks like the strange behavior we had on Nehalem CPUs; see
> > http://lists.xen.org/archives/html/xen-devel/2010-11/msg01157.html
> > For this I added a quirk; see check_pmc_quirk() in vpmu_core2.c.
> > Model 42 is in the quirk list and it seems to work, but Romley-EP is
> > model 43, I think, which is not in the list.
> > Maybe you should add this model and give it a try.
> >
> >
> > Dietmar.
> >
> > --
> > Company details: http://ts.fujitsu.com/imprint.html
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel
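
Since the quoted suggestion hinges on which model number the CPU reports,
here is a rough sketch of how a family/model/stepping triple is decoded
from the CPUID leaf 1 EAX signature and matched against a model list. It
is not the code of check_pmc_quirk(): the struct, the function names and
the list contents passed by the caller are hypothetical. The example
signature 0x000206D2 is an assumption built from the "model=45,
stepping=2" in the subject line, not a value read from the machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded CPU signature (hypothetical helper type). */
struct cpu_sig {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

/* Decode CPUID.01H:EAX using the usual display family/model rules. */
struct cpu_sig decode_signature(uint32_t eax)
{
    struct cpu_sig s;
    unsigned base_family = (eax >> 8) & 0xf;
    unsigned base_model  = (eax >> 4) & 0xf;
    unsigned ext_family  = (eax >> 20) & 0xff;
    unsigned ext_model   = (eax >> 16) & 0xf;

    s.stepping = eax & 0xf;
    /* The extended family is only added when the base family is 0xF. */
    s.family = (base_family == 0xf) ? base_family + ext_family : base_family;
    /* The extended model only counts for base family 6 or 0xF. */
    s.model = (base_family == 0x6 || base_family == 0xf)
                  ? (ext_model << 4) | base_model
                  : base_model;
    return s;
}

/* True if the CPU is family 6 and its model appears in the caller's list. */
bool model_in_quirk_list(struct cpu_sig s, const unsigned *models, size_t n)
{
    assert(models != NULL || n == 0);
    if (s.family != 6)
        return false;
    for (size_t i = 0; i < n; i++) {
        if (models[i] == s.model)
            return true;
    }
    return false;
}
```

With a list that holds only model 42, the assumed signature 0x000206D2
(family 6, model 45, stepping 2) does not match, which is the situation
the quoted suggestion addresses by adding the missing model to the list.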
