public inbox for linux-kernel@vger.kernel.org
From: Jack Steiner <steiner@sgi.com>
To: Don Zickus <dzickus@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>, Ingo Molnar <mingo@elte.hu>,
	tglx@linutronix.de, hpa@zytor.com, x86@kernel.org,
	linux-kernel@vger.kernel.org,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [PATCH] x86, UV: Fix NMI handler for UV platforms
Date: Mon, 21 Mar 2011 15:37:46 -0500
Message-ID: <20110321203746.GA17419@sgi.com>
In-Reply-To: <20110321193740.GN1239@redhat.com>

On Mon, Mar 21, 2011 at 03:37:40PM -0400, Don Zickus wrote:
> On Mon, Mar 21, 2011 at 01:22:35PM -0500, Jack Steiner wrote:
> > On Mon, Mar 21, 2011 at 01:51:10PM -0400, Don Zickus wrote:
> > > On Mon, Mar 21, 2011 at 07:26:51PM +0300, Cyrill Gorcunov wrote:
> > > > On 03/21/2011 07:14 PM, Ingo Molnar wrote:
> > > > > 
> > > > > * Jack Steiner <steiner@sgi.com> wrote:
> > > > > 
> > > > >> This fixes a problem seen on UV systems handling NMIs from the node controller.
> > > > >> The original code used the DIE notifier as the hook to get to the UV NMI
> > > > >> handler. This does not work if performance counters are active - the hw_perf
> > > > >> code consumes the NMI and the UV handler is not called.
> > > 
> > > Well that is a bug in the perf code.  We have been dealing with 'perf'
> > > swallowing NMIs for a couple of releases now.  I think we got rid of most
> > > of the cases (p4 and acme's core2 quad are the only cases I know that are
> > > still an issue).
> > > 
> > > I would much prefer to investigate the reason why this is happening
> > > because the perf nmi handler is supposed to check the global interrupt bit
> > > to determine if the perf counters caused the nmi or not, and otherwise
> > > fall through to other handlers like SGI's nmi button in this case.
> > 
> > The patch that I posted is based on a RHEL6.1 patch that I'm running internally.
> > Unless something has very recently changed in the RH sources, the perf
> > NMI handler unconditionally returns NOTIFY_STOP if it handles an NMI.
> > If no NMI was handled, it returns NOTIFY_DONE. This sometimes works
> > and allows the platform generated NMI to be processed but if both NMI
> > sources trigger at about the same time, the lower priority event
> > will be lost.
> 
> Not necessarily, if both are triggered, you should still get _two_ NMIs.
> It may get processed in the wrong order but it should still get correctly
> processed.


Let me do some more testing with the UV NMI priority set higher than the hw_perf
priority. When I tried this earlier, I thought I saw problems but I'm
not certain that it was not caused by a different error.
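
To make the priority question concrete, here is a minimal user-space
sketch of a priority-ordered handler chain. It is not kernel code: only
the NOTIFY_* values are taken from include/linux/notifier.h, and the
names nmi_event, deliver_nmi and uv_seen_on_first_nmi are hypothetical.
It also assumes each handler can tell whether its own source is pending,
which may not hold for the UV side.

```c
#include <stddef.h>
#include <stdlib.h>

/* Values as in the kernel's include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000  /* not mine, keep walking the chain */
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)  /* handled, stop */

/* Hypothetical model of one NMI delivery with two possible sources. */
struct nmi_event {
    int perf_pending;
    int uv_pending;
    int perf_handled;
    int uv_handled;
};

struct handler {
    int (*fn)(struct nmi_event *ev);
    int priority;                /* higher priority runs first */
};

static int perf_handler(struct nmi_event *ev)
{
    if (!ev->perf_pending)
        return NOTIFY_DONE;
    ev->perf_pending = 0;
    ev->perf_handled = 1;
    return NOTIFY_STOP;          /* claims the NMI and ends the walk */
}

static int uv_handler(struct nmi_event *ev)
{
    if (!ev->uv_pending)
        return NOTIFY_DONE;
    ev->uv_pending = 0;
    ev->uv_handled = 1;
    return NOTIFY_STOP;
}

static int by_priority_desc(const void *a, const void *b)
{
    const struct handler *x = a, *y = b;
    return (y->priority > x->priority) - (y->priority < x->priority);
}

/* Walk the chain once, highest priority first, until a handler stops it. */
static int deliver_nmi(struct handler *chain, size_t n, struct nmi_event *ev)
{
    int ret = NOTIFY_DONE;

    qsort(chain, n, sizeof(*chain), by_priority_desc);
    for (size_t i = 0; i < n; i++) {
        ret = chain[i].fn(ev);
        if (ret & NOTIFY_STOP_MASK)
            break;
    }
    return ret;
}

/* Both sources fire together; report whether UV ran on the first NMI. */
static int uv_seen_on_first_nmi(int uv_prio, int perf_prio)
{
    struct handler chain[] = {
        { perf_handler, perf_prio },
        { uv_handler, uv_prio },
    };
    struct nmi_event ev = { 1, 1, 0, 0 };

    deliver_nmi(chain, 2, &ev);
    return ev.uv_handled;
}
```

In this model, uv_seen_on_first_nmi(1, 2) puts UV below perf, so the
first delivery stops at the perf handler and the UV source stays
pending until another NMI arrives. uv_seen_on_first_nmi(2, 1) is the
ordering to be tested, and there the perf source is the one left
pending.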


> 
> > 
> > The root cause of the problem is that architecturally, x86 does not
> > have a way to identify the source(s) that cause an NMI. If multiple
> > events occur at about the same time, there is no way that I can see that the
> > OS can detect it.
> 
> There are registers we can check to see who triggered the NMI (at least
> for the perf code, the SGI code maybe not, which is why I set it to a
> lower priority to be a catch-all).
> 
> I'm not aware of the x86 architecture dropping NMIs, so they should all
> get processed.  It is just a matter of which subsystems get to determine if
> they are the source of the NMI or not.
> 
> > 
> > > 
> > > My first impression is the skip nmi logic in the perf handler is probably
> > > accidentally thinking the SGI external nmi is the perf's 'extra' nmi it is
> > > supposed to skip and thus swallows it.  At least that is the impression I
> > 
> > Agree
> > 
> > 
> > > get from the RedHat bugzilla which says SGI is running 'perf top', getting
> > > a hang, then pressing their nmi button to see the stack traces.
> > > 
> > > Jack,
> > > 
> > > I worked through a number of these issues upstream and I already talked to
> > > George and Russ over here at RedHat about working through the issue over
> > > here with them.  They can help me get access to your box to help debug.
> > 
> > Russ is right down the hall.
> 
> Great!
> 
> Cheers,
> Don
