linux-kernel.vger.kernel.org archive mirror
From: Don Zickus <dzickus@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>,
	Lin Ming <ming.m.lin@intel.com>, Ingo Molnar <mingo@elte.hu>,
	Cyrill Gorcunov <gorcunov@gmail.com>,
	"fweisbec@gmail.com" <fweisbec@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Huang, Ying" <ying.huang@intel.com>
Subject: Re: A question of perf NMI handler
Date: Wed, 4 Aug 2010 10:52:03 -0400
Message-ID: <20100804145203.GP3353@redhat.com>
In-Reply-To: <1280931093.1923.1194.camel@laptop>

On Wed, Aug 04, 2010 at 04:11:33PM +0200, Peter Zijlstra wrote:
> On Wed, 2010-08-04 at 10:00 -0400, Don Zickus wrote:
> > On Wed, Aug 04, 2010 at 12:01:16PM +0200, Robert Richter wrote:
> > > There is no general mechanism for recording the NMI source (except if
> > > it was externally triggered, e.g. by the southbridge). Also, all NMIs
> > > are mapped to NMI vector 2, and therefore there is no way to find out
> > > the reason by using the APIC mask registers.
> > 
> > This is no different from a shared interrupt, no?  All the NMI handlers
> > need to check their own sources to see if they triggered it.  You can't
> > expect the generic NMI handler to determine this.
> 
> Sure, but the problem is that the PMU can't reliably do that.

Right, but that is because there is no bit that says the PMU generated the
NMI.  But for the most part, checking whether a PMU counter has gone > 0
(i.e. has overflowed) is good enough, no?
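The shared-interrupt model, together with the "counter > 0" check, can be sketched as userspace C. This is an illustration under stated assumptions, not kernel code: the 48-bit counter width, `struct pmu_counter`, `struct nmi_button` and all helper functions are hypothetical, and `NOTIFY_DONE`/`NOTIFY_STOP` are local stand-ins for the kernel notifier constants of the same names. It assumes counters are armed with -period, so an overflow shows up as the value wrapping past zero. Each handler checks only its own source, and an NMI that nobody claims is treated as unknown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the kernel notifier return values of the same names. */
enum { NOTIFY_DONE = 0, NOTIFY_STOP = 1 };

/* Assumed counter width; the real width is CPU-specific. */
#define CNTVAL_BITS 48
#define CNTVAL_MASK ((UINT64_C(1) << CNTVAL_BITS) - 1)

/* Hypothetical model of one PMU counter. */
struct pmu_counter {
	uint64_t val;    /* current raw counter value */
	uint64_t period; /* events between overflows */
};

/* Hypothetical model of an external NMI source (e.g. an NMI button). */
struct nmi_button {
	int pressed;
};

typedef int (*nmi_handler_t)(void *ctx);

/* Arm a counter so it overflows after `period` events: write -period,
 * truncated to the counter width (so the top bit starts out set). */
static uint64_t program_counter(uint64_t period)
{
	assert(period > 0 && period < (UINT64_C(1) << (CNTVAL_BITS - 1)));
	return (UINT64_C(0) - period) & CNTVAL_MASK;
}

/* Advance the counter by `events` events, wrapping at the counter width. */
static uint64_t count_events(uint64_t val, uint64_t events)
{
	return (val + events) & CNTVAL_MASK;
}

/* The counter started "negative" (top bit set); once it has wrapped past
 * zero the top bit is clear again, i.e. the value is no longer negative. */
static int counter_overflowed(uint64_t val)
{
	return !(val & (UINT64_C(1) << (CNTVAL_BITS - 1)));
}

/* PMU handler: claim the NMI only if its own counter overflowed. */
static int pmu_handler(void *ctx)
{
	struct pmu_counter *c = ctx;

	if (!counter_overflowed(c->val))
		return NOTIFY_DONE;              /* not ours, keep walking */
	c->val = program_counter(c->period); /* re-arm for the next period */
	return NOTIFY_STOP;
}

/* Button handler: claim the NMI only if the button latched a press. */
static int button_handler(void *ctx)
{
	struct nmi_button *b = ctx;

	if (!b->pressed)
		return NOTIFY_DONE;
	b->pressed = 0;
	return NOTIFY_STOP;
}

/* Walk the chain like a shared IRQ line. Returns 1 if some handler claimed
 * the NMI, 0 if nobody did (an "unknown NMI" that should be reported). */
static int dispatch_nmi(nmi_handler_t *handlers, void **ctxs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (handlers[i](ctxs[i]) == NOTIFY_STOP)
			return 1;
	return 0;
}
```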

> 
> > > Now, if multiple perfctrs trigger an NMI, it may happen that a handler
> > > has nothing to do because the counter was already handled by the
> > > previous one. Thus, it is valid to have unhandled NMIs caused by
> > > perfctrs.
> > > 
> > > So, with counters enabled we always have to return NOTIFY_STOP for
> > > *all* NMIs, as we cannot detect whether it was a perfctr NMI. Otherwise
> > > we could trigger an unhandled NMI. To ensure that all other NMI handlers
> > > are called, the perfctr's NMI handler must have the lowest priority, so
> > > that it is the last in the chain.
> > 
> > But the cases this breaks are external NMI buttons, broken firmware that
> > causes SERRs on the PCI bus, and any other general hardware failure.
> 
> It breaks broken firmware? :-) and you care?

Absolutely.  When a customer complains that they upgraded their RHEL kernel
and the box suddenly hangs on boot trying to access the storage device, yes,
I care.  A flood of NMIs would indicate that something is fishy with the
firmware (in this case it was a network card, even though the hang was on
storage access).  Swallowing the NMIs would just cause everyone to waste
weeks of their time trying to figure it out (you don't want to know how many
weeks were wasted in RHEL-6 across multiple machines, only to find out it was
broken firmware on a card that no one suspected as the culprit).

As much as I hate broken firmware, it is becoming commonplace, and the
faster the kernel can point it out through unknown NMIs, the faster we can
get the vendor involved to fix it.
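The trade-off being argued here can be modelled in a few lines of userspace C. This is a sketch, not kernel code: `struct cpu_state`, `perf_catch_all()`, `perf_check_source()` and `deliver_nmi()` are hypothetical, and `NOTIFY_DONE`/`NOTIFY_STOP` are local stand-ins for the kernel notifier constants. It assumes the perf handler runs last in the chain, after every other handler has already declined the NMI, so the perf handler alone decides whether the NMI gets reported as unknown.

```c
#include <assert.h>

/* Local stand-ins for the kernel notifier return values of the same names. */
enum { NOTIFY_DONE = 0, NOTIFY_STOP = 1 };

/* Hypothetical per-CPU state for the sketch. */
struct cpu_state {
	int perf_users;   /* e.g. stays >= 1 once the NMI watchdog runs on perf */
	int overflowed;   /* counters found overflowed when the NMI arrives */
	int unknown_nmis; /* NMIs that would be reported as unknown */
};

/* Variant 1 (the behaviour described in this thread): with any perf user,
 * swallow every NMI that reaches the handler. */
static int perf_catch_all(struct cpu_state *s)
{
	if (s->perf_users == 0)
		return NOTIFY_DONE;  /* no users: let the NMI fall through */
	s->overflowed = 0;       /* handle whatever overflowed, if anything */
	return NOTIFY_STOP;      /* ...and claim the NMI regardless */
}

/* Variant 2 (hypothetical): claim the NMI only if a counter overflowed. */
static int perf_check_source(struct cpu_state *s)
{
	if (s->perf_users == 0 || s->overflowed == 0)
		return NOTIFY_DONE;
	s->overflowed = 0;       /* one NMI services every overflowed counter */
	return NOTIFY_STOP;
}

/* Deliver one NMI that every other handler has already declined; the perf
 * handler is last in the chain, so it alone decides if it is reported. */
static void deliver_nmi(struct cpu_state *s, int (*perf)(struct cpu_state *))
{
	if (perf(s) != NOTIFY_STOP)
		s->unknown_nmis++;
}
```

Under these assumptions, the catch-all variant hides a firmware or NMI-button event whenever perf has a user (such as a perf-based NMI watchdog), while the source-checking variant reports it but also flags the harmless empty NMI left over when one NMI already handled two overflowed counters.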

> 
> > So what the perf handler does is really unacceptable.  The only reason we
> > are noticing this now is that I put the nmi_watchdog on top of the perf
> > subsystem, so it always has a user and will return NOTIFY_STOP.  Before,
> > it never had a registered user, so it returned NOTIFY_DONE instead and
> > everything worked great.
> 
> Right, so I looked up your thing, and while that limits the damage in that
> at some point it will let NMIs pass, it will still consume too many.
> Meaning that Yinghai may have to press his NMI button several times
> before it registers.

OK.  Thanks for reviewing.  How does it consume too many?  I probably don't
understand how perf is being used in the non-simple scenarios.

Cheers,
Don



Thread overview: 85+ messages
2010-08-04  9:21 A question of perf NMI handler Lin Ming
2010-08-04  9:50 ` Peter Zijlstra
2010-08-04 10:01 ` Robert Richter
2010-08-04 10:24   ` Peter Zijlstra
2010-08-04 10:29     ` Robert Richter
2010-08-04 14:00   ` Don Zickus
2010-08-04 14:11     ` Peter Zijlstra
2010-08-04 14:52       ` Don Zickus [this message]
2010-08-04 15:02         ` Peter Zijlstra
2010-08-04 15:18           ` Cyrill Gorcunov
2010-08-04 15:50             ` Don Zickus
2010-08-04 16:10               ` Cyrill Gorcunov
2010-08-04 16:20                 ` Don Zickus
2010-08-04 16:39                   ` Cyrill Gorcunov
2010-08-04 18:48                     ` Robert Richter
2010-08-04 19:22                       ` Andi Kleen
2010-08-04 19:26                       ` Cyrill Gorcunov
2010-08-06  6:52                         ` Robert Richter
2010-08-06 14:21                           ` Don Zickus
2010-08-09 19:48                             ` [PATCH] perf, x86: try to handle unknown nmis with running perfctrs Robert Richter
2010-08-09 20:02                               ` Cyrill Gorcunov
2010-08-10  7:42                                 ` Robert Richter
2010-08-10 16:16                                   ` Cyrill Gorcunov
2010-08-10 16:41                                     ` Robert Richter
2010-08-10 17:24                                       ` Cyrill Gorcunov
2010-08-10 19:05                                         ` Robert Richter
2010-08-10 19:24                                           ` Cyrill Gorcunov
2010-08-12 13:24                                             ` Robert Richter
2010-08-12 14:31                                               ` Cyrill Gorcunov
2010-08-10 20:48                               ` Don Zickus
2010-08-11  2:44                                 ` Frederic Weisbecker
2010-08-11 11:10                                   ` Robert Richter
2010-08-11 12:44                                     ` Don Zickus
2010-08-11 14:03                                       ` Robert Richter
2010-08-11 14:32                                         ` Don Zickus
2010-08-13  4:37                                     ` Frederic Weisbecker
2010-08-13  8:22                                       ` Robert Richter
2010-08-14  1:28                                         ` Frederic Weisbecker
2010-08-14  2:29                                           ` Robert Richter
2010-08-11 12:39                                   ` Don Zickus
2010-08-11  3:19                                 ` Huang Ying
2010-08-11 12:36                                   ` Don Zickus
2010-08-16 14:37                                     ` Peter Zijlstra
2010-08-11 22:00                               ` [PATCH -v2] " Robert Richter
2010-08-12 13:10                                 ` Robert Richter
2010-08-12 18:21                                   ` Don Zickus
2010-08-16  7:37                                     ` Robert Richter
2010-08-12 13:52                                 ` Don Zickus
2010-08-13  4:25                                 ` Frederic Weisbecker
2010-08-16 14:48                                 ` Peter Zijlstra
2010-08-16 16:27                                   ` Cyrill Gorcunov
2010-08-16 17:16                                     ` Robert Richter
2010-08-16 19:06                                       ` Cyrill Gorcunov
2010-08-16 19:13                                         ` Peter Zijlstra
2010-08-16 19:18                                           ` Cyrill Gorcunov
2010-08-16 22:55                                         ` Robert Richter
2010-08-17 15:23                                           ` Cyrill Gorcunov
2010-08-17 15:22                               ` [PATCH -v3] " Robert Richter
2010-08-17 16:17                                 ` Cyrill Gorcunov
2010-08-19 10:45                                 ` Peter Zijlstra
2010-08-19 12:39                                   ` Robert Richter
2010-08-19 14:12                                   ` Don Zickus
2010-08-19 14:27                                     ` Peter Zijlstra
2010-08-19 15:20                                       ` Don Zickus
2010-08-19 17:43                                       ` Cyrill Gorcunov
2010-08-19 17:53                                         ` Peter Zijlstra
2010-08-19 21:58                                       ` Don Zickus
2010-08-20  8:50                                         ` Peter Zijlstra
2010-08-20  1:50                                       ` Don Zickus
2010-08-20  8:16                                         ` Ingo Molnar
2010-08-20 10:04                                           ` Peter Zijlstra
2010-08-20 10:30                                             ` Cyrill Gorcunov
2010-08-20 12:39                                             ` Don Zickus
2010-08-20 13:27                                               ` Ingo Molnar
2010-08-20 13:51                                                 ` Don Zickus
2010-08-20 14:17                                                   ` Ingo Molnar
2010-08-20 20:45                                                     ` Cyrill Gorcunov
2010-08-24 21:48                                                     ` Don Zickus
2010-08-20  8:36                                         ` Robert Richter
2010-08-20 14:17                                       ` [tip:perf/urgent] perf, x86: Fix handle_irq return values tip-bot for Peter Zijlstra
2010-08-20 14:17                                 ` [tip:perf/urgent] perf, x86: Try to handle unknown nmis with an enabled PMU tip-bot for Robert Richter
2010-08-06 15:35                   ` A question of perf NMI handler Andi Kleen
2010-08-04 15:45           ` Don Zickus
2010-08-06 15:37           ` Andi Kleen
2010-08-04 13:54 ` Don Zickus